What is planned to be added?

#366
by drmcbride - opened

A writing quality benchmark, accompanied by information such as whether models take stories/RP in dark vs. lighthearted directions and NSFW vs. SFW directions; better support for reasoning models; a reasoning-focused benchmark; and, if I can figure it out, a submissions section for the leaderboard where people can upvote the models they think should have the highest testing priority. Also various smaller things, like being able to see which models are good at standard/textbook questions vs. pop culture questions.

Thank you! A writing benchmark would definitely be a welcome addition, since being uncensored as a chat model doesn't necessarily translate to pleasant-to-read roleplay or prose. The fairly popular EQ-Bench has done a decent job of benchmarking this in the past, but it sticks to the major base models and proprietary releases.

If possible, could you add a measure of repetition to the writing test? Something like the proportion of repeated sentences or trigrams. Smaller models seem to struggle with this, and it's a very noticeable problem when roleplaying.
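As a rough illustration of what such a repetition measure could look like, here is a minimal Python sketch. The function names are hypothetical, and the regex-based word tokenization and sentence splitting are simplifying assumptions, not anything the leaderboard actually uses. Each function counts every occurrence beyond the first as a repeat and divides by the total number of trigrams or sentences.

```python
import re
from collections import Counter


def repeated_trigram_ratio(text: str) -> float:
    """Fraction of word trigrams that repeat an earlier trigram (hypothetical metric)."""
    # Naive tokenization (assumption): lowercase words, punctuation dropped.
    words = re.findall(r"[a-z']+", text.lower())
    trigrams = [tuple(words[i:i + 3]) for i in range(len(words) - 2)]
    if not trigrams:
        return 0.0
    counts = Counter(trigrams)
    # Every occurrence beyond the first counts as one repeat.
    repeats = sum(c - 1 for c in counts.values())
    return repeats / len(trigrams)


def repeated_sentence_ratio(text: str) -> float:
    """Fraction of sentences that duplicate an earlier sentence (hypothetical metric)."""
    # Naive sentence split (assumption): break after ., ! or ? followed by whitespace.
    sentences = [s.strip().lower() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return 0.0
    counts = Counter(sentences)
    repeats = sum(c - 1 for c in counts.values())
    return repeats / len(sentences)


if __name__ == "__main__":
    sample = "She smiled softly. He nodded. She smiled softly. He nodded."
    print(repeated_sentence_ratio(sample))  # 0.5
    print(repeated_trigram_ratio(sample))   # 0.375
```

A real benchmark would probably want a proper tokenizer and would likely score each model response separately (or across a whole roleplay session) rather than on isolated snippets.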
