Spaces: DontPlanToEnd/UGI-Leaderboard


what is planned to be added?

#366

by drmcbride - opened 11 days ago

drmcbride

11 days ago

title

DontPlanToEnd

Owner 11 days ago

A writing quality benchmark, accompanied by information such as whether models take stories/RP in dark vs. lighthearted directions and NSFW vs. SFW directions; better reasoning model support; a reasoning-focused benchmark; and, if I can figure it out, maybe a submissions section for the leaderboard where people can upvote the models they think should have the highest priority for testing. Also various smaller things, like being able to see which models are good at standard/textbook questions vs. pop culture questions.

unrulypony

9 days ago

Thank you, a writing benchmark would definitely be a welcome addition, since being uncensored as a chat model doesn't necessarily translate to pleasant-to-read roleplay or prose. The popular-ish EQ-Bench has done okay at benchmarking this in the past, but it sticks to the major base models and proprietary releases.

If possible, could you add a measure of repetition to the writing test? Something like "amount of repeated sentences/trigrams". It's something that smaller models seem to struggle with and is a very noticeable problem when roleplaying.
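
To make the suggestion concrete, here is a rough sketch of what such a repetition measure could look like. It is only an illustration, not how the leaderboard works: the function names, the simple regex tokenization, and the choice to count every occurrence after the first as a "repeat" are all assumptions.

```python
import re
from collections import Counter


def repeated_trigram_rate(text: str) -> float:
    """Fraction of word trigrams that repeat an earlier trigram (hypothetical metric)."""
    # Assumption: lowercase alphanumeric tokens are a good enough notion of "word".
    words = re.findall(r"[a-z0-9']+", text.lower())
    trigrams = [tuple(words[i:i + 3]) for i in range(len(words) - 2)]
    if not trigrams:
        return 0.0
    counts = Counter(trigrams)
    # Every occurrence beyond the first counts as one repeat.
    repeats = sum(c - 1 for c in counts.values())
    return repeats / len(trigrams)


def repeated_sentence_rate(text: str) -> float:
    """Fraction of sentences that duplicate an earlier sentence (hypothetical metric)."""
    # Assumption: splitting on ., ! and ? is an acceptable sentence boundary heuristic.
    sentences = [s.strip().lower() for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    counts = Counter(sentences)
    return sum(c - 1 for c in counts.values()) / len(sentences)
```

Both functions return a value between 0 and 1, where higher means more repetition. Scores like these would probably need to be computed over outputs of similar length to be comparable across models.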
