-
MMLU-Pro Leaderboard
237 - More advanced and challenging multi-task evaluation
-
Stick To Your Role! Leaderboard
58 - Benchmarking LLMs on the stability of simulated populations
-
ZeroEval Leaderboard
53 - Embed ZeroEval for evaluation
-
Decentralized Arena Leaderboard
26 - View and compare LLM evaluations across various domains
Hristo Panev
hppdqdq
AI & ML interests
None yet
Recent Activity
Liked a model 7 days ago: PantheonUnbound/Satyr-V0.1-4B
Liked a model 11 days ago: JunhaoZhuang/FlashVSR-v1.1
Liked a model 26 days ago: lightx2v/Autoencoders
Organizations
None yet