llm_contamination_detector / data /code_eval_board.csv
current evals with num_z=50
T,Models,ARC,HellaSwag,MMLU,TruthfulQA,Winogrande,GSM8K
🟢,roneneldan/TinyStories-3M,0.06,0.1,0.13,0.2,0.01,0
🟢,roneneldan/TinyStories-1M,0.05,0.11,0.09,0.17,0.01,0
🔶,Fredithefish/ReasonixPajama-3B-HF,0.15,0.24,0.21,0.94,0.01,0.44
🟢,mistralai/Mistral-7B-v0.1,0.54,0.51,0.46,0.75,0,0.91
🔶,rishiraj/meow,0.11,0.49,0.28,0.36,0.02,0.95