CulturalBench / data /CulturalBench_Hard_average.csv
model,accuracy
gpt-3-5-turbo-1106,26.97636512
gpt35turbo,32.76283619
gpt4omini,48.32925835
gpt4o,61.45069275
gpt-4o-2024-08-06,62.02118989
gpt-4-0125-preview,61.61369193
gpt-4-1106-preview,59.8207009
haiku,25.34637327
sonnet3,34.47432763
opus,52.89323553
sonnet35,51.50774246
mistralnemo,37.73431133
mistralsmall,43.27628362
mistral-large-2402,48.65525672
mistrallarge,43.92828036
gemini-1-5-flash,45.31377343
llama3-8b,21.43439283
llama3-70b,30.399348
llama3-1-8b,37.97881011
llama3-1-70b,54.19722901
llama3-1-405b,52.56723716
gemma2-9b,38.95680522
gemma2-27b,43.03178484
mistral-7b-v1,19.5599022
mistral-7b-v2,34.800326
mixtral-8x22B,44.25427873
qwen1-5-72b-chat,44.74327628
qwen2-72b,48.65525672