MageBench-Leaderboard / test-output.json
Add {'Score': '867', 'Name': 'gbhd', 'BaseModel': 'bdfb', 'Env': 'Sokoban', 'Target-research': 'Model-Eval-Online', 'Subset': 'mini', 'Link': 'fdns', 'State': 'Checking'} to checking queue
5b280bc verified
166 Bytes
{"Score": "867", "Name": "gbhd", "BaseModel": "bdfb", "Env": "Sokoban", "Target-research": "Model-Eval-Online", "Subset": "mini", "Link": "fdns", "State": "Checking"}