Toolless (sorted by score)
---------------------------------------------------------
Model                     Score    P50 (s)    P99 (s)
---------------------------------------------------------
gemini-2.5-pro            8/20       16.32      61.77
gpt-5                     5/20       33.78     144.80
glm-4.5-air               3/20       30.33     266.54
gpt-oss-120b              2/20        8.57      63.07
Qwen3-235B (thinking)     2/20       50.75     152.70
gemma3-4b                 1/20      161.28     312.12

With tools (sorted by score)
---------------------------------------------------------
Model                     Score    P50 (s)    P99 (s)
---------------------------------------------------------
gpt-5                     16/20     270.40     990.36
gemini-2.5-pro            12/20      37.22     134.73
gpt-oss-120b              11/20      12.81      33.87
glm-4.5-air               9/20       45.61     103.07
Qwen3-235B (thinking)     6/20      111.34     226.74
gemma3-4b                 0/20      870.13    1900.00