abtsousa committed
Commit · 2cfb4c4
1 Parent(s): 0242ef6
Final results
results.txt +22 -0
results.txt
ADDED
@@ -0,0 +1,22 @@
+Toolless (sorted by score)
+---------------------------------------------------------
+Model                        Score    P50 (s)    P99 (s)
+---------------------------------------------------------
+gemini-2.5-pro               8/20       16.32      61.77
+gpt-5                        5/20       33.78     144.80
+glm-4.5-air                  3/20       30.33     266.54
+gpt-oss-120b                 2/20        8.57      63.07
+Qwen3-235B (thinking)        2/20       50.75     152.70
+gemma3-4b                    1/20      161.28     312.12
+
+
+With tools (sorted by score)
+---------------------------------------------------------
+Model                        Score    P50 (s)    P99 (s)
+---------------------------------------------------------
+gpt-5                        16/20     270.40     990.36
+gemini-2.5-pro               12/20      37.22     134.73
+gpt-oss-120b                 11/20      12.81      33.87
+glm-4.5-air                  9/20       45.61     103.07
+Qwen3-235B (thinking)        6/20      111.34     226.74
+gemma3-4b                    0/20      870.13    1900.00
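
Each data row in results.txt has the same shape: a model name, a score written as solved/total, and the P50 and P99 latencies in seconds. A minimal sketch of reading such rows follows, assuming that layout holds for every row. The names `Row`, `ROW_RE`, and `parse_row` are hypothetical and are not part of this commit.

```python
import re
from typing import NamedTuple, Optional


class Row(NamedTuple):
    """One parsed result row (hypothetical helper type)."""
    model: str
    solved: int
    total: int
    p50_s: float
    p99_s: float


# Assumed row layout: model name, "solved/total", P50 seconds, P99 seconds.
# The model name may contain spaces, e.g. "Qwen3-235B (thinking)".
ROW_RE = re.compile(
    r"^(?P<model>.+?)\s+(?P<solved>\d+)/(?P<total>\d+)"
    r"\s+(?P<p50>\d+(?:\.\d+)?)\s+(?P<p99>\d+(?:\.\d+)?)$"
)


def parse_row(line: str) -> Optional[Row]:
    """Parse one line; return None for titles, rules, headers, and blanks."""
    # Drop a leading diff "+" marker, if present, and surrounding whitespace.
    text = line.lstrip("+").strip()
    match = ROW_RE.match(text)
    if match is None:
        return None
    return Row(
        model=match["model"],
        solved=int(match["solved"]),
        total=int(match["total"]),
        p50_s=float(match["p50"]),
        p99_s=float(match["p99"]),
    )


if __name__ == "__main__":
    row = parse_row("gpt-5    16/20   270.40   990.36")
    if row is not None:
        # Solve rate as a fraction of the tasks attempted.
        print(row.model, row.solved / row.total)
```

Non-data lines return `None`, so the whole file can be passed through `parse_row` line by line and filtered without special-casing the section titles or separator rules.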