abtsousa committed on
Commit
2cfb4c4
·
1 Parent(s): 0242ef6

Final results

Files changed (1)
  1. results.txt +22 -0
results.txt ADDED
@@ -0,0 +1,22 @@
+ Toolless (sorted by score)
+ ---------------------------------------------------------
+ Model                     Score    P50 (s)    P99 (s)
+ ---------------------------------------------------------
+ gemini-2.5-pro            8/20       16.32      61.77
+ gpt-5                     5/20       33.78     144.80
+ glm-4.5-air               3/20       30.33     266.54
+ gpt-oss-120b              2/20        8.57      63.07
+ Qwen3-235B (thinking)     2/20       50.75     152.70
+ gemma3-4b                 1/20      161.28     312.12
+
+
+ With tools (sorted by score)
+ ---------------------------------------------------------
+ Model                     Score    P50 (s)    P99 (s)
+ ---------------------------------------------------------
+ gpt-5                     16/20     270.40     990.36
+ gemini-2.5-pro            12/20      37.22     134.73
+ gpt-oss-120b              11/20      12.81      33.87
+ glm-4.5-air               9/20       45.61     103.07
+ Qwen3-235B (thinking)     6/20      111.34     226.74
+ gemma3-4b                 0/20      870.13    1900.00
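
The two tables can be compared directly. The sketch below is not part of the commit; it is a minimal illustration that copies the scores from results.txt into two dictionaries and computes how much each model's score changes when tools are enabled. The helper name `score_deltas` is hypothetical.

```python
# Scores copied from results.txt (correct answers out of 20).
toolless = {
    "gemini-2.5-pro": 8,
    "gpt-5": 5,
    "glm-4.5-air": 3,
    "gpt-oss-120b": 2,
    "Qwen3-235B (thinking)": 2,
    "gemma3-4b": 1,
}
with_tools = {
    "gpt-5": 16,
    "gemini-2.5-pro": 12,
    "gpt-oss-120b": 11,
    "glm-4.5-air": 9,
    "Qwen3-235B (thinking)": 6,
    "gemma3-4b": 0,
}


def score_deltas(before, after):
    """Hypothetical helper: {model: after - before} for models in both tables."""
    return {model: after[model] - before[model] for model in before if model in after}


deltas = score_deltas(toolless, with_tools)

# Print models from largest gain to largest loss.
for model, delta in sorted(deltas.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model:<24}{delta:+d}")
```

With these numbers, gpt-5 shows the largest gain from tools and gemma3-4b is the only model whose score drops.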