Commit · 0d3e3cf
1 Parent(s): 99f5740
updated copy
data_loader.py CHANGED (+2 -2)
@@ -1057,7 +1057,7 @@ METHODOLOGY = """
 </tr>
 <tr>
 <td>Reasoning Models</td>
-<td>Although being great for reasoning, o1 and o3-mini are far from perfect scoring 0.87 and 0.84 respectively.
+<td>Although being great for reasoning, o1 and o3-mini are far from perfect scoring 0.87 and 0.84 respectively. R1 is excluded from rankings due to limited function support</td>
 </tr>
 <tr>
 <td>Tool Miss Detection</td>
@@ -1095,7 +1095,7 @@ METHODOLOGY = """
 </tr>
 <tr>
 <td>Reasoning Models</td>
-<td>While o1 and o3-mini excelled in function calling,
+<td>While o1 and o3-mini excelled in function calling, R1 is excluded from rankings due to limited function support</td>
 </tr>
 <tr>
 <td>Safety Controls</td>