Update README.md
Added citation for OLMES.
README.md
CHANGED
|
@@ -206,9 +206,8 @@ By implementing this innovative prevention strategy, we can significantly reduce
|
|
| 206 |
|
| 207 |
**Evaluation Results:**
|
| 208 |
<table>
|
| 209 |
-
|
| 210 |
<thead>
|
| 211 |
-
<caption style="text-align:center"><b>Comparison with different models over various benchmarks. Scores of AlpacaEval-2.0 and Arena-Hard are calculated with thinking=True</b></caption>
|
| 212 |
<tr>
|
| 213 |
<th style="text-align:left; background-color: #001d6c; color: white;">Models</th>
|
| 214 |
<th style="text-align:center; background-color: #001d6c; color: white;">ArenaHard</th>
|
|
@@ -222,53 +221,53 @@ By implementing this innovative prevention strategy, we can significantly reduce
|
|
| 222 |
<th style="text-align:center; background-color: #001d6c; color: white;">HumanEval</th>
|
| 223 |
<th style="text-align:center; background-color: #001d6c; color: white;">HumanEval+</th>
|
| 224 |
<th style="text-align:center; background-color: #001d6c; color: white;">IFEval</th>
|
| 225 |
-
<th style="text-align:center; background-color: #001d6c; color: white;">
|
| 226 |
</tr></thead>
|
| 227 |
<tbody>
|
| 228 |
<tr>
|
| 229 |
-
<td style="text-align:left; background-color: #
|
| 230 |
-
<td style="text-align:center; background-color: #
|
| 231 |
-
<td style="text-align:center; background-color: #
|
| 232 |
-
<td style="text-align:center; background-color: #
|
| 233 |
-
<td style="text-align:center; background-color: #
|
| 234 |
-
<td style="text-align:center; background-color: #
|
| 235 |
-
<td style="text-align:center; background-color: #
|
| 236 |
-
<td style="text-align:center; background-color: #
|
| 237 |
-
<td style="text-align:center; background-color: #
|
| 238 |
-
<td style="text-align:center; background-color: #
|
| 239 |
-
<td style="text-align:center; background-color: #
|
| 240 |
-
<td style="text-align:center; background-color: #
|
| 241 |
-
<td style="text-align:center; background-color: #
|
| 242 |
</tr>
|
| 243 |
<tr>
|
| 244 |
-
<td style="text-align:left; background-color: #
|
| 245 |
-
<td style="text-align:center; background-color: #
|
| 246 |
-
<td style="text-align:center; background-color: #
|
| 247 |
-
<td style="text-align:center; background-color: #
|
| 248 |
-
<td style="text-align:center; background-color: #
|
| 249 |
-
<td style="text-align:center; background-color: #
|
| 250 |
-
<td style="text-align:center; background-color: #
|
| 251 |
-
<td style="text-align:center; background-color: #
|
| 252 |
-
<td style="text-align:center; background-color: #
|
| 253 |
-
<td style="text-align:center; background-color: #
|
| 254 |
-
<td style="text-align:center; background-color: #
|
| 255 |
-
<td style="text-align:center; background-color: #
|
| 256 |
-
<td style="text-align:center; background-color: #
|
| 257 |
</tr>
|
| 258 |
<tr>
|
| 259 |
-
<td style="text-align:left; background-color: #
|
| 260 |
-
<td style="text-align:center; background-color: #
|
| 261 |
-
<td style="text-align:center; background-color: #
|
| 262 |
-
<td style="text-align:center; background-color: #
|
| 263 |
-
<td style="text-align:center; background-color: #
|
| 264 |
-
<td style="text-align:center; background-color: #
|
| 265 |
-
<td style="text-align:center; background-color: #
|
| 266 |
-
<td style="text-align:center; background-color: #
|
| 267 |
-
<td style="text-align:center; background-color: #
|
| 268 |
-
<td style="text-align:center; background-color: #
|
| 269 |
-
<td style="text-align:center; background-color: #
|
| 270 |
-
<td style="text-align:center; background-color: #
|
| 271 |
-
<td style="text-align:center; background-color: #
|
| 272 |
</tr>
|
| 273 |
|
| 274 |
<tr>
|
|
@@ -285,7 +284,6 @@ By implementing this innovative prevention strategy, we can significantly reduce
|
|
| 285 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">80.15</td>
|
| 286 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">79.10</td>
|
| 287 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">83.43</td>
|
| 288 |
-
|
| 289 |
</tr>
|
| 290 |
|
| 291 |
<tr>
|
|
@@ -335,7 +333,6 @@ By implementing this innovative prevention strategy, we can significantly reduce
|
|
| 335 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">59.10</td>
|
| 336 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">42.45</td>
|
| 337 |
</tr>
|
| 338 |
-
|
| 339 |
<tr>
|
| 340 |
<td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.1-8B-Instruct</td>
|
| 341 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">37.58</td>
|
|
@@ -352,7 +349,6 @@ By implementing this innovative prevention strategy, we can significantly reduce
|
|
| 352 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">85.73</td>
|
| 353 |
</tr>
|
| 354 |
|
| 355 |
-
|
| 356 |
<tr>
|
| 357 |
<td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.2-8B-Instruct</td>
|
| 358 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">55.25</td>
|
|
@@ -395,19 +391,19 @@ By implementing this innovative prevention strategy, we can significantly reduce
|
|
| 395 |
</tr></thead>
|
| 396 |
<tbody>
|
| 397 |
<tr>
|
| 398 |
-
<td style="text-align:left; background-color: #
|
| 399 |
-
<td style="text-align:center; background-color: #
|
| 400 |
-
<td style="text-align:center; background-color: #
|
| 401 |
</tr>
|
| 402 |
<tr>
|
| 403 |
-
<td style="text-align:left; background-color: #
|
| 404 |
-
<td style="text-align:center; background-color: #
|
| 405 |
-
<td style="text-align:center; background-color: #
|
| 406 |
</tr>
|
| 407 |
<tr>
|
| 408 |
-
<td style="text-align:left; background-color: #
|
| 409 |
-
<td style="text-align:center; background-color: #
|
| 410 |
-
<td style="text-align:center; background-color: #
|
| 411 |
</tr>
|
| 412 |
<tr>
|
| 413 |
<td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.1-8B-Instruct</td>
|
|
@@ -425,9 +421,6 @@ By implementing this innovative prevention strategy, we can significantly reduce
|
|
| 425 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 69.02 </td>
|
| 426 |
</tr>
|
| 427 |
</tbody></table>
|
| 428 |
-
|
| 429 |
-
</tbody></table>
|
| 430 |
-
|
| 431 |
<!-- <table>
|
| 432 |
<caption><b>Thinking Ablation</b></caption>
|
| 433 |
<thead>
|
|
@@ -532,6 +525,9 @@ Granite-3.3-2B-Instruct builds upon Granite-3.3-2B-Base, leveraging both permiss
|
|
| 532 |
- 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/
|
| 533 |
- 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources
|
| 534 |
|
|
|
|
|
|
|
|
|
|
| 535 |
<!-- ## Citation
|
| 536 |
```
|
| 537 |
@misc{granite-models,
|
|
|
|
| 206 |
|
| 207 |
**Evaluation Results:**
|
| 208 |
<table>
|
|
|
|
| 209 |
<thead>
|
| 210 |
+
<caption style="text-align:center"><b>Comparison with different models over various benchmarks. Scores of AlpacaEval-2.0 and Arena-Hard are calculated with thinking=True</b><sup id="fnref1"><a href="#fn1">1</a></sup></caption>
|
| 211 |
<tr>
|
| 212 |
<th style="text-align:left; background-color: #001d6c; color: white;">Models</th>
|
| 213 |
<th style="text-align:center; background-color: #001d6c; color: white;">ArenaHard</th>
|
|
|
|
| 221 |
<th style="text-align:center; background-color: #001d6c; color: white;">HumanEval</th>
|
| 222 |
<th style="text-align:center; background-color: #001d6c; color: white;">HumanEval+</th>
|
| 223 |
<th style="text-align:center; background-color: #001d6c; color: white;">IFEval</th>
|
| 224 |
+
<th style="text-align:center; background-color: #001d6c; color: white;">AttaQ</th>
|
| 225 |
</tr></thead>
|
| 226 |
<tbody>
|
| 227 |
<tr>
|
| 228 |
+
<td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">Granite-3.1-2B-Instruct</td>
|
| 229 |
+
<td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">23.3</td>
|
| 230 |
+
<td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">27.17</td>
|
| 231 |
+
<td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">57.11</td>
|
| 232 |
+
<td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">20.55</td>
|
| 233 |
+
<td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">59.79</td>
|
| 234 |
+
<td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">54.46</td>
|
| 235 |
+
<td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">18.68</td>
|
| 236 |
+
<td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">67.55</td>
|
| 237 |
+
<td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">79.45</td>
|
| 238 |
+
<td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">75.26</td>
|
| 239 |
+
<td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">63.59</td>
|
| 240 |
+
<td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">84.7</td>
|
| 241 |
</tr>
|
| 242 |
<tr>
|
| 243 |
+
<td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">Granite-3.2-2B-Instruct</td>
|
| 244 |
+
<td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">24.86</td>
|
| 245 |
+
<td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">34.51</td>
|
| 246 |
+
<td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">57.18</td>
|
| 247 |
+
<td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">20.56</td>
|
| 248 |
+
<td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">59.8</td>
|
| 249 |
+
<td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">52.27</td>
|
| 250 |
+
<td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">21.12</td>
|
| 251 |
+
<td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">67.02</td>
|
| 252 |
+
<td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">80.13</td>
|
| 253 |
+
<td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">73.39</td>
|
| 254 |
+
<td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">61.55</td>
|
| 255 |
+
<td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">83.23</td>
|
| 256 |
</tr>
|
| 257 |
<tr>
|
| 258 |
+
<td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;"><b>Granite-3.3-2B-Instruct</b></td>
|
| 259 |
+
<td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 28.86 </td>
|
| 260 |
+
<td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 43.45 </td>
|
| 261 |
+
<td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 55.88 </td>
|
| 262 |
+
<td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 18.4 </td>
|
| 263 |
+
<td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 58.97 </td>
|
| 264 |
+
<td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 52.51 </td>
|
| 265 |
+
<td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 35.98 </td>
|
| 266 |
+
<td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 72.48 </td>
|
| 267 |
+
<td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 80.51 </td>
|
| 268 |
+
<td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 75.68 </td>
|
| 269 |
+
<td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 65.8 </td>
|
| 270 |
+
<td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;">87.47</td>
|
| 271 |
</tr>
|
| 272 |
|
| 273 |
<tr>
|
|
|
|
| 284 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">80.15</td>
|
| 285 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">79.10</td>
|
| 286 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">83.43</td>
|
|
|
|
| 287 |
</tr>
|
| 288 |
|
| 289 |
<tr>
|
|
|
|
| 333 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">59.10</td>
|
| 334 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">42.45</td>
|
| 335 |
</tr>
|
|
|
|
| 336 |
<tr>
|
| 337 |
<td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.1-8B-Instruct</td>
|
| 338 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">37.58</td>
|
|
|
|
| 349 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">85.73</td>
|
| 350 |
</tr>
|
| 351 |
|
|
|
|
| 352 |
<tr>
|
| 353 |
<td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.2-8B-Instruct</td>
|
| 354 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">55.25</td>
|
|
|
|
| 391 |
</tr></thead>
|
| 392 |
<tbody>
|
| 393 |
<tr>
|
| 394 |
+
<td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">Granite-3.1-2B-Instruct</td>
|
| 395 |
+
<td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 0.89 </td>
|
| 396 |
+
<td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 35.07 </td>
|
| 397 |
</tr>
|
| 398 |
<tr>
|
| 399 |
+
<td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;">Granite-3.2-2B-Instruct</td>
|
| 400 |
+
<td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 0.89 </td>
|
| 401 |
+
<td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 35.54 </td>
|
| 402 |
</tr>
|
| 403 |
<tr>
|
| 404 |
+
<td style="text-align:left; background-color: #FFFFFF; color: #2D2D2D;"><b>Granite-3.3-2B-Instruct</b></td>
|
| 405 |
+
<td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 3.28 </td>
|
| 406 |
+
<td style="text-align:center; background-color: #FFFFFF; color: #2D2D2D;"> 58.09 </td>
|
| 407 |
</tr>
|
| 408 |
<tr>
|
| 409 |
<td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.1-8B-Instruct</td>
|
|
|
|
| 421 |
<td style="text-align:center; background-color: #DAE8FF; color: black;"> 69.02 </td>
|
| 422 |
</tr>
|
| 423 |
</tbody></table>
|
|
|
|
|
|
|
|
|
|
| 424 |
<!-- <table>
|
| 425 |
<caption><b>Thinking Ablation</b></caption>
|
| 426 |
<thead>
|
|
|
|
| 525 |
- 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/
|
| 526 |
- 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources
|
| 527 |
|
| 528 |
+
|
| 529 |
+
|
| 530 |
+
<p id="fn1"><a href="#fnref1" title="Jump back to reference">[1]</a> Evaluated using <a href="https://github.com/allenai/olmes">OLMES</a> (except the AttaQ scores)</p>
|
| 531 |
<!-- ## Citation
|
| 532 |
```
|
| 533 |
@misc{granite-models,
|