Update README.md
README.md
CHANGED
@@ -40,15 +40,13 @@ See our [paper](https://arxiv.org/abs/2506.04178) and [blog post](https://openth
 The numbers reported in the table below are evaluated with our open-source tool [Evalchemy](https://github.com/mlfoundations/Evalchemy).
 In the table below, we bold values in each column that are within 2 standard errors of the best.
 
-| Model
-|
-| [
-| [
-|
-| [
-| [
-| [Llama-3.1-Nemotron-Nano-8B-v1](https://huggingface.co/nvidia/Llama-3.1-Nemotron-Nano-8B-v1) | ✅ | 62.0 | 48.0 |**94.0**| 89.4 | 26.7 | **50.9** | 30.9 |**32.9** | 52.9 | 70.7 |
-| [AceReason-Nemotron-7B](https://huggingface.co/nvidia/AceReason-Nemotron-7B) | ✅ |**71.0**| 50.7 |**93.8**| 89.8 | 33.3 | 44.3 |**32.9** |**30.9** | 52.9 | 64.3 |
+| Model | AIME24 | AIME25 | AMC23 | MATH500 | HMMT O2/25 | LCB 06/24-01/25 | CodeElo | CodeForces | GPQA-D | JEEBench |
+| ------------------------------------------------------------------------------------------------------------ | ------ | ------ | ------ | ------- | ---------- | --------------- | ------- | ---------- | ------ | -------- |
+| **[OpenThinker3-1.5B](https://huggingface.co/open-thoughts/OpenThinker3-1.5B)** |**52.0**|**41.7**|**87.0**| 86.4 | **27.3** | **39.4** | 12.9 | 15.5 | 29.5 | 51.9 |
+| [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) | 32.3 | 23.7 | 71.8 | 80.8 | 15.3 | 27.2 | 8.8 | 8.5 | 31.1 | 32.5 |
+| [Nemotron-Research-Reasoning-Qwen-1.5B](https://huggingface.co/nvidia/Nemotron-Research-Reasoning-Qwen-1.5B) |**47.7**| 32.0 |**87.5**| 86.0 | 21.7 | 31.4 |**54.7** |**40.3** | 41.8 | 52.6 |
+| [Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) |**52.0**| 35.3 | 83.8 | **87.2**| 23.3 | 27.7 | 20.7 | 20.0 |**49.3**|**60.7** |
+| [Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) | 3.0 | 0.7 | 30.8 | 50.2 | 0.0 | 5.5 | 0.8 | 2.2 | 24.7 | 16.4 |
 
 # Data
 
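The README states that values within 2 standard errors of the column best are bolded, but the diff does not show how that is computed. The following is a minimal sketch of one plausible reading, not the authors' actual procedure: it assumes the margin is taken from the standard error of the best-scoring model. The function name `bold_mask` is hypothetical, and the standard errors in the usage example are invented for illustration, since the README does not list them.

```python
def bold_mask(scores, std_errs, k=2.0):
    """Flag each score that lies within k standard errors of the column best.

    Assumption (not confirmed by the README): the margin uses the standard
    error of the best-scoring model, not each model's own standard error.
    """
    if len(scores) != len(std_errs):
        raise ValueError("scores and std_errs must have the same length")
    # Index of the highest score in the column (first one wins on ties).
    best_idx = max(range(len(scores)), key=lambda i: scores[i])
    best = scores[best_idx]
    margin = k * std_errs[best_idx]
    return [best - s <= margin for s in scores]


# AIME24 column from the added table; the standard errors are HYPOTHETICAL.
aime24 = [52.0, 32.3, 47.7, 52.0, 3.0]
hypothetical_se = [2.5, 2.5, 2.5, 2.5, 2.5]
print(bold_mask(aime24, hypothetical_se))
```

With these invented standard errors the margin is 5.0 points, so 52.0, 47.7, and 52.0 are flagged, which happens to match the bolding in the AIME24 column. The real standard errors may differ.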