sedrickkeh committed on
Commit 0ee90a3 · verified · 1 Parent(s): 8d156e5

Update README.md

Files changed (1)
  1. README.md +7 -9
README.md CHANGED
@@ -40,15 +40,13 @@ See our [paper](https://arxiv.org/abs/2506.04178) and [blog post](https://openth
  The numbers reported in the table below are evaluated with our open-source tool [Evalchemy](https://github.com/mlfoundations/Evalchemy).
  In the table below, we bold values in each column that are within 2 standard errors of the best.
 
- | Model | Data | AIME24 | AIME25 | AMC23 | MATH500 | HMMT 02/25 | LCB 06/24-01/25 | CodeElo | CodeForces | GPQA-D | JEEBench |
- | ----------------------------------------------------------------------------------------------- | ----- | ------ | ------ | ------ | ------- | ---------- | --------------- | ------- | ---------- | ------ | -------- |
- | [OpenThinker-7B](https://huggingface.co/open-thoughts/OpenThinker-7B) | ✅ | 30.7 | 22.0 | 72.5 | 82.8 | 15.7 | 26.1 | 11.1 | 14.9 | 38.6 | 45.3 |
- | [OpenThinker2-7B](https://huggingface.co/open-thoughts/OpenThinker2-7B) | ✅ | 60.7 | 38.7 | 89.8 | 87.6 | 24.7 | 40.6 | 22.8 | 26.6 | 47.0 | 65.1 |
- | **[OpenThinker3-7B](https://huggingface.co/open-thoughts/OpenThinker3-7B)** | |**69.0**|**53.3**|**93.5**| **90.0**| **42.7** | **51.7** | 31.0 |**32.2** | 53.7 |**72.4** |
- | [DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) | ❌ | 51.3 | 38.0 | 92.0 | 88.0 | 25.0 | 34.5 | 19.9 | 21.1 | 33.2 | 50.4 |
- | [OpenR1-Distill-7B](https://huggingface.co/open-r1/OpenR1-Distill-7B) | ✅ | 57.7 | 39.7 | 87.0 | 88.0 | 25.7 | 30.7 | 30.1 | 29.3 |**58.9**| 68.7 |
- | [Llama-3.1-Nemotron-Nano-8B-v1](https://huggingface.co/nvidia/Llama-3.1-Nemotron-Nano-8B-v1) | ✅ | 62.0 | 48.0 |**94.0**| 89.4 | 26.7 | **50.9** | 30.9 |**32.9** | 52.9 | 70.7 |
- | [AceReason-Nemotron-7B](https://huggingface.co/nvidia/AceReason-Nemotron-7B) | ✅ |**71.0**| 50.7 |**93.8**| 89.8 | 33.3 | 44.3 |**32.9** |**30.9** | 52.9 | 64.3 |
+ | Model | AIME24 | AIME25 | AMC23 | MATH500 | HMMT 02/25 | LCB 06/24-01/25 | CodeElo | CodeForces | GPQA-D | JEEBench |
+ | ------------------------------------------------------------------------------------------------------------ | ------ | ------ | ------ | ------- | ---------- | --------------- | ------- | ---------- | ------ | -------- |
+ | **[OpenThinker3-1.5B](https://huggingface.co/open-thoughts/OpenThinker3-1.5B)** |**52.0**|**41.7**|**87.0**| 86.4 | **27.3** | **39.4** | 12.9 | 15.5 | 29.5 | 51.9 |
+ | [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) | 32.3 | 23.7 | 71.8 | 80.8 | 15.3 | 27.2 | 8.8 | 8.5 | 31.1 | 32.5 |
+ | [Nemotron-Research-Reasoning-Qwen-1.5B](https://huggingface.co/nvidia/Nemotron-Research-Reasoning-Qwen-1.5B) |**47.7**| 32.0 |**87.5**| 86.0 | 21.7 | 31.4 |**54.7** |**40.3** | 41.8 | 52.6 |
+ | [Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B) |**52.0**| 35.3 | 83.8 | **87.2**| 23.3 | 27.7 | 20.7 | 20.0 |**49.3**|**60.7** |
+ | [Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) | 3.0 | 0.7 | 30.8 | 50.2 | 0.0 | 5.5 | 0.8 | 2.2 | 24.7 | 16.4 |
 
  # Data
 
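The README bolds "values in each column that are within 2 standard errors of the best." The sketch below illustrates that rule under a stated assumption: the README does not say whose standard error sets the tolerance, so this version uses a single standard error attached to the column's best score. The helper `bold_mask` and the standard error of 2.5 points are hypothetical; no standard errors are reported in the table. With that illustrative value, the rule reproduces the bolding of the AIME24 column in the new 1.5B table.

```python
from typing import Sequence


def bold_mask(scores: Sequence[float], best_stderr: float, k: float = 2.0) -> list[bool]:
    """Flag each score that lies within k standard errors of the column's best.

    Assumption (not stated in the README): the tolerance is k times the
    standard error of the best score, applied as a one-sided lower bound.
    """
    best = max(scores)
    threshold = best - k * best_stderr
    return [s >= threshold for s in scores]


# AIME24 column of the new (1.5B) table, in row order.
aime24 = [52.0, 32.3, 47.7, 52.0, 3.0]

# 2.5 is a hypothetical standard error chosen for illustration only.
print(bold_mask(aime24, best_stderr=2.5))  # → [True, False, True, True, False]
```

Using one standard error per column keeps the sketch minimal; a per-model standard error, or a combined error for the difference between two scores, would be equally plausible readings of the README's wording.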