open-thoughts/OpenThinker3-1.5B

@@ -40,15 +40,13 @@ See our [paper](https://arxiv.org/abs/2506.04178) and [blog post](https://openth
 The numbers reported in the table below are evaluated with our open-source tool [Evalchemy](https://github.com/mlfoundations/Evalchemy).
 In the table below, we bold values in each column that are within 2 standard errors of the best.
-| Model                                                                                           | Data  | AIME24 | AIME25 |  AMC23 | MATH500 | HMMT O2/25 | LCB 06/24-01/25 | CodeElo | CodeForces | GPQA-D | JEEBench |
-| ----------------------------------------------------------------------------------------------- | ----- | ------ | ------ | ------ | ------- | ---------- | --------------- | ------- | ---------- | ------ | -------- |
-| [OpenThinker-7B](https://huggingface.co/open-thoughts/OpenThinker-7B)                           | ✅    |  30.7  |  22.0  |  72.5  |   82.8  |   15.7     |    26.1         |  11.1   |  14.9      |  38.6  |  45.3    |
-| [OpenThinker2-7B](https://huggingface.co/open-thoughts/OpenThinker2-7B)                         | ✅    |  60.7  |  38.7  |  89.8  |   87.6  |   24.7     |    40.6         |  22.8   |  26.6      |  47.0  |  65.1    |
-| **[OpenThinker3-7B](https://huggingface.co/open-thoughts/OpenThinker3-7B)**                     | ✅    |**69.0**|**53.3**|**93.5**| **90.0**|   **42.7** |    **51.7**     |  31.0   |**32.2**    |  53.7  |**72.4**  |
-| [DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) | ❌    |  51.3  |  38.0  |  92.0  |   88.0  |   25.0     |    34.5         |  19.9   |  21.1      |  33.2  |  50.4    |
-| [OpenR1-Distill-7B](https://huggingface.co/open-r1/OpenR1-Distill-7B)                           | ✅    |  57.7  |  39.7  |  87.0  |   88.0  |   25.7     |    30.7         |  30.1   |  29.3      |**58.9**|  68.7    |
-| [Llama-3.1-Nemotron-Nano-8B-v1](https://huggingface.co/nvidia/Llama-3.1-Nemotron-Nano-8B-v1)    | ✅    |  62.0  |  48.0  |**94.0**|   89.4  |   26.7     |    **50.9**     |  30.9   |**32.9**    |  52.9  |  70.7    |
-| [AceReason-Nemotron-7B](https://huggingface.co/nvidia/AceReason-Nemotron-7B)                    | ✅    |**71.0**|  50.7  |**93.8**|   89.8  |   33.3     |    44.3         |**32.9** |**30.9**    |  52.9  |  64.3    |
+| Model                                                                                                        | AIME24 | AIME25 |  AMC23 | MATH500 | HMMT 02/25 | LCB 06/24-01/25 | CodeElo | CodeForces | GPQA-D | JEEBench |
+| ------------------------------------------------------------------------------------------------------------ | ------ | ------ | ------ | ------- | ---------- | --------------- | ------- | ---------- | ------ | -------- |
+| **[OpenThinker3-1.5B](https://huggingface.co/open-thoughts/OpenThinker3-1.5B)**                              |**52.0**|**41.7**|**87.0**|   86.4  | **27.3**   |  **39.4**       |  12.9   |  15.5      |  29.5  |  51.9    |
+| [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B)            |  32.3  |  23.7  |  71.8  |   80.8  |   15.3     |    27.2         |  8.8    |  8.5       |  31.1  |  32.5    |
+| [Nemotron-Research-Reasoning-Qwen-1.5B](https://huggingface.co/nvidia/Nemotron-Research-Reasoning-Qwen-1.5B) |**47.7**|  32.0  |**87.5**|   86.0  |   21.7     |    31.4         |**54.7** |**40.3**    |  41.8  |  52.6    |
+| [Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B)                                                         |**52.0**|  35.3  |  83.8  | **87.2**|   23.3     |    27.7         |  20.7   |  20.0      |**49.3**|**60.7**  |
+| [Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct)                                   |  3.0   |  0.7   |  30.8  |   50.2  |   0.0      |    5.5          |  0.8    |   2.2      |  24.7  |  16.4    |
 # Data