PCL-Reasoner committed on
Commit 39d1d63 · verified · 1 Parent(s): 2d276c8

Upload 2 files

Files changed (2)
  1. README.md +164 -232
  2. README_CN.md +403 -0
README.md CHANGED
@@ -1,108 +1,70 @@
1
- <style>
2
- /* 全局字体与间距 */
3
- body {
4
- font-family: 'Segoe UI', sans-serif;
5
- line-height: 1.75; /* 行间距1.75倍 */
6
- color: #333
7
- margin: 0 auto; /* 内容居中 */
8
- }
9
-
10
- /* 段间距 */
11
- p, ul, ol, blockquote {
12
- margin-bottom: 1.25em; /* 段落/列表/引用间距 */
13
- }
14
-
15
-
16
- /* 代码块 */
17
- pre {
18
- background-color:rgb(146, 150, 153);
19
- border-radius: 6px;
20
- padding: 16px;
21
- overflow: auto;
22
- line-height: 1.45; /* 代码行距 */
23
- }
24
-
25
- /* 行内代码 */
26
- code {
27
- background: rgb(146, 150, 153);
28
- padding: 2px 6px;
29
- border-radius: 3px;
30
- font-family: 'Fira Code', monospace;
31
- }
32
- </style>
33
-
34
- # **PCL-Reasoner-V1模型**
35
-
36
- ## 模型概览
37
-
38
- 本次发布的PCL-Reasoner-V1模型,以Qwen2.5-32B-Base为起点,基于昇思框架与昇腾硬件进行了高性能的监督微调。经过微调,模型在数学推理能力上取得了显著提升:其在权威基准评测集AIME24上准确率达85.7%,AIME25上达84.2%,在32B参数级别模型中稳居前列。
39
-
40
- 为促进技术共享与应用,我们已完整开源了PCL-Reasoner-V1的模型权重、微调数据及训练代码。该模型不仅是当下领先的32B数学推理模型之一,更为开发者提供了宝贵的专业领域监督微调实践经验与后训练解决方案。用户可参照以下教程轻松部署体验,深入探索后训练的实践方法与奥秘!
41
-
42
- ![eval_results](images/README/eval_results.png)
43
-
44
- ## 开发指导
45
-
46
- ### 1. 模型文件
47
-
48
- PCL-Reasoner-V1基于Qwen2.5-32B-Base进行微调后训练,训练流程基于MindFormers实现,主要涉及的文件有:
49
-
50
- 数据处理:
51
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
52
  ```
53
  pcl_reasoner_v1
54
  ├── qwen2_5_tokenizer.py # qwen2_5 tokenizer
55
- ├── packing_handler.py # 数据packing处理
56
  └── data_preprocess
57
- ├── decontaminate.py # 数据污染检测
58
- └── dataset_prehandle_and_split.py # 数据拆分及预处理
59
  ```
60
 
61
- 模型配置:
62
-
63
  ```
64
  pcl_reasoner_v1/config
65
- ├── data_process_handling.yaml # 数据格式转换配置文件
66
- ├── data_process_packing.yaml # 数据拼接配置文件
67
- └── finetune_pcl_reasoner_v1_32k.yaml # 模型微调配置文件
68
  ```
69
 
70
- 任务启动脚本:
71
-
72
  ```
73
  pcl_reasoner_v1
74
- └── run_pcl_reasoner_v1_finetune.sh # 模型微调启动脚本
75
  ```
76
 
77
-
78
- ### 2.环境及数据准备
79
-
80
- #### 2.1 安装环境
81
 
82
- | 软件| 版本 |
83
- | --- | --- |
84
- | 固件&驱动| 24.1.rc3.5 |
85
- | CANN| 7.7.T9.0.B057:8.1.RC1 |
86
- | Python | 3.10 |
87
- | MindSpore | 2.6.0 |
 
 
88
  | MindSpore TransFormers | r1.5.0 |
89
 
90
- #### 2.2 数据处理
91
 
92
- ##### 2.2.1 数据集下载
93
 
94
- 用户可以从HuggingFace官方下载原始数据集:
95
 
96
- | 数据集名称 | 数据集链接 |
97
  | ------------------------------- | ----------------------------------------------------------------------------------------------------------------------------- |
98
  | AM-DeepSeek-R1-0528-Distilled | [https://huggingface.co/a-m-team/AM-DeepSeek-R1-0528-Distilled](https://huggingface.co/a-m-team/AM-DeepSeek-R1-0528-Distilled) |
99
 
100
- ##### 2.2.2 数据预处理
 
 
101
 
102
- 首先,我们对源数据进行检测和筛选,操作分为两个步骤,验证集污染检测与数据筛选。
 
103
 
104
- * 验证集污染检测:我们采用基于all-MiniLM-L6-v2模型计算文本余弦相似度的方法,对数学部分原始数据针对AIME24/25评测集进行污染检测。该脚本执行后会在终端打印检测结果,并在指定的输出路径中保存相似度大于阈值的题目及其匹配的评测集题目。
105
-
106
  ```
107
  python PCL-Reasoner-V1/pcl_reasoner_v1/data_preprocess/decontaminate.py \
108
  --target_data /path/to/target_data \
@@ -110,157 +72,146 @@ pcl_reasoner_v1
110
  --model_path /path/to/distilled/model_path \
111
  --output_file_prefix /path/to/output_file_prefix
112
  --threshold 0.7
113
-
114
- # 参数说明
115
- target_data:需要被检测的数据
116
- contaminant_source:污染源,即评测集数据
117
- model_path:计算文本嵌入的模型
118
- output_file_prefix:检测结果输出的路径
119
- threshold:相似度阈值
120
  ```
121
- * 数据筛选及处理:运行数据处理脚本,进行数据长度筛选,选取问题加思维链长度小于32K tokens的数据,并将提示词添加到数据中。
122
-
123
  ```
124
  python PCL-Reasoner-V1/pcl_reasoner_v1/data_preprocess/convert_and_split_dataset.py \
125
  --json_file_paths /path/to/AM-DeepSeek-R1-0528-Distilled/math.jsonl
126
-
127
- # 参数说明
128
- json_file_paths:需要处理的数据集,支持传入多个路径,用空格分隔
129
  ```
130
 
131
- 其次,我们将数据转换成packing格式,操作分为两个步骤,格式转换与数据拼接。
 
 
132
 
133
- * 格式转换:在配置文件`pcl_reasoner_v1/config/data_process_handling.yaml`中指定`data_files`、`vocab_file`、`merges_file`等文件路径,指定`pcl_reasoner_v1/packing_handler.py`文件中自定义的`AMDeepSeekDataHandler`为数据handler:
134
-
135
  ```
136
  train_dataset:
137
  ...
138
- path: "json" # 原始数据集文件格式
139
  data_files:
140
- ["/path/to/data.jsonl"] # 原始数据集路径
141
  input_columns: *input_columns
142
  handler:
143
- - type: AMDeepSeekDataHandler # 指定自定义的数据处理类
144
  ...
145
  tokenizer:
146
  auto_register: qwen2_5_tokenizer.Qwen2Tokenizer
147
  ...
148
- vocab_file: "/path/to/vocab.json" # Qwen2_5默认tokenizer文件
149
- merges_file: "/path/to/merges.txt" # Qwen2_5默认tokenizer文件
150
  ...
151
  ```
152
-
153
- *(注意事项:以上模型配置为示例,仅列出用户高频修改的配置项,完整配置文件见代码仓)*
154
-
155
- 运行数据处理脚本,生成Arrow格式数据文件:
156
-
157
  ```
158
  export PYTHONPATH=/path/to/mindformers/:PYTHONPATH
159
- python th/to/mindformers/toolkit/data_preprocess/huggingface/datasets_preprocess.py
160
- --config ./pcl_reasoner_v1/config/data_process_handling.yaml
161
- --save_path /path/to/handled_data/
162
  --register_path ./pcl_reasoner_v1/
163
-
164
- # 参数说明
165
- config:数据格式转换的配置文件路径
166
- save_path:转换后数据集的保存文件夹路径
167
- register_path:自定义数据Handler注册目录路径
168
  ```
169
- * 数据拼接:
170
-
171
- 在配置文件pcl_reasoner_v1/config/data_process_packing.yaml指定packing后数据的存储路径:
172
-
173
  ```
174
  # dataset
175
  train_dataset:
176
  data_loader:
177
  ...
178
- path: /path/to/handled_data #预处理后数据集的路径
179
  ...
180
  ```
181
-
182
- *(注意事项:以上模型配置为示例,仅列出用户高频修改的配置项,完整配置文件见代码仓)*
183
-
184
- 运行数据packing脚本,生成packing后数据文件:
185
-
186
  ```
187
  export PYTHONPATH=/path/to/mindformers/:PYTHONPATH
188
- python /path/to/mindformers/toolkit/data_preprocess/huggingface/datasets_preprocess.py
189
- --config ./pcl_reasoner_v1_config/data_process_packing.yaml
190
- --save_path /path/to/packed_data/
191
  --register_path ./pcl_reaoner_v1/
192
-
193
- # 参数说明
194
- config:数据拼接的配置文件路径
195
- save_path:拼接后数据集的保存文件夹路径
196
- register_path:自定义数据Handler注册目录路径
197
  ```
198
 
199
-
200
- ### 3 训练流程
201
- #### 3.1 权重准备
202
 
203
- 用户可以从HuggingFace官方下载预训练权重:
 
 
 
204
 
205
- | 模型名称 | 权重链接 |
206
  | ------------------- | --------------------------------------------------------------------------------- |
207
  | Qwen2.5-32B-Base | [https://huggingface.co/Qwen/Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B) |
208
 
209
- MindFormers 1.5.0及以上版本已支持safetensors格式的权重直接加载及保存,无需转换成ckpt,下文中微调将使用safetensors格式权重运行。
210
-
211
- #### 3.2 训练配置
212
-
213
- 下面仅列出用户高频修改的配置项,完整配置文件见`pcl_reasoner_v1/config/finetune_pcl_reasoner_v1_32k.yaml`
214
-
215
- 基本配置:
216
-
217
- ```
218
- run_mode: 'finetune' # 设置训练模式为“finetune”
219
- load_checkpoint: '/path/to/Qwen-32B-base/' # 权重文件路径
220
- load_ckpt_format: 'safetensors' # 设置权重格式为“safetensors”
221
- auto_trans_ckpt: True # 设置在线权重切分至分布式权重
222
- ```
223
 
224
- 数据集配置:
 
225
 
 
 
 
 
 
 
226
  ```
 
 
227
  train_dataset: &train_dataset
228
-  
229
-   data_loader:
230
-     type: CommonDataLoader
231
-  
232
-     # offline
233
-     path: "/path/to/dataset/pack_data_lt_32K_full" # 数据文件路径
234
-     load_func: 'load_from_disk' # 设置数据加载方式为“load_from_disk”
235
-    
236
-     shuffle: True # 数据打乱功能使能
237
-     packing: pack # 数据格式为pack
238
-     adaptor_config:
239
-       compress_mask: True
240
-     mock_config:
241
-       seq_length: 32768 # 数据pack后长度为32k
242
-       size: 25909 # 数据集大小/数据并行切分
243
- ```
244
-
245
- 并行配置:
246
-
247
  ```
 
 
248
  parallel_config:
249
-   data_parallel: &dp 8 # 数据并行切分为8
250
-   model_parallel: 8 # 模型并行切分为8
251
-   pipeline_stage: 2 # 流水线并行切分为2
252
-   use_seq_parallel: True # 序列并行使能
253
-   optimizer_shard: True  # 优化器并行使能
254
-   micro_batch_num: 16 # micro bathsize设置为16
255
  ```
 
256
 
257
- > *(注意事项:以上模型配置为示例,仅列出用户高频修改的配置项,完整配置文件见代码仓)*
258
-
259
- #### 3.3 启动微调
260
 
261
- 在启动脚本`run_pcl_reasoner_v1_finetune.sh`指定配置文件`pcl_reasoner_v1/config/finetune_pcl_reasoner_v1_32k.yaml`,并根据用户的实际情况对卡数、服务器IP等配置进行修改:
262
 
263
- ```
264
  noderank=$1
265
 
266
  bash /path/to/mindformers/scripts/msrun_launcher.sh "run_mindformer.py \
@@ -276,61 +227,52 @@ bash /path/to/mindformers/scripts/msrun_launcher.sh "run_mindformer.py \
276
  --cluster_time_out 1200 \
277
  > run.log 2>&1
278
 
279
- # 参数说明
280
- config:配置文件路径
281
- run_mode:运行模式(预训练/微调/推理)
282
- worker_num 总卡数
283
- local_worker_num 单机的卡数
284
- master_addr:主节点地址
285
- master_port: 主节点端口
286
- log_dir: 日志路径
287
- join:是否等待所有worker退出
288
- cluster_time_out:集群等待时间
289
  ```
290
-
291
- 然后,使用`bash run_pcl_reasoner_v1_finetune.sh`指令启动微调训练,在多个节点上启动时,需指定`node_rank`(以下指令以0节点为示例):
292
-
293
  ```
294
  bash run_pcl_reasoner_v1_finetune.sh 0
295
  ```
 
296
 
297
- 在拉起任务后,通过以下指令查看运行日志:
298
-
299
  ```
300
- tail -f path/to/log/worker_127.log
301
  ```
302
-
303
 
304
- ### 4. 评测流程
305
-
306
- 为了保障评测结果的公平性,我们采用了QwQ开源的评测代码(QwQ/eval at main · QwenLM/QwQ),可以根据代码仓中README.md指导进行环境安装及模型评测。
307
 
308
- 我们采用的评测超参如下所示:
309
 
310
- | 采样超参 | 取值 |
311
- | ---------------- | --------------------------------------------- |
312
- | temperature | 0.6 |
313
- | top\_k | 40 |
314
- | top\_p | 0.95 |
315
- | max\_tokens | 129024 |
316
- | chat\_template |`./pcl_reasoner_v1/eval/am_thinking.jinja` |
317
 
318
- 我们在AIME24/AIME25评测结果详见下表数据。为确保评估准确性,我们采用Avg@32指标(平均32次采样)进行了评测:
 
 
 
 
 
 
319
 
 
 
320
 
321
- <!-- 表格基础样式(可选添加) -->
322
 
323
- <style>
324
- table { border-collapse: collapse; width: 100%; margin-left: auto;margin-right: auto;}
325
- th, td { border: 1px solid #ddd; padding: 8px; text-align: center; }
326
- </style>
327
-
328
- <!-- 表格主体 -->
329
 
330
  <table>
331
  <tr>
332
- <th>参数量</th>
333
- <th>模型</th>
334
  <th>AIME 24</th>
335
  <th>AIME 25</th>
336
  </tr>
@@ -364,10 +306,6 @@ tail -f path/to/log/worker_127.log
364
  <td><span style="color:red">90.8</span></td>
365
  <td><span style="color:grey">83</span></td>
366
  </tr>
367
- <!-- 分隔行 -->
368
- <tr>
369
- <td colspan="4" style="background-color: #f8f8f8;"></td>
370
- </tr>
371
  <!-- 合并行表头 32B -->
372
  <tr>
373
  <th rowspan="7">32B</th>
@@ -405,19 +343,14 @@ tail -f path/to/log/worker_127.log
405
  </tr>
406
  </table>
407
 
408
- > *(注:模型在AIME24/25评测集上的生成结果文件已同步上传至 `pcl_reasoner_v1/eval/eval_res`目录,供开发者用于模型验证与效果比对参考)*
409
-
410
 
411
- 另外,我们也针对评测时不同模型回答长度统计正确率,可以看出AIME24/25评测集对回答长度要求较高,而且较为简单的AIME24上,64K tokens的回答长度可以满足,而较为难的AIME25上则需要回答长度长达128K tokens:
412
-
413
- <style>
414
- table { border-collapse: collapse; width: 100%; margin-left: auto;margin-right: auto;}
415
- th, td { border: 1px solid #ddd; padding: 8px; text-align: center; }
416
- </style>
417
 
418
  <table>
419
  <tr>
420
- <th>回答长度</th>
421
  <th>16K</th>
422
  <th>32K</th>
423
  <th>64K</th>
@@ -438,4 +371,3 @@ tail -f path/to/log/worker_127.log
438
  <td>84.2</td>
439
  </tr>
440
  </table>
441
-
 
1
+ # **PCL-Reasoner-V1 Model**
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
2
 
3
+ ## Model Overview
4
+ We release **PCL-Reasoner-V1**, a model built on **Qwen2.5-32B-Base** and refined with high-performance supervised fine-tuning on the **MindSpore framework** and **Ascend hardware**. After fine-tuning, the model shows significant gains in mathematical reasoning: it achieves 85.7% on AIME 24 and 84.2% on AIME 25, placing it among the top-tier models in the 32B parameter class.
5
+
6
+ To promote technical collaboration and application, we have fully open-sourced the model weights, fine-tuning dataset, and training code.
7
+ PCL-Reasoner-V1 is not only a leading 32B mathematical reasoning model but also offers developers practical experience in domain-specific supervised fine-tuning and post-training. Follow the tutorial below to deploy the model and explore these post-training methods.
8
+
9
+ ![eval_results](images/README/eval_results.png)
10
+
11
+ ## Development Guide
12
+
13
+ ### 1. Model Files
14
+ PCL-Reasoner-V1 is fine-tuned from Qwen2.5-32B-Base using MindFormers. Key files include:
15
+
16
+ **Data Processing:**
17
  ```
18
  pcl_reasoner_v1
19
  ├── qwen2_5_tokenizer.py # qwen2_5 tokenizer
20
+ ├── packing_handler.py # Data packing process
21
  └── data_preprocess
22
+ ├── decontaminate.py # validation set contamination detection
23
+ └── dataset_prehandle_and_split.py # dataset prehandle and split
24
  ```
25
 
26
+ **Model Configuration:**
 
27
  ```
28
  pcl_reasoner_v1/config
29
+ ├── data_process_handling.yaml # Format conversion configuration file
30
+ ├── data_process_packing.yaml # Data packing configuration file
31
+ └── finetune_pcl_reasoner_v1_32k.yaml # Model fine-tuning configuration file
32
  ```
33
 
34
+ **Task Launch Script:**
 
35
  ```
36
  pcl_reasoner_v1
37
+ └── run_pcl_reasoner_v1_finetune.sh # Model fine-tuning launch script
38
  ```
39
 
 
 
 
 
40
 
41
+ ### 2. Environment & Data Setup
42
+ #### 2.1 Environment Installation
43
+ | Software | Version |
44
+ |----------|---------|
45
+ | Firmware & Driver | 24.1.rc3.5 |
46
+ | CANN | 7.7.T9.0.B057:8.1.RC1 |
47
+ | Python | 3.10 |
48
+ | MindSpore | 2.6.0 |
49
  | MindSpore TransFormers | r1.5.0 |
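+
+ To quickly confirm the environment before preparing data, a short sanity check can be run (a minimal sketch; it only uses MindSpore's standard self-check utility and assumes the versions match the table above):
+
+ ```python
+ # Quick environment sanity check (illustrative; versions should match the table above).
+ import mindspore as ms
+
+ print("MindSpore version:", ms.__version__)   # expect 2.6.0
+ ms.set_context(device_target="Ascend")        # fail early if the Ascend/CANN stack is not visible
+ ms.run_check()                                # runs a tiny computation to verify the install
+ ```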
50
 
51
+ #### 2.2 Data Processing
52
 
53
+ ##### 2.2.1 Dataset Download
54
 
55
+ Users can download the original dataset from HuggingFace:
56
 
57
+ | Dataset Name | Dataset Link |
58
  | ------------------------------- | ----------------------------------------------------------------------------------------------------------------------------- |
59
  | AM-DeepSeek-R1-0528-Distilled | [https://huggingface.co/a-m-team/AM-DeepSeek-R1-0528-Distilled](https://huggingface.co/a-m-team/AM-DeepSeek-R1-0528-Distilled) |
60
 
61
+ ##### 2.2.2 Data Preprocessing
62
+
63
+ First, we perform detection and filtering on the source data through two steps: **validation set contamination detection** and **data filtering**.
64
 
65
+ * Validation Set Contamination Detection: we use the **all-MiniLM-L6-v2** model to compute text cosine similarity and check the raw math data for contamination against the AIME24/25 evaluation sets (a standalone sketch of this check follows the command below).
66
+ After execution, the script prints detection results in the terminal and saves questions with similarity exceeding the threshold (along with matched evaluation questions) to the specified output path.
67
 
 
 
68
  ```
69
  python PCL-Reasoner-V1/pcl_reasoner_v1/data_preprocess/decontaminate.py \
70
  --target_data /path/to/target_data \
  --contaminant_source PCL-Reasoner-V1/pcl_reasoner_v1/data_preprocess/aime2425_questions.json \
72
  --model_path /path/to/distilled/model_path \
73
  --output_file_prefix /path/to/output_file_prefix \
74
  --threshold 0.7
75
+
76
+ # Parameter Description
77
+ target_data: Data to be detected
78
+ contaminant_source: Contamination source (evaluation set data)
79
+ model_path: Model for text embedding calculation
80
+ output_file_prefix: Output path for results
81
+ threshold: Similarity threshold
82
  ```
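+
+ For intuition, the core of this check can be reproduced in a few lines with `sentence-transformers` (a standalone sketch of the idea, not the repository script; the question lists below are placeholders):
+
+ ```python
+ # Minimal sketch of cosine-similarity decontamination (illustrative only).
+ from sentence_transformers import SentenceTransformer, util
+
+ model = SentenceTransformer("/path/to/all-MiniLM-L6-v2")
+ train_questions = ["..."]   # questions from the training data
+ eval_questions = ["..."]    # AIME24/25 questions (the contamination source)
+
+ emb_train = model.encode(train_questions, convert_to_tensor=True, normalize_embeddings=True)
+ emb_eval = model.encode(eval_questions, convert_to_tensor=True, normalize_embeddings=True)
+
+ scores = util.cos_sim(emb_train, emb_eval)              # [num_train, num_eval] similarities
+ flagged = (scores.max(dim=1).values > 0.7).sum().item()
+ print(f"{flagged} training questions exceed the 0.7 threshold")
+ ```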
83
+ * Data Filtering & Processing: run the data processing script to filter by length, keeping samples whose combined question and chain-of-thought length is under **32K tokens**, and to add the prompt template to the data (a rough sketch of the length filter follows the command below).
84
+
85
  ```
86
  python PCL-Reasoner-V1/pcl_reasoner_v1/data_preprocess/convert_and_split_dataset.py \
87
  --json_file_paths /path/to/AM-DeepSeek-R1-0528-Distilled/math.jsonl
88
+
89
+ # Parameter Description
90
+ json_file_paths: Dataset to process (multiple paths separated by spaces)
91
  ```
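+
+ The filter itself amounts to counting the tokens of the question plus its chain of thought and keeping samples under 32K; a rough sketch with a Hugging Face tokenizer (the `question`/`answer` field names are assumptions about the jsonl layout, not the repository's exact keys):
+
+ ```python
+ # Rough sketch of the <32K-token length filter (illustrative; not the repository script).
+ import json
+ from transformers import AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained("/path/to/Qwen2.5-32B")
+ MAX_LEN = 32 * 1024
+
+ kept = []
+ with open("/path/to/AM-DeepSeek-R1-0528-Distilled/math.jsonl", encoding="utf-8") as f:
+     for line in f:
+         sample = json.loads(line)
+         text = sample["question"] + sample["answer"]   # question + chain of thought (assumed keys)
+         if len(tokenizer(text)["input_ids"]) < MAX_LEN:
+             kept.append(sample)
+
+ print(f"kept {len(kept)} samples under {MAX_LEN} tokens")
+ ```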
92
 
93
+ Then, we convert the data into **packed format** through two sequential steps: format conversion and data packing.
94
+
95
+ * Format Conversion: specify the `data_files`, `vocab_file`, and `merges_file` paths in `pcl_reasoner_v1/config/data_process_handling.yaml`, and set the custom `AMDeepSeekDataHandler` from `pcl_reasoner_v1/packing_handler.py` as the data handler:
96
 
 
 
97
  ```
98
  train_dataset:
99
  ...
100
+ path: "json" # Original dataset format
101
  data_files:
102
+ ["/path/to/data.jsonl"] # Path to raw dataset
103
  input_columns: *input_columns
104
  handler:
105
+ - type: AMDeepSeekDataHandler # Custom data handler class
106
  ...
107
  tokenizer:
108
  auto_register: qwen2_5_tokenizer.Qwen2Tokenizer
109
  ...
110
+ vocab_file: "/path/to/vocab.json" # Qwen2.5 tokenizer vocabulary
111
+ merges_file: "/path/to/merges.txt" # Qwen2.5 merge rules
112
  ...
113
  ```
114
+
115
+ *(Note: This is a minimal example showing frequently modified fields. Full configuration is available in the code repository.)*
116
+
117
+ Execute the conversion script to generate **Arrow-format data**:
118
+
119
  ```
120
  export PYTHONPATH=/path/to/mindformers/:PYTHONPATH
121
+ python /path/to/mindformers/toolkit/data_preprocess/huggingface/datasets_preprocess.py \
122
+ --config ./pcl_reasoner_v1/config/data_process_handling.yaml \
123
+ --save_path /path/to/handled_data/ \
124
  --register_path ./pcl_reasoner_v1/
125
+
126
+ # Parameter Description
127
+ config: Path to format conversion config file
128
+ save_path: Output directory for processed data
129
+ register_path: Path to custom handler registration
130
  ```
131
+ * Data Packing:
132
+
133
+ Configure `pcl_reasoner_v1/config/data_process_packing.yaml` to point `path` at the preprocessed data that will be packed:
134
+
135
  ```
136
  # dataset
137
  train_dataset:
138
  data_loader:
139
  ...
140
+ path: /path/to/handled_data # Processed dataset
141
  ...
142
  ```
143
+
144
+ *(Note: Example shows key fields only. Refer to repository for full config.)*
145
+
146
+ Run the packing script to generate the **sequence-packed data** (a toy illustration of the packing idea follows the command):
147
+
148
  ```
149
  export PYTHONPATH=/path/to/mindformers/:PYTHONPATH
150
+ python /path/to/mindformers/toolkit/data_preprocess/huggingface/datasets_preprocess.py \
151
+ --config ./pcl_reasoner_v1/config/data_process_packing.yaml \
152
+ --save_path /path/to/packed_data/ \
153
  --register_path ./pcl_reasoner_v1/
154
+
155
+ # Parameter Description
156
+ config: Path to data packing config file
157
+ save_path: Output directory for packed data
158
+ register_path: Path to custom handler registration
159
  ```
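+
+ Conceptually, packing greedily concatenates the tokenized samples into fixed 32K-token sequences so that long-context training wastes little padding; a toy illustration of that idea (not the MindFormers implementation, which also builds the compressed attention masks):
+
+ ```python
+ # Toy illustration of greedy sequence packing to a fixed 32K context (not the MindFormers code).
+ SEQ_LEN = 32768
+
+ def pack(samples):
+     """samples: list of token-id lists, each shorter than SEQ_LEN."""
+     packs, current, used = [], [], 0
+     for ids in samples:
+         if used + len(ids) > SEQ_LEN:   # current pack is full, start a new one
+             packs.append(current)
+             current, used = [], 0
+         current.append(ids)             # sample boundaries are kept for the attention mask
+         used += len(ids)
+     if current:
+         packs.append(current)
+     return packs
+
+ packs = pack([[1] * 9000, [2] * 20000, [3] * 15000])
+ print([sum(map(len, p)) for p in packs])  # tokens used per 32K pack -> [29000, 15000]
+ ```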
160
 
 
 
 
161
 
162
+ ### 3. Training Process
163
+ #### 3.1 Weight Preparation
164
+
165
+ Users can download pre-trained weights from HuggingFace:
166
 
167
+ | Model Name | Weights URL |
168
  | ------------------- | --------------------------------------------------------------------------------- |
169
  | Qwen2.5-32B-Base | [https://huggingface.co/Qwen/Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B) |
170
 
171
+ **Note**: MindFormers v1.5.0+ supports direct loading/saving of `safetensors` format weights. No conversion to `ckpt` is required. Subsequent fine-tuning will use the `safetensors` format.
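+
+ To peek at the downloaded weights before training, the `safetensors` Python package can list tensors without loading the whole model (a small sketch; the shard filename is a placeholder, and MindFormers itself reads the files directly via `load_checkpoint`/`load_ckpt_format` below):
+
+ ```python
+ # Inspect a downloaded safetensors shard without loading the full model (illustrative).
+ from safetensors import safe_open
+
+ with safe_open("/path/to/Qwen2.5-32B/model.safetensors", framework="np") as f:  # any shard
+     for name in list(f.keys())[:5]:
+         print(name, f.get_tensor(name).shape)
+ ```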
 
 
 
 
 
 
 
 
 
 
 
 
 
172
 
173
+ #### 3.2 Training Configuration
174
+ *(Only frequently modified configurations are shown. Full config: `pcl_reasoner_v1/config/finetune_pcl_reasoner_v1_32k.yaml`)*
175
 
176
+ ***Basic Configuration:***
177
+ ```yaml
178
+ run_mode: 'finetune' # Training mode: fine-tuning
179
+ load_checkpoint: '/path/to/Qwen-32B-base/' # Weight file path
180
+ load_ckpt_format: 'safetensors' # Weight format
181
+ auto_trans_ckpt: True # Enable online weight splitting for distributed training
182
  ```
183
+ ***Dataset Configuration:***
184
+ ```yaml
185
  train_dataset: &train_dataset
186
+ data_loader:
187
+ type: CommonDataLoader
188
+ path: "/path/to/dataset/pack_data_lt_32K_full" # Packed dataset path
189
+ load_func: 'load_from_disk' # Data loading method
190
+ shuffle: True # Enable data shuffling
191
+ packing: pack # Packed data format
192
+ adaptor_config:
193
+ compress_mask: True
194
+ mock_config:
195
+ seq_length: 32768 # Packed sequence length (32K tokens)
196
+ size: 25909 # Dataset size / data parallelism split
 
 
 
 
 
 
 
 
197
  ```
198
+ ***Parallelism Configuration:***
199
+ ```yaml
200
  parallel_config:
201
+ data_parallel: &dp 8 # Data parallelism
202
+ model_parallel: 8 # Model parallelism
203
+ pipeline_stage: 2 # Pipeline parallelism stages
204
+ use_seq_parallel: True # Enable sequence parallelism
205
+ optimizer_shard: True # Enable optimizer sharding
206
+ micro_batch_num: 16 # Number of micro batches for pipeline parallelism
207
  ```
208
+ > *(Note: This configuration example only lists frequently modified items. Refer to the code repository for complete configurations.)*
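+
+ As a sanity check, the three parallel dimensions must multiply to the total number of cards handed to the launcher (the reference launch script uses `--worker_num 128`):
+
+ ```python
+ # Device-count check for the parallel layout above (illustrative).
+ data_parallel, model_parallel, pipeline_stage = 8, 8, 2
+ total_devices = data_parallel * model_parallel * pipeline_stage
+ print(total_devices)  # 128 -> must equal --worker_num in run_pcl_reasoner_v1_finetune.sh
+ ```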
209
 
210
+ #### 3.3 Launching Fine-tuning
 
 
211
 
212
+ Specify the configuration file `pcl_reasoner_v1/config/finetune_pcl_reasoner_v1_32k.yaml` in the launch script `run_pcl_reasoner_v1_finetune.sh`, and modify cluster parameters according to your hardware environment:
213
 
214
+ ```bash
215
  noderank=$1
216
 
217
  bash /path/to/mindformers/scripts/msrun_launcher.sh "run_mindformer.py \
 
227
  --cluster_time_out 1200 \
228
  > run.log 2>&1
229
 
230
+ # Parameter Description
231
+ config: Path to configuration file
232
+ run_mode: Operation mode (pretrain/finetune/inference)
233
+ worker_num: Total number of accelerator cards
234
+ local_worker_num: Cards per single server
235
+ master_addr: Master node address
236
+ master_port: Master node port
237
+ log_dir: Log directory path
238
+ join: Whether to wait for all workers to exit
239
+ cluster_time_out: Cluster timeout duration
240
  ```
241
+ Then, launch the fine-tuning task using:
 
 
242
  ```
243
  bash run_pcl_reasoner_v1_finetune.sh 0
244
  ```
245
+ > *(Note: When launching on multiple nodes, specify node_rank (e.g., 0 for the first node).)*
246
 
247
+ After starting the task, monitor the runtime logs with:
 
248
  ```
249
+ tail -f /path/to/log/worker_127.log
250
  ```
 
251
 
252
+ ### 4. Evaluation
 
 
253
 
254
+ To ensure the fairness of evaluation results, we adopted the **open-source evaluation code from QwQ** ([QwQ/eval at main · QwenLM/QwQ](https://github.com/QwenLM/QwQ)). Developers can follow the `README.md` in that repository to set up the environment and evaluate models.
255
 
256
+ #### Evaluation Hyperparameters
257
+ The sampling hyperparameters used are listed below:
 
 
 
 
 
258
 
259
+ | Hyperparameter | Value |
260
+ |----------------|---------------------------------|
261
+ | `temperature` | 0.6 |
262
+ | `top_k` | 40 |
263
+ | `top_p` | 0.95 |
264
+ | `max_tokens` | 129,024 |
265
+ | `chat_template`| `./pcl_reasoner_v1/eval/am_thinking.jinja` |
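+
+ For illustration, these values map onto a typical inference backend's sampling settings roughly as follows (an assumed vLLM-style configuration shown only as an example of what the numbers mean; the QwQ evaluation harness configures sampling itself):
+
+ ```python
+ # Illustrative mapping of the evaluation hyperparameters onto vLLM sampling settings.
+ from vllm import SamplingParams
+
+ sampling = SamplingParams(
+     temperature=0.6,
+     top_k=40,
+     top_p=0.95,
+     max_tokens=129024,  # large budget so long AIME solutions are not truncated
+     n=32,               # 32 samples per problem for the Avg@32 metric
+ )
+ ```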
266
 
267
+ #### Evaluation Results on AIME24/25
268
+ The table below compares mainstream models on the AIME24 and AIME25 benchmarks. For accuracy, we report the **Avg@32 metric** (accuracy averaged over 32 sampled attempts per problem; a small script for computing it from per-sample results is sketched after the table):
269
 
 
270
 
 
 
 
 
 
 
271
 
272
  <table>
273
  <tr>
274
+ <th>Parameter Size</th>
275
+ <th>Model Name</th>
276
  <th>AIME 24</th>
277
  <th>AIME 25</th>
278
  </tr>
 
306
  <td><span style="color:red">90.8</span></td>
307
  <td><span style="color:grey">83</span></td>
308
  </tr>
 
 
 
 
309
  <!-- 合并行表头 32B -->
310
  <tr>
311
  <th rowspan="7">32B</th>
 
343
  </tr>
344
  </table>
345
 
346
+ > *Note: Generated results for AIME24/25 are available in the [`pcl_reasoner_v1/eval/eval_res`](https://openi.pcl.ac.cn/PCL-Reasoner/V1) directory for developer verification and comparison.*
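+
+ Given per-sample pass/fail records such as those result files, Avg@32 is simply the per-problem mean accuracy averaged over problems; a small sketch (the record format shown is an assumption, not the exact layout of the files in `eval_res`):
+
+ ```python
+ # Sketch: computing Avg@k from per-sample correctness records (record format is assumed).
+ from collections import defaultdict
+
+ records = [
+     {"problem_id": "aime24-01", "correct": True},
+     {"problem_id": "aime24-01", "correct": False},
+     # ... 32 samples per problem in the real evaluation ...
+ ]
+
+ by_problem = defaultdict(list)
+ for r in records:
+     by_problem[r["problem_id"]].append(r["correct"])
+
+ avg_at_k = 100 * sum(sum(v) / len(v) for v in by_problem.values()) / len(by_problem)
+ print(f"Avg@k accuracy: {avg_at_k:.1f}%")
+ ```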
 
347
 
348
+ #### Impact of Answer Length on Accuracy
349
+ We also analyzed the relationship between the maximum answer length (`max_tokens`) and accuracy. As the results below show, on the relatively easier AIME24 a decode length of 64K tokens is enough to reach the peak accuracy of 85.7%, while the harder AIME25 needs up to 128K tokens to reach its best score of 84.2% (a sketch of this length-capped re-scoring follows the table):
 
 
 
 
350
 
351
  <table>
352
  <tr>
353
+ <th>Decode Length</th>
354
  <th>16K</th>
355
  <th>32K</th>
356
  <th>64K</th>
 
371
  <td>84.2</td>
372
  </tr>
373
  </table>
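+
+ A curve like this can be produced by re-scoring the same samples under different decode budgets, counting any generation longer than the cap as incorrect (a sketch under that assumption; the exact truncation rule behind the table is not spelled out here):
+
+ ```python
+ # Sketch: accuracy under a decode-length cap (assumes per-sample length and correctness are known).
+ def capped_accuracy(samples, cap):
+     """samples: list of (num_generated_tokens, is_correct) tuples for one benchmark."""
+     ok = sum(1 for length, correct in samples if correct and length <= cap)
+     return 100 * ok / len(samples)
+
+ samples = [(15000, True), (40000, True), (90000, False), (120000, True)]
+ for cap in (16_384, 32_768, 65_536, 131_072):
+     print(f"{cap // 1024}K: {capped_accuracy(samples, cap):.1f}%")
+ ```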
 
README_CN.md ADDED
@@ -0,0 +1,403 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # **PCL-Reasoner-V1模型**
2
+
3
+ ## 模型概览
4
+
5
+ 本次发布的PCL-Reasoner-V1模型,以Qwen2.5-32B-Base为起点,基于昇思框架与昇腾硬件进行了高性能的监督微调。经过微调,模型在数学推理能力上取得了显著提升:其在权威基准评测集AIME24上准确率达 85.7%,AIME25上达 84.2%,在 32B参数级别模型中稳居前列。
6
+
7
+ 为促进技术共享与应用,我们已完整开源了PCL-Reasoner-V1的模型权重、微调数据及训练代码。该模型不仅是当下领先的32B数学推理模型之一,更为开发者提供了宝贵的专业领域监督微调实践经验与后训练解决方案。用户可参照以下教程轻松部署体验,深入探索后训练的实践方法与奥秘!
8
+
9
+ ![eval_results](images/README/eval_results.png)
10
+
11
+ ## 开发指导
12
+
13
+ ### 1. 模型文件
14
+
15
+ PCL-Reasoner-V1基于Qwen2.5-32B-Base进行微调后训练,训练流程基于MindFormers实现,主要涉及的文件有:
16
+
17
+ 数据处理:
18
+
19
+ ```
20
+ pcl_reasoner_v1
21
+ ├── qwen2_5_tokenizer.py # qwen2_5 tokenizer
22
+ ├── packing_handler.py # 数据packing处理
23
+ └── data_preprocess
24
+ ├── decontaminate.py # 数据污染检测
25
+ └── dataset_prehandle_and_split.py # 数据拆分及预处理
26
+ ```
27
+
28
+ 模型配置:
29
+
30
+ ```
31
+ pcl_reasoner_v1/config
32
+ ├── data_process_handling.yaml # 数据格式转换配置文件
33
+ ├── data_process_packing.yaml # 数据拼接配置文件
34
+ └── finetune_pcl_reasoner_v1_32k.yaml # 模型微调配置文件
35
+ ```
36
+
37
+ 任务启动脚本:
38
+
39
+ ```
40
+ pcl_reasoner_v1
41
+ └── run_pcl_reasoner_v1_finetune.sh # 模型微调启动脚本
42
+ ```
43
+
44
+ ### 2.环境及数据准备
45
+
46
+ #### 2.1 安装环境:
47
+
48
+ | 软件| 版本 |
49
+ | --- | --- |
50
+ | 固件&驱动| 24.1.rc3.5 |
51
+ | CANN| 7.7.T9.0.B057:8.1.RC1 |
52
+ | Python | 3.10 |
53
+ | MindSpore | 2.6.0 |
54
+ | MindSpore TransFormers | r1.5.0 |
55
+
56
+ #### 2.2 数据处理
57
+
58
+ ##### 2.2.1 数据集下载
59
+
60
+ 用户可以从HuggingFace官方下载原始数据集:
61
+
62
+ | 数据集名称 | 数据集链接 |
63
+ | ------------------------------- | ----------------------------------------------------------------------------------------------------------------------------- |
64
+ | AM-DeepSeek-R1-0528-Distilled | [https://huggingface.co/a-m-team/AM-DeepSeek-R1-0528-Distilled](https://huggingface.co/a-m-team/AM-DeepSeek-R1-0528-Distilled) |
65
+
66
+ ##### 2.2.2 数据预处理
67
+
68
+ 首先,我们对源数据进行检测和筛选,操作分为两个步骤,验证集污染检测与数据筛选。
69
+
70
+ * 验证集污染检测:我们采用基于all-MiniLM-L6-v2模型计算文本余弦相似度的方法,对数学部分原始数据针对AIME24/25评测集进行污染检测。该脚本执行后会在终端打印检测结果,并在指定的输出路径中保存相似度大于阈值的题目及其匹配的评测集题目。
71
+
72
+ ```
73
+ python PCL-Reasoner-V1/pcl_reasoner_v1/data_preprocess/decontaminate.py \
74
+ --target_data /path/to/target_data \
75
+ --contaminant_source PCL-Reasoner-V1/pcl_reasoner_v1/data_preprocess/aime2425_questions.json \
76
+ --model_path /path/to/distilled/model_path \
77
+ --output_file_prefix /path/to/output_file_prefix
78
+ --threshold 0.7
79
+
80
+ # 参数说明
81
+ target_data:需要被检测的数据
82
+ contaminant_source:污染源,即评测集数据
83
+ model_path:计算文本嵌入的模型
84
+ output_file_prefix:检测结果输出的路径
85
+ threshold:相似度阈值
86
+ ```
87
+ * 数据筛选及处理:运行数据处理脚本,进行数据长度筛选,选取问题加思维链长度小于32K tokens的数据,并将提示词添加到数据中。
88
+
89
+ ```
90
+ python PCL-Reasoner-V1/pcl_reasoner_v1/data_preprocess/convert_and_split_dataset.py \
91
+ --json_file_paths /path/to/AM-DeepSeek-R1-0528-Distilled/math.jsonl
92
+
93
+ # 参数说明
94
+ json_file_paths:需要处理的数据集,支持传入多个路径,用空格分隔
95
+ ```
96
+
97
+ 其次,我们将数据转换成packing格式,操作分为两个步骤,格式转换与数据拼接。
98
+
99
+ * 格式转换:在配置文件`pcl_reasoner_v1/config/data_process_handling.yaml`中指定`data_files`、`vocab_file`、`merges_file`等文件路径,指定`pcl_reasoner_v1/packing_handler.py`文件中自定义的`AMDeepSeekDataHandler`为数据handler:
100
+
101
+ ```
102
+ train_dataset:
103
+ ...
104
+ path: "json" # 原始数据集文件格式
105
+ data_files:
106
+ ["/path/to/data.jsonl"] # 原始数据集路径
107
+ input_columns: *input_columns
108
+ handler:
109
+ - type: AMDeepSeekDataHandler # 指定自定义的数据处理类
110
+ ...
111
+ tokenizer:
112
+ auto_register: qwen2_5_tokenizer.Qwen2Tokenizer
113
+ ...
114
+ vocab_file: "/path/to/vocab.json" # Qwen2_5默认tokenizer文件
115
+ merges_file: "/path/to/merges.txt" # Qwen2_5默认tokenizer文件
116
+ ...
117
+ ```
118
+
119
+ *(注意事项:以上模型配置为示例,仅列出用户高频修改的配置项,完整配置文件见代码仓)*
120
+
121
+ 运行数据处理脚本,生成Arrow格式数据文件:
122
+
123
+ ```
124
+ export PYTHONPATH=/path/to/mindformers/:PYTHONPATH
125
+ python /path/to/mindformers/toolkit/data_preprocess/huggingface/datasets_preprocess.py
126
+ --config ./pcl_reasoner_v1/config/data_process_handling.yaml
127
+ --save_path /path/to/handled_data/
128
+ --register_path ./pcl_reasoner_v1/
129
+
130
+ # 参数说明
131
+ config:数据格式转换的配置文件路径
132
+ save_path:转换后数据集的保存文件夹路径
133
+ register_path:自定义数据Handler注册目录路径
134
+ ```
135
+ * 数据拼接:
136
+
137
+ 在配置文件pcl_reasoner_v1/config/data_process_packing.yaml指定packing后数据的存储路径:
138
+
139
+ ```
140
+ # dataset
141
+ train_dataset:
142
+ data_loader:
143
+ ...
144
+ path: /path/to/handled_data #预处理后数据集的路径
145
+ ...
146
+ ```
147
+
148
+ *(注意事项:以上模型配置为示例,仅列出用户高频修改的配置项,完整配置文件见代码仓)*
149
+
150
+ 运行数据packing脚本,生成packing后数据文件:
151
+
152
+ ```
153
+ export PYTHONPATH=/path/to/mindformers/:PYTHONPATH
154
+ python /path/to/mindformers/toolkit/data_preprocess/huggingface/datasets_preprocess.py
155
+ --config ./pcl_reasoner_v1/config/data_process_packing.yaml
156
+ --save_path /path/to/packed_data/
157
+ --register_path ./pcl_reasoner_v1/
158
+
159
+ # 参数说明
160
+ config:数据拼接的配置文件路径
161
+ save_path:拼接后数据集的保存文件夹路径
162
+ register_path:自定义数据Handler注册目录路径
163
+ ```
164
+ ### 3 训练流程
165
+ #### 3.1 权重准备
166
+
167
+ 用户可以从HuggingFace官方下载预训练权重
168
+
169
+ | 模型名称 | 权重链接 |
170
+ | ------------------- | --------------------------------------------------------------------------------- |
171
+ | Qwen2.5-32B-Base | [https://huggingface.co/Qwen/Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B) |
172
+
173
+ MindFormers 1.5.0及以上版本已支持safetensor格式的权重直接加载及保存,无需转换成ckpt,下文中微调将使用safetensors格式权重运行。
174
+
175
+ #### 3.2 训练配置:
176
+
177
+ 下面仅列出用户高频修改的配置项,完整配置文件见`pcl_reasoner_v1/config/finetune_pcl_reasoner_v1_32k.yaml`
178
+
179
+ 基本配置:
180
+
181
+ ```
182
+ run_mode: 'finetune' # 设置训练模式为“finetune”
183
+ load_checkpoint: '/path/to/Qwen-32B-base/' # 权重文件路径
184
+ load_ckpt_format: 'safetensors' # 设置权重格式为“safetensors”
185
+ auto_trans_ckpt: True # 设置在线权重切分至分布式权重
186
+ ```
187
+
188
+ 数据集配置:
189
+
190
+ ```
191
+ train_dataset: &train_dataset
192
+  
193
+   data_loader:
194
+     type: CommonDataLoader
195
+  
196
+     # offline
197
+     path: "/path/to/dataset/pack_data_lt_32K_full" # 数据文件路径
198
+     load_func: 'load_from_disk' # 设置数据加载方式为“load_from_disk”
199
+    
200
+     shuffle: True # 数据打乱功能使能
201
+     packing: pack # 数据格式为pack
202
+     adaptor_config:
203
+       compress_mask: True
204
+     mock_config:
205
+       seq_length: 32768 # 数据pack后长度为32k
206
+       size: 25909 # 数据集大小/数据并行切分
207
+ ```
208
+
209
+ 并行配置:
210
+
211
+ ```
212
+ parallel_config:
213
+   data_parallel: &dp 8 # 数据并行切分为8
214
+   model_parallel: 8 # 模型并行切分为8
215
+   pipeline_stage: 2 # 流水线并行切分为2
216
+   use_seq_parallel: True # 序列并行使能
217
+   optimizer_shard: True  # 优化器并行使能
218
+   micro_batch_num: 16 # micro batch数量设置为16
219
+ ```
220
+
221
+ > *(注意事项:以上模型配置为示例,仅列出用户高频修改的配置项,完整配置文件见代码仓)*
222
+
223
+ #### 3.3 启动微调
224
+
225
+ 在启动脚本`run_pcl_reasoner_v1_finetune.sh`指定配置文件`pcl_reasoner_v1/config/finetune_pcl_reasoner_v1_32k.yaml`,并根据用户的实际情况对卡数、服务器IP等配置进行修改:
226
+
227
+ ```
228
+ noderank=$1
229
+
230
+ bash /path/to/mindformers/scripts/msrun_launcher.sh "run_mindformer.py \
231
+ --config /path/to/finetune_pcl_reasoner_v1_32k.yaml \
232
+ --run_mode finetune" \
233
+ --worker_num 128 \
234
+ --local_worker_num 8 \
235
+ --master_addr XX.XX.XX.XX \
236
+ --master_port XXXX \
237
+ --node_rank $noderank \
238
+ --log_dir /path/to/log \
239
+ --join False \
240
+ --cluster_time_out 1200 \
241
+ > run.log 2>&1
242
+
243
+ # 参数说明
244
+ config:配置文件路径
245
+ run_mode:运行模式(预训练/微调/推理)
246
+ worker_num: 总卡数
247
+ local_worker_num: 单机的卡数
248
+ master_addr:主节点地址
249
+ master_port: 主节点端口
250
+ log_dir: 日志路径
251
+ join:是否等待所有worker退出
252
+ cluster_time_out:集群等待时间
253
+ ```
254
+
255
+ 然后,使用`bash run_pcl_reasoner_v1_finetune.sh`指令启动微调训练,在多个节点上启动时,需指定`node_rank`(以下指令以0节点为示例):
256
+
257
+ ```
258
+ bash run_pcl_reasoner_v1_finetune.sh 0
259
+ ```
260
+
261
+ 在拉起任务后,通过以下指令查看运行日志:
262
+
263
+ ```
264
+ tail -f /path/to/log/worker_127.log
265
+ ```
266
+
267
+ ### 4. 评测流程:
268
+
269
+ 为了保障评测结果的公平性,我们采用了QwQ开源的评测代码(QwQ/eval at main · QwenLM/QwQ),可以根据代码仓中README.md指导进行环境安装及模型评测。
270
+ 我们采用的评测超参如下所示:
271
+
272
+ | 采样超参 | 取值 |
273
+ | ---------------- | --------------------------------------------- |
274
+ | temperature | 0.6 |
275
+ | top\_k | 40 |
276
+ | top\_p | 0.95 |
277
+ | max\_tokens | 129024 |
278
+ | chat\_template |`./pcl_reasoner_v1/eval/am_thinking.jinja` |
279
+
280
+ 我们在AIME24/AIME25评测结果详见下表数据。为确保评估准确性,我们采用Avg@32指标(平均32次采样)进行了评测:
281
+
282
+
283
+ <!-- 表格基础样式(可选添加) -->
284
+
285
+ <style>
286
+ table { border-collapse: collapse; width: 100%; margin-left: auto;margin-right: auto;}
287
+ th, td { border: 1px solid #ddd; padding: 8px; text-align: center; }
288
+ </style>
289
+
290
+ <!-- 表格主体 -->
291
+
292
+ <table>
293
+ <tr>
294
+ <th>模型规格</th>
295
+ <th>模型</th>
296
+ <th>AIME 24</th>
297
+ <th>AIME 25</th>
298
+ </tr>
299
+ <!-- 合并行表头 >100B -->
300
+ <tr>
301
+ <th rowspan="6">&gt;100B</th>
302
+ </tr>
303
+ <!-- >100B 组数据行 -->
304
+ <tr>
305
+ <td>DeepSeek-R1</td>
306
+ <td><span style="color:grey">79.8</span></td>
307
+ <td><span style="color:grey">70</span></td>
308
+ </tr>
309
+ <tr>
310
+ <td>DeepSeek-R1-0528</td>
311
+ <td><span style="color:red">91.4</span></td>
312
+ <td><span style="color:red">87.5</span></td>
313
+ </tr>
314
+ <tr>
315
+ <td>Qwen3-235B-A22B</td>
316
+ <td><span style="color:grey">85.7</span></td>
317
+ <td><span style="color:grey">81.5</span></td>
318
+ </tr>
319
+ <tr>
320
+ <td>OpenAI-o3</td>
321
+ <td><span style="color:red">91.6</span></td>
322
+ <td><span style="color:red">88.9</span></td>
323
+ </tr>
324
+ <tr>
325
+ <td>Gemini-2.5-Pro-0506</td>
326
+ <td><span style="color:red">90.8</span></td>
327
+ <td><span style="color:grey">83</span></td>
328
+ </tr>
329
+ <!-- 分隔行 -->
330
+ <tr>
331
+ <td colspan="4"></td>
332
+ </tr>
333
+ <!-- 合并行表头 32B -->
334
+ <tr>
335
+ <th rowspan="7">32B</th>
336
+ </tr>
337
+ <!-- 32B 组数据行 -->
338
+ <tr>
339
+ <td>Qwen3-32B</td>
340
+ <td><span style="color:grey">81.4</span></td>
341
+ <td><span style="color:grey">72.9</span></td>
342
+ </tr>
343
+ <tr>
344
+ <td>QwQ-32B</td>
345
+ <td><span style="color:grey">79.5</span></td>
346
+ <td><span style="color:grey">69.5</span></td>
347
+ </tr>
348
+ <tr>
349
+ <td>DeepSeek-R1-Distill-Qwen-32B</td>
350
+ <td><span style="color:grey">72.6</span></td>
351
+ <td><span style="color:grey">49.6</span></td>
352
+ </tr>
353
+ <tr>
354
+ <td>Skywork-OR1-32B</td>
355
+ <td><span style="color:grey">82.2</span></td>
356
+ <td><span style="color:grey">73.3</span></td>
357
+ </tr>
358
+ <tr>
359
+ <td>AM-Thinking-v1</td>
360
+ <td><span style="color:grey">85.3</span></td>
361
+ <td><span style="color:grey">74.4</span></td>
362
+ </tr>
363
+ <tr>
364
+ <td>PCL-Reasoner-v1</td>
365
+ <td><p style="font-weight: bold;">85.7</p></td>
366
+ <td><p style="font-weight: bold;">84.2</p></td>
367
+ </tr>
368
+ </table>
369
+
370
+ > *(注:模型在AIME24/25评测集上的生成结果文件已同步上传至 `pcl_reasoner_v1/eval/eval_res`目录,供开发者用于模型验证与效果比对参考)*
371
+
372
+
373
+ 另外,我们也针对评测时不同模型回答长度统计正确率,可以看出AIME24/25评测集对回答长度要求较高,而且较为简单的AIME24上,64K tokens的回答长度可以满足,而较为难的AIME25上则需要回答长度长达128K tokens:
374
+
375
+ <style>
376
+ table { border-collapse: collapse; width: 100%; margin-left: auto;margin-right: auto;}
377
+ th, td { border: 1px solid #ddd; padding: 8px; text-align: center; }
378
+ </style>
379
+
380
+ <table>
381
+ <tr>
382
+ <th>回答长度</th>
383
+ <th>16k</th>
384
+ <th>32k</th>
385
+ <th>64k</th>
386
+ <th>128k</th>
387
+ </tr>
388
+ <tr>
389
+ <td>AIME24</td>
390
+ <td>42.0</td>
391
+ <td>77.9</td>
392
+ <td>85.7</td>
393
+ <td>85.7</td>
394
+ </tr>
395
+ <tr>
396
+ <td>AIME25</td>
397
+ <td>33.4</td>
398
+ <td>75.6</td>
399
+ <td>83.9</td>
400
+ <td>84.2</td>
401
+ </tr>
402
+ </table>
403
+