PCL-Reasoner committed on
Commit 39d1d63 · verified · 1 Parent(s): 2d276c8

Upload 2 files

Files changed (2)
  1. README.md +164 -232
  2. README_CN.md +403 -0
README.md CHANGED
@@ -1,108 +1,70 @@
1
- <style>
2
- /* 全局字体与间距 */
3
- body {
4
- font-family: 'Segoe UI', sans-serif;
5
- line-height: 1.75; /* 行间距1.75倍 */
6
- color: #333
7
- margin: 0 auto; /* 内容居中 */
8
- }
9
-
10
- /* 段间距 */
11
- p, ul, ol, blockquote {
12
- margin-bottom: 1.25em; /* 段落/列表/引用间距 */
13
- }
14
-
15
-
16
- /* 代码块 */
17
- pre {
18
- background-color:rgb(146, 150, 153);
19
- border-radius: 6px;
20
- padding: 16px;
21
- overflow: auto;
22
- line-height: 1.45; /* 代码行距 */
23
- }
24
-
25
- /* 行内代码 */
26
- code {
27
- background: rgb(146, 150, 153);
28
- padding: 2px 6px;
29
- border-radius: 3px;
30
- font-family: 'Fira Code', monospace;
31
- }
32
- </style>
33
-
34
- # **PCL-Reasoner-V1模型**
35
-
36
- ## 模型概览
37
-
38
- 本次发布的PCL-Reasoner-V1模型,以Qwen2.5-32B-Base为起点,基于昇思框架与昇腾硬件进行了高性能的监督微调。经过微调,模型在数学推理能力上取得了显著提升:其在权威基准评测集AIME24上准确率达85.7%,AIME25上达84.2%,在32B参数级别模型中稳居前列。
39
-
40
- 为促进技术共享与应用,我们已完整开源了PCL-Reasoner-V1的模型权重、微调数据及训练代码。该模型不仅是当下领先的32B数学推理模型之一,更为开发者提供了宝贵的专业领域监督微调实践经验与后训练解决方案。用户可参照以下教程轻松部署体验,深入探索后训练的实践方法与奥秘!
41
-
42
- ![eval_results](images/README/eval_results.png)
43
-
44
- ## 开发指导
45
-
46
- ### 1. 模型文件
47
-
48
- PCL-Reasoner-V1基于Qwen2.5-32B-Base进行微调后训练,训练流程基于MindFormers实现,主要涉及的文件有:
49
-
50
- 数据处理:
51
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
52
  ```
53
  pcl_reasoner_v1
54
  ├── qwen2_5_tokenizer.py # qwen2_5 tokenizer
55
- ├── packing_handler.py # 数据packing处理
56
  └── data_preprocess
57
- ├── decontaminate.py # 数据污染检测
58
- └── dataset_prehandle_and_split.py # 数据拆分及预处理
59
  ```
60
 
61
- 模型配置:
62
-
63
  ```
64
  pcl_reasoner_v1/config
65
- ├── data_process_handling.yaml # 数据格式转换配置文件
66
- ├── data_process_packing.yaml # 数据拼接配置文件
67
- └── finetune_pcl_reasoner_v1_32k.yaml # 模型微调配置文件
68
  ```
69
 
70
- 任务启动脚本:
71
-
72
  ```
73
  pcl_reasoner_v1
74
- └── run_pcl_reasoner_v1_finetune.sh # 模型微调启动脚本
75
  ```
76
 
77
-
78
- ### 2.环境及数据准备
79
-
80
- #### 2.1 安装环境
81
 
82
- | 软件| 版本 |
83
- | --- | --- |
84
- | 固件&驱动| 24.1.rc3.5 |
85
- | CANN| 7.7.T9.0.B057:8.1.RC1 |
86
- | Python | 3.10 |
87
- | MindSpore | 2.6.0 |
 
 
88
  | MindSpore TransFormers | r1.5.0 |
89
 
90
- #### 2.2 数据处理
91
 
92
- ##### 2.2.1 数据集下载
93
 
94
- 用户可以从HuggingFace官方下载原始数据集:
95
 
96
- | 数据集名称 | 数据集链接 |
97
  | ------------------------------- | ----------------------------------------------------------------------------------------------------------------------------- |
98
  | AM-DeepSeek-R1-0528-Distilled | [https://huggingface.co/a-m-team/AM-DeepSeek-R1-0528-Distilled](https://huggingface.co/a-m-team/AM-DeepSeek-R1-0528-Distilled) |
99
 
100
- ##### 2.2.2 数据预处理
 
 
101
 
102
- 首先,我们对源数据进行检测和筛选,操作分为两个步骤,验证集污染检测与数据筛选。
 
103
 
104
- * 验证集污染检测:我们采用基于all-MiniLM-L6-v2模型计算文本余弦相似度的方法,对数学部分原始数据针对AIME24/25评测集进行污染检测。该脚本执行后会在终端打印检测结果,并在指定的输出路径中保存相似度大于阈值的题目及其匹配的评测集题目。
105
-
106
  ```
107
  python PCL-Reasoner-V1/pcl_reasoner_v1/data_preprocess/decontaminate.py \
108
  --target_data /path/to/target_data \
@@ -110,157 +72,146 @@ pcl_reasoner_v1
110
  --model_path /path/to/distilled/model_path \
111
  --output_file_prefix /path/to/output_file_prefix
112
  --threshold 0.7
113
-
114
- # 参数说明
115
- target_data:需要被检测的数据
116
- contaminant_source:污染源,即评测集数据
117
- model_path:计算文本嵌入的模型
118
- output_file_prefix:检测结果输出的路径
119
- threshold:相似度阈值
120
  ```
121
- * 数据筛选及处理:运行数据处理脚本,进行数据长度筛选,选取问题加思维链长度小于32K tokens的数据,并将提示词添加到数据中。
122
-
123
  ```
124
  python PCL-Reasoner-V1/pcl_reasoner_v1/data_preprocess/convert_and_split_dataset.py \
125
  --json_file_paths /path/to/AM-DeepSeek-R1-0528-Distilled/math.jsonl
126
-
127
- # 参数说明
128
- json_file_paths:需要处理的数据集,支持传入多个路径,用空格分隔
129
  ```
130
 
131
- 其次,我们将数据转换成packing格式,操作分为两个步骤,格式转换与数据拼接。
 
 
132
 
133
- * 格式转换:在配置文件`pcl_reasoner_v1/config/data_process_handling.yaml`中指定`data_files`、`vocab_file`、`merges_file`等文件路径,指定`pcl_reasoner_v1/packing_handler.py`文件中自定义的`AMDeepSeekDataHandler`为数据handler:
134
-
135
  ```
136
  train_dataset:
137
  ...
138
- path: "json" # 原始数据集文件格式
139
  data_files:
140
- ["/path/to/data.jsonl"] # 原始数据集路径
141
  input_columns: *input_columns
142
  handler:
143
- - type: AMDeepSeekDataHandler # 指定自定义的数据处理类
144
  ...
145
  tokenizer:
146
  auto_register: qwen2_5_tokenizer.Qwen2Tokenizer
147
  ...
148
- vocab_file: "/path/to/vocab.json" # Qwen2_5默认tokenizer文件
149
- merges_file: "/path/to/merges.txt" # Qwen2_5默认tokenizer文件
150
  ...
151
  ```
152
-
153
- *(注意事项:以上模型配置为示例,仅列出用户高频修改的配置项,完整配置文件见代码仓)*
154
-
155
- 运行数据处理脚本,生成Arrow格式数据文件:
156
-
157
  ```
158
  export PYTHONPATH=/path/to/mindformers/:PYTHONPATH
159
- python th/to/mindformers/toolkit/data_preprocess/huggingface/datasets_preprocess.py
160
- --config ./pcl_reasoner_v1/config/data_process_handling.yaml
161
- --save_path /path/to/handled_data/
162
  --register_path ./pcl_reasoner_v1/
163
-
164
- # 参数说明
165
- config:数据格式转换的配置文件路径
166
- save_path:转换后数据集的保存文件夹路径
167
- register_path:自定义数据Handler注册目录路径
168
  ```
169
- * 数据拼接:
170
-
171
- 在配置文件pcl_reasoner_v1/config/data_process_packing.yaml指定packing后数据的存储路径:
172
-
173
  ```
174
  # dataset
175
  train_dataset:
176
  data_loader:
177
  ...
178
- path: /path/to/handled_data #预处理后数据集的路径
179
  ...
180
  ```
181
-
182
- *(注意事项:以上模型配置为示例,仅列出用户高频修改的配置项,完整配置文件见代码仓)*
183
-
184
- 运行数据packing脚本,生成packing后数据文件:
185
-
186
  ```
187
  export PYTHONPATH=/path/to/mindformers/:PYTHONPATH
188
- python /path/to/mindformers/toolkit/data_preprocess/huggingface/datasets_preprocess.py
189
- --config ./pcl_reasoner_v1_config/data_process_packing.yaml
190
- --save_path /path/to/packed_data/
191
  --register_path ./pcl_reaoner_v1/
192
-
193
- # 参数说明
194
- config:数据拼接的配置文件路径
195
- save_path:拼接后数据集的保存文件夹路径
196
- register_path:自定义数据Handler注册目录路径
197
  ```
198
 
199
-
200
- ### 3 训练流程
201
- #### 3.1 权重准备
202
 
203
- 用户可以从HuggingFace官方下载预训练权重:
 
 
 
204
 
205
- | 模型名称 | 权重链接 |
206
  | ------------------- | --------------------------------------------------------------------------------- |
207
  | Qwen2.5-32B-Base | [https://huggingface.co/Qwen/Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B) |
208
 
209
- MindFormers 1.5.0及以上版本已支持safetensors格式的权重直接加载及保存,无需转换成ckpt,下文中微调将使用safetensors格式权重运行。
210
-
211
- #### 3.2 训练配置
212
-
213
- 下面仅列出用户高频修改的配置项,完整配置文件见`pcl_reasoner_v1/config/finetune_pcl_reasoner_v1_32k.yaml`
214
-
215
- 基本配置:
216
-
217
- ```
218
- run_mode: 'finetune' # 设置训练模式为“finetune”
219
- load_checkpoint: '/path/to/Qwen-32B-base/' # 权重文件路径
220
- load_ckpt_format: 'safetensors' # 设置权重格式为“safetensors”
221
- auto_trans_ckpt: True # 设置在线权重切分至分布式权重
222
- ```
223
 
224
- 数据集配置:
 
225
 
 
 
 
 
 
 
226
  ```
 
 
227
  train_dataset: &train_dataset
228
-  
229
-   data_loader:
230
-     type: CommonDataLoader
231
-  
232
-     # offline
233
-     path: "/path/to/dataset/pack_data_lt_32K_full" # 数据文件路径
234
-     load_func: 'load_from_disk' # 设置数据加载方式为“load_from_disk”
235
-    
236
-     shuffle: True # 数据打乱功能使能
237
-     packing: pack # 数据格式为pack
238
-     adaptor_config:
239
-       compress_mask: True
240
-     mock_config:
241
-       seq_length: 32768 # 数据pack后长度为32k
242
-       size: 25909 # 数据集大小/数据并行切分
243
- ```
244
-
245
- 并行配置:
246
-
247
  ```
 
 
248
  parallel_config:
249
-   data_parallel: &dp 8 # 数据并行切分为8
250
-   model_parallel: 8 # 模型并行切分为8
251
-   pipeline_stage: 2 # 流水线并行切分为2
252
-   use_seq_parallel: True # 序列并行使能
253
-   optimizer_shard: True  # 优化器并行使能
254
-   micro_batch_num: 16 # micro bathsize设置为16
255
  ```
 
256
 
257
- > *(注意事项:以上模型配置为示例,仅列出用户高频修改的配置项,完整配置文件见代码仓)*
258
-
259
- #### 3.3 启动微调
260
 
261
- 在启动脚本`run_pcl_reasoner_v1_finetune.sh`指定配置文件`pcl_reasoner_v1/config/finetune_pcl_reasoner_v1_32k.yaml`,并根据用户的实际情况对卡数、服务器IP等配置进行修改:
262
 
263
- ```
264
  noderank=$1
265
 
266
  bash /path/to/mindformers/scripts/msrun_launcher.sh "run_mindformer.py \
@@ -276,61 +227,52 @@ bash /path/to/mindformers/scripts/msrun_launcher.sh "run_mindformer.py \
276
  --cluster_time_out 1200 \
277
  > run.log 2>&1
278
 
279
- # 参数说明
280
- config:配置文件路径
281
- run_mode:运行模式(预训练/微调/推理)
282
- worker_num 总卡数
283
- local_worker_num 单机的卡数
284
- master_addr:主节点地址
285
- master_port: 主节点端口
286
- log_dir: 日志路径
287
- join:是否等待所有worker退出
288
- cluster_time_out:集群等待时间
289
  ```
290
-
291
- 然后,使用`bash run_pcl_reasoner_v1_finetune.sh`指令启动微调训练,在多个节点上启动时,需指定`node_rank`(以下指令以0节点为示例):
292
-
293
  ```
294
  bash run_pcl_reasoner_v1_finetune.sh 0
295
  ```
 
296
 
297
- 在拉起任务后,通过以下指令查看运行日志:
298
-
299
  ```
300
- tail -f path/to/log/worker_127.log
301
  ```
302
-
303
 
304
- ### 4. 评测流程
305
-
306
- 为了保障评测结果的公平性,我们采用了QwQ开源的评测代码(QwQ/eval at main · QwenLM/QwQ),可以根据代码仓中README.md指导进行环境安装及模型评测。
307
 
308
- 我们采用的评测超参如下所示:
309
 
310
- | 采样超参 | 取值 |
311
- | ---------------- | --------------------------------------------- |
312
- | temperature | 0.6 |
313
- | top\_k | 40 |
314
- | top\_p | 0.95 |
315
- | max\_tokens | 129024 |
316
- | chat\_template |`./pcl_reasoner_v1/eval/am_thinking.jinja` |
317
 
318
- 我们在AIME24/AIME25评测结果详见下表数据。为确保评估准确性,我们采用Avg@32指标(平均32次采样)进行了评测:
 
 
 
 
 
 
319
 
 
 
320
 
321
- <!-- 表格基础样式(可选添加) -->
322
 
323
- <style>
324
- table { border-collapse: collapse; width: 100%; margin-left: auto;margin-right: auto;}
325
- th, td { border: 1px solid #ddd; padding: 8px; text-align: center; }
326
- </style>
327
-
328
- <!-- 表格主体 -->
329
 
330
  <table>
331
  <tr>
332
- <th>参数量</th>
333
- <th>模型</th>
334
  <th>AIME 24</th>
335
  <th>AIME 25</th>
336
  </tr>
@@ -364,10 +306,6 @@ tail -f path/to/log/worker_127.log
364
  <td><span style="color:red">90.8</span></td>
365
  <td><span style="color:grey">83</span></td>
366
  </tr>
367
- <!-- 分隔行 -->
368
- <tr>
369
- <td colspan="4" style="background-color: #f8f8f8;"></td>
370
- </tr>
371
  <!-- 合并行表头 32B -->
372
  <tr>
373
  <th rowspan="7">32B</th>
@@ -405,19 +343,14 @@ tail -f path/to/log/worker_127.log
405
  </tr>
406
  </table>
407
 
408
- > *(注:模型在AIME24/25评测集上的生成结果文件已同步上传至 `pcl_reasoner_v1/eval/eval_res`目录,供开发者用于模型验证与效果比对参考)*
409
-
410
 
411
- 另外,我们也针对评测时不同模型回答长度统计正确率,可以看出AIME24/25评测集对回答长度要求较高,而且较为简单的AIME24上,64K tokens的回答长度可以满足,而较为难的AIME25上则需要回答长度长达128K tokens:
412
-
413
- <style>
414
- table { border-collapse: collapse; width: 100%; margin-left: auto;margin-right: auto;}
415
- th, td { border: 1px solid #ddd; padding: 8px; text-align: center; }
416
- </style>
417
 
418
  <table>
419
  <tr>
420
- <th>回答长度</th>
421
  <th>16K</th>
422
  <th>32K</th>
423
  <th>64K</th>
@@ -438,4 +371,3 @@ tail -f path/to/log/worker_127.log
438
  <td>84.2</td>
439
  </tr>
440
  </table>
441
-
 
1
+ # **PCL-Reasoner-V1 Model**
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
2
 
3
+ ## Model Overview
4
+ We release **PCL-Reasoner-V1**, a model built on **Qwen2.5-32B-Base** and refined with high-performance supervised fine-tuning on the **MindSpore framework** and **Ascend hardware**. After fine-tuning, the model shows significant gains in mathematical reasoning: it achieves 85.7% on AIME 24 and 84.2% on AIME 25, placing it among the top-tier models in the 32B parameter class.
5
+
6
+ To promote technical collaboration and application, we have fully open-sourced the model weights, fine-tuning dataset, and training code.
7
+ PCL-Reasoner-V1 is not only a leading 32B mathematical reasoning model but also offers developers practical experience in domain-specific supervised fine-tuning and post-training. Follow the tutorial below to deploy the model and explore these post-training methods.
8
+
9
+ ![eval_results](images/README/eval_results.png)
10
+
11
+ ## Development Guide
12
+
13
+ ### 1. Model Files
14
+ PCL-Reasoner-V1 is fine-tuned from Qwen2.5-32B-Base using MindFormers. Key files include:
15
+
16
+ **Data Processing:**
17
  ```
18
  pcl_reasoner_v1
19
  ├── qwen2_5_tokenizer.py # qwen2_5 tokenizer
20
+ ├── packing_handler.py # Data packing process
21
  └── data_preprocess
22
+ ├── decontaminate.py # validation set contamination detection
23
+ └── dataset_prehandle_and_split.py # dataset prehandle and split
24
  ```
25
 
26
+ **Model Configuration:**
 
27
  ```
28
  pcl_reasoner_v1/config
29
+ ├── data_process_handling.yaml # Format conversion configuration file
30
+ ├── data_process_packing.yaml # Data packing configuration file
31
+ └── finetune_pcl_reasoner_v1_32k.yaml # Model fine-tuning configuration file
32
  ```
33
 
34
+ **Task Launch Script:**
 
35
  ```
36
  pcl_reasoner_v1
37
+ └── run_pcl_reasoner_v1_finetune.sh # Model fine-tuning launch script
38
  ```
39
 
 
 
 
 
40
 
41
+ ### 2. Environment & Data Setup
42
+ #### 2.1 Environment Installation
43
+ | Software | Version |
44
+ |----------|---------|
45
+ | Firmware & Driver | 24.1.rc3.5 |
46
+ | CANN | 7.7.T9.0.B057:8.1.RC1 |
47
+ | Python | 3.10 |
48
+ | MindSpore | 2.6.0 |
49
  | MindSpore TransFormers | r1.5.0 |
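+
+ To quickly confirm the environment before preparing data, a short sanity check can be run (a minimal sketch; it only uses MindSpore's standard self-check utility and assumes the versions match the table above):
+
+ ```python
+ # Quick environment sanity check (illustrative; versions should match the table above).
+ import mindspore as ms
+
+ print("MindSpore version:", ms.__version__)   # expect 2.6.0
+ ms.set_context(device_target="Ascend")        # fail early if the Ascend/CANN stack is not visible
+ ms.run_check()                                # runs a tiny computation to verify the install
+ ```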
50
 
51
+ #### 2.2 Data Processing
52
 
53
+ ##### 2.2.1 Dataset Download
54
 
55
+ Users can download the original dataset from HuggingFace:
56
 
57
+ | Dataset Name | Dataset Link |
58
  | ------------------------------- | ----------------------------------------------------------------------------------------------------------------------------- |
59
  | AM-DeepSeek-R1-0528-Distilled | [https://huggingface.co/a-m-team/AM-DeepSeek-R1-0528-Distilled](https://huggingface.co/a-m-team/AM-DeepSeek-R1-0528-Distilled) |
60
 
61
+ ##### 2.2.2 Data Preprocessing
62
+
63
+ First, we perform detection and filtering on the source data through two steps: **validation set contamination detection** and **data filtering**.
64
 
65
+ * Validation Set Contamination Detection: we use the **all-MiniLM-L6-v2** model to compute text cosine similarity and check the raw math data for contamination against the AIME24/25 evaluation sets (a standalone sketch of this check follows the command below).
66
+ After execution, the script prints detection results in the terminal and saves questions with similarity exceeding the threshold (along with matched evaluation questions) to the specified output path.
67
 
 
 
68
  ```
69
  python PCL-Reasoner-V1/pcl_reasoner_v1/data_preprocess/decontaminate.py \
70
  --target_data /path/to/target_data \
  --contaminant_source PCL-Reasoner-V1/pcl_reasoner_v1/data_preprocess/aime2425_questions.json \
72
  --model_path /path/to/distilled/model_path \
73
  --output_file_prefix /path/to/output_file_prefix \
74
  --threshold 0.7
75
+
76
+ # Parameter Description
77
+ target_data: Data to be detected
78
+ contaminant_source: Contamination source (evaluation set data)
79
+ model_path: Model for text embedding calculation
80
+ output_file_prefix: Output path for results
81
+ threshold: Similarity threshold
82
  ```
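+
+ For intuition, the core of this check can be reproduced in a few lines with `sentence-transformers` (a standalone sketch of the idea, not the repository script; the question lists below are placeholders):
+
+ ```python
+ # Minimal sketch of cosine-similarity decontamination (illustrative only).
+ from sentence_transformers import SentenceTransformer, util
+
+ model = SentenceTransformer("/path/to/all-MiniLM-L6-v2")
+ train_questions = ["..."]   # questions from the training data
+ eval_questions = ["..."]    # AIME24/25 questions (the contamination source)
+
+ emb_train = model.encode(train_questions, convert_to_tensor=True, normalize_embeddings=True)
+ emb_eval = model.encode(eval_questions, convert_to_tensor=True, normalize_embeddings=True)
+
+ scores = util.cos_sim(emb_train, emb_eval)              # [num_train, num_eval] similarities
+ flagged = (scores.max(dim=1).values > 0.7).sum().item()
+ print(f"{flagged} training questions exceed the 0.7 threshold")
+ ```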
83
+ * Data Filtering & Processing: run the data processing script to filter by length, keeping samples whose combined question and chain-of-thought length is under **32K tokens**, and to add the prompt template to the data (a rough sketch of the length filter follows the command below).
84
+
85
  ```
86
  python PCL-Reasoner-V1/pcl_reasoner_v1/data_preprocess/convert_and_split_dataset.py \
87
  --json_file_paths /path/to/AM-DeepSeek-R1-0528-Distilled/math.jsonl
88
+
89
+ # Parameter Description
90
+ json_file_paths: Dataset to process (multiple paths separated by spaces)
91
  ```
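+
+ The filter itself amounts to counting the tokens of the question plus its chain of thought and keeping samples under 32K; a rough sketch with a Hugging Face tokenizer (the `question`/`answer` field names are assumptions about the jsonl layout, not the repository's exact keys):
+
+ ```python
+ # Rough sketch of the <32K-token length filter (illustrative; not the repository script).
+ import json
+ from transformers import AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained("/path/to/Qwen2.5-32B")
+ MAX_LEN = 32 * 1024
+
+ kept = []
+ with open("/path/to/AM-DeepSeek-R1-0528-Distilled/math.jsonl", encoding="utf-8") as f:
+     for line in f:
+         sample = json.loads(line)
+         text = sample["question"] + sample["answer"]   # question + chain of thought (assumed keys)
+         if len(tokenizer(text)["input_ids"]) < MAX_LEN:
+             kept.append(sample)
+
+ print(f"kept {len(kept)} samples under {MAX_LEN} tokens")
+ ```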
92
 
93
+ Then, we convert the data into **packed format** through two sequential steps: format conversion and data packing.
94
+
95
+ * Format Conversion: specify the `data_files`, `vocab_file`, and `merges_file` paths in `pcl_reasoner_v1/config/data_process_handling.yaml`, and set the custom `AMDeepSeekDataHandler` from `pcl_reasoner_v1/packing_handler.py` as the data handler:
96
 
 
 
97
  ```
98
  train_dataset:
99
  ...
100
+ path: "json" # Original dataset format
101
  data_files:
102
+ ["/path/to/data.jsonl"] # Path to raw dataset
103
  input_columns: *input_columns
104
  handler:
105
+ - type: AMDeepSeekDataHandler # Custom data handler class
106
  ...
107
  tokenizer:
108
  auto_register: qwen2_5_tokenizer.Qwen2Tokenizer
109
  ...
110
+ vocab_file: "/path/to/vocab.json" # Qwen2.5 tokenizer vocabulary
111
+ merges_file: "/path/to/merges.txt" # Qwen2.5 merge rules
112
  ...
113
  ```
114
+
115
+ *(Note: This is a minimal example showing frequently modified fields. Full configuration is available in the code repository.)*
116
+
117
+ Execute the conversion script to generate **Arrow-format data**:
118
+
119
  ```
120
  export PYTHONPATH=/path/to/mindformers/:PYTHONPATH
121
+ python /path/to/mindformers/toolkit/data_preprocess/huggingface/datasets_preprocess.py \
122
+ --config ./pcl_reasoner_v1/config/data_process_handling.yaml \
123
+ --save_path /path/to/handled_data/ \
124
  --register_path ./pcl_reasoner_v1/
125
+
126
+ # Parameter Description
127
+ config: Path to format conversion config file
128
+ save_path: Output directory for processed data
129
+ register_path: Path to custom handler registration
130
  ```
131
+ * Data Packing:
132
+
133
+ Configure `pcl_reasoner_v1/config/data_process_packing.yaml` to point `path` at the preprocessed data that will be packed:
134
+
135
  ```
136
  # dataset
137
  train_dataset:
138
  data_loader:
139
  ...
140
+ path: /path/to/handled_data # Processed dataset
141
  ...
142
  ```
143
+
144
+ *(Note: Example shows key fields only. Refer to repository for full config.)*
145
+
146
+ Run the packing script to generate the **sequence-packed data** (a toy illustration of the packing idea follows the command):
147
+
148
  ```
149
  export PYTHONPATH=/path/to/mindformers/:PYTHONPATH
150
+ python /path/to/mindformers/toolkit/data_preprocess/huggingface/datasets_preprocess.py \
151
+ --config ./pcl_reasoner_v1/config/data_process_packing.yaml \
152
+ --save_path /path/to/packed_data/ \
153
  --register_path ./pcl_reasoner_v1/
154
+
155
+ # Parameter Description
156
+ config: Path to data packing config file
157
+ save_path: Output directory for packed data
158
+ register_path: Path to custom handler registration
159
  ```
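+
+ Conceptually, packing greedily concatenates the tokenized samples into fixed 32K-token sequences so that long-context training wastes little padding; a toy illustration of that idea (not the MindFormers implementation, which also builds the compressed attention masks):
+
+ ```python
+ # Toy illustration of greedy sequence packing to a fixed 32K context (not the MindFormers code).
+ SEQ_LEN = 32768
+
+ def pack(samples):
+     """samples: list of token-id lists, each shorter than SEQ_LEN."""
+     packs, current, used = [], [], 0
+     for ids in samples:
+         if used + len(ids) > SEQ_LEN:   # current pack is full, start a new one
+             packs.append(current)
+             current, used = [], 0
+         current.append(ids)             # sample boundaries are kept for the attention mask
+         used += len(ids)
+     if current:
+         packs.append(current)
+     return packs
+
+ packs = pack([[1] * 9000, [2] * 20000, [3] * 15000])
+ print([sum(map(len, p)) for p in packs])  # tokens used per 32K pack -> [29000, 15000]
+ ```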
160
 
 
 
 
161
 
162
+ ### 3. Training Process
163
+ #### 3.1 Weight Preparation
164
+
165
+ Users can download pre-trained weights from HuggingFace:
166
 
167
+ | Model Name | Weights URL |
168
  | ------------------- | --------------------------------------------------------------------------------- |
169
  | Qwen2.5-32B-Base | [https://huggingface.co/Qwen/Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B) |
170
 
171
+ **Note**: MindFormers v1.5.0+ supports direct loading/saving of `safetensors` format weights. No conversion to `ckpt` is required. Subsequent fine-tuning will use the `safetensors` format.
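+
+ To peek at the downloaded weights before training, the `safetensors` Python package can list tensors without loading the whole model (a small sketch; the shard filename is a placeholder, and MindFormers itself reads the files directly via `load_checkpoint`/`load_ckpt_format` below):
+
+ ```python
+ # Inspect a downloaded safetensors shard without loading the full model (illustrative).
+ from safetensors import safe_open
+
+ with safe_open("/path/to/Qwen2.5-32B/model.safetensors", framework="np") as f:  # any shard
+     for name in list(f.keys())[:5]:
+         print(name, f.get_tensor(name).shape)
+ ```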
 
 
 
 
 
 
 
 
 
 
 
 
 
172
 
173
+ #### 3.2 Training Configuration
174
+ *(Only frequently modified configurations are shown. Full config: `pcl_reasoner_v1/config/finetune_pcl_reasoner_v1_32k.yaml`)*
175
 
176
+ ***Basic Configuration:***
177
+ ```yaml
178
+ run_mode: 'finetune' # Training mode: fine-tuning
179
+ load_checkpoint: '/path/to/Qwen-32B-base/' # Weight file path
180
+ load_ckpt_format: 'safetensors' # Weight format
181
+ auto_trans_ckpt: True # Enable online weight splitting for distributed training
182
  ```
183
+ ***Dataset Configuration:***
184
+ ```yaml
185
  train_dataset: &train_dataset
186
+ data_loader:
187
+ type: CommonDataLoader
188
+ path: "/path/to/dataset/pack_data_lt_32K_full" # Packed dataset path
189
+ load_func: 'load_from_disk' # Data loading method
190
+ shuffle: True # Enable data shuffling
191
+ packing: pack # Packed data format
192
+ adaptor_config:
193
+ compress_mask: True
194
+ mock_config:
195
+ seq_length: 32768 # Packed sequence length (32K tokens)
196
+ size: 25909 # Dataset size / data parallelism split
 
 
 
 
 
 
 
 
197
  ```
198
+ ***Parallelism Configuration:***
199
+ ```yaml
200
  parallel_config:
201
+ data_parallel: &dp 8 # Data parallelism
202
+ model_parallel: 8 # Model parallelism
203
+ pipeline_stage: 2 # Pipeline parallelism stages
204
+ use_seq_parallel: True # Enable sequence parallelism
205
+ optimizer_shard: True # Enable optimizer sharding
206
+ micro_batch_num: 16 # Number of micro batches for pipeline parallelism
207
  ```
208
+ > *(Note: This configuration example only lists frequently modified items. Refer to the code repository for complete configurations.)*
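+
+ As a sanity check, the three parallel dimensions must multiply to the total number of cards handed to the launcher (the reference launch script uses `--worker_num 128`):
+
+ ```python
+ # Device-count check for the parallel layout above (illustrative).
+ data_parallel, model_parallel, pipeline_stage = 8, 8, 2
+ total_devices = data_parallel * model_parallel * pipeline_stage
+ print(total_devices)  # 128 -> must equal --worker_num in run_pcl_reasoner_v1_finetune.sh
+ ```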
209
 
210
+ #### 3.3 Launching Fine-tuning
 
 
211
 
212
+ Specify the configuration file `pcl_reasoner_v1/config/finetune_pcl_reasoner_v1_32k.yaml` in the launch script `run_pcl_reasoner_v1_finetune.sh`, and modify cluster parameters according to your hardware environment:
213
 
214
+ ```bash
215
  noderank=$1
216
 
217
  bash /path/to/mindformers/scripts/msrun_launcher.sh "run_mindformer.py \
 
227
  --cluster_time_out 1200 \
228
  > run.log 2>&1
229
 
230
+ # Parameter Description
231
+ config: Path to configuration file
232
+ run_mode: Operation mode (pretrain/finetune/inference)
233
+ worker_num: Total number of accelerator cards
234
+ local_worker_num: Cards per single server
235
+ master_addr: Master node address
236
+ master_port: Master node port
237
+ log_dir: Log directory path
238
+ join: Whether to wait for all workers to exit
239
+ cluster_time_out: Cluster timeout duration
240
  ```
241
+ Then, launch the fine-tuning task using:
 
 
242
  ```
243
  bash run_pcl_reasoner_v1_finetune.sh 0
244
  ```
245
+ > *(Note: When launching on multiple nodes, specify node_rank (e.g., 0 for the first node).)*
246
 
247
+ After starting the task, monitor the runtime logs with:
 
248
  ```
249
+ tail -f /path/to/log/worker_127.log
250
  ```
 
251
 
252
+ ### 4. Evaluation
 
 
253
 
254
+ To ensure the fairness of evaluation results, we adopted the **open-source evaluation code from QwQ** ([QwQ/eval at main · QwenLM/QwQ](https://github.com/QwenLM/QwQ)). Developers can follow the `README.md` in that repository to set up the environment and evaluate models.
255
 
256
+ #### Evaluation Hyperparameters
257
+ The sampling hyperparameters used are listed below:
 
 
 
 
 
258
 
259
+ | Hyperparameter | Value |
260
+ |----------------|---------------------------------|
261
+ | `temperature` | 0.6 |
262
+ | `top_k` | 40 |
263
+ | `top_p` | 0.95 |
264
+ | `max_tokens` | 129,024 |
265
+ | `chat_template`| `./pcl_reasoner_v1/eval/am_thinking.jinja` |
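+
+ For illustration, these values map onto a typical inference backend's sampling settings roughly as follows (an assumed vLLM-style configuration shown only as an example of what the numbers mean; the QwQ evaluation harness configures sampling itself):
+
+ ```python
+ # Illustrative mapping of the evaluation hyperparameters onto vLLM sampling settings.
+ from vllm import SamplingParams
+
+ sampling = SamplingParams(
+     temperature=0.6,
+     top_k=40,
+     top_p=0.95,
+     max_tokens=129024,  # large budget so long AIME solutions are not truncated
+     n=32,               # 32 samples per problem for the Avg@32 metric
+ )
+ ```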
266
 
267
+ #### Evaluation Results on AIME24/25
268
+ The table below compares mainstream models on the AIME24 and AIME25 benchmarks. For accuracy, we report the **Avg@32 metric** (accuracy averaged over 32 sampled attempts per problem; a small script for computing it from per-sample results is sketched after the table):
269
 
 
270
 
 
 
 
 
 
 
271
 
272
  <table>
273
  <tr>
274
+ <th>Parameter Size</th>
275
+ <th>Model Name</th>
276
  <th>AIME 24</th>
277
  <th>AIME 25</th>
278
  </tr>
 
306
  <td><span style="color:red">90.8</span></td>
307
  <td><span style="color:grey">83</span></td>
308
  </tr>
 
 
 
 
309
  <!-- 合并行表头 32B -->
310
  <tr>
311
  <th rowspan="7">32B</th>
 
343
  </tr>
344
  </table>
345
 
346
+ > *Note: Generated results for AIME24/25 are available in the [`pcl_reasoner_v1/eval/eval_res`](https://openi.pcl.ac.cn/PCL-Reasoner/V1) directory for developer verification and comparison.*
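+
+ Given per-sample pass/fail records such as those result files, Avg@32 is simply the per-problem mean accuracy averaged over problems; a small sketch (the record format shown is an assumption, not the exact layout of the files in `eval_res`):
+
+ ```python
+ # Sketch: computing Avg@k from per-sample correctness records (record format is assumed).
+ from collections import defaultdict
+
+ records = [
+     {"problem_id": "aime24-01", "correct": True},
+     {"problem_id": "aime24-01", "correct": False},
+     # ... 32 samples per problem in the real evaluation ...
+ ]
+
+ by_problem = defaultdict(list)
+ for r in records:
+     by_problem[r["problem_id"]].append(r["correct"])
+
+ avg_at_k = 100 * sum(sum(v) / len(v) for v in by_problem.values()) / len(by_problem)
+ print(f"Avg@k accuracy: {avg_at_k:.1f}%")
+ ```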
 
347
 
348
+ #### Impact of Answer Length on Accuracy
349
+ We also analyzed the relationship between the maximum answer length (`max_tokens`) and accuracy. As the results below show, on the relatively easier AIME24 a decode length of 64K tokens is enough to reach the peak accuracy of 85.7%, while the harder AIME25 needs up to 128K tokens to reach its best score of 84.2% (a sketch of this length-capped re-scoring follows the table):
 
 
 
 
350
 
351
  <table>
352
  <tr>
353
+ <th>Decode Length</th>
354
  <th>16K</th>
355
  <th>32K</th>
356
  <th>64K</th>
 
371
  <td>84.2</td>
372
  </tr>
373
  </table>
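+
+ A curve like this can be produced by re-scoring the same samples under different decode budgets, counting any generation longer than the cap as incorrect (a sketch under that assumption; the exact truncation rule behind the table is not spelled out here):
+
+ ```python
+ # Sketch: accuracy under a decode-length cap (assumes per-sample length and correctness are known).
+ def capped_accuracy(samples, cap):
+     """samples: list of (num_generated_tokens, is_correct) tuples for one benchmark."""
+     ok = sum(1 for length, correct in samples if correct and length <= cap)
+     return 100 * ok / len(samples)
+
+ samples = [(15000, True), (40000, True), (90000, False), (120000, True)]
+ for cap in (16_384, 32_768, 65_536, 131_072):
+     print(f"{cap // 1024}K: {capped_accuracy(samples, cap):.1f}%")
+ ```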
 
README_CN.md ADDED
@@ -0,0 +1,403 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # **PCL-Reasoner-V1模型**
2
+
3
+ ## 模型概览
4
+
5
+ 本次发布的PCL-Reasoner-V1模型,以Qwen2.5-32B-Base为起点,基于昇思框架与昇腾硬件进行了高性能的监督微调。经过微调,模型在数学推理能力上取得了显著提升:其在权威基准评测集AIME24上准确率达 85.7%,AIME25上达 84.2%,在 32B参数级别模型中稳居前列。
6
+
7
+ 为促进技术共享与应用,我们已完整开源了PCL-Reasoner-V1的模型权重、微调数据及训练代码。该模型不仅是当下领先的32B数学推理模型之一,更为开发者提供了宝贵的专业领域监督微调实践经验与后训练解决方案。用户可参照以下教程轻松部署体验,深入探索后训练的实践方法与奥秘!
8
+
9
+ ![eval_results](images/README/eval_results.png)
10
+
11
+ ## 开发指导
12
+
13
+ ### 1. 模型文件
14
+
15
+ PCL-Reasoner-V1基于Qwen2.5-32B-Base进行微调后训练,训练流程基于MindFormers实现,主要涉及的文件有:
16
+
17
+ 数据处理:
18
+
19
+ ```
20
+ pcl_reasoner_v1
21
+ ├── qwen2_5_tokenizer.py # qwen2_5 tokenizer
22
+ ├── packing_handler.py # 数据packing处理
23
+ └── data_preprocess
24
+ ├── decontaminate.py # 数据污染检测
25
+ └── dataset_prehandle_and_split.py # 数据拆分及预处理
26
+ ```
27
+
28
+ 模型配置:
29
+
30
+ ```
31
+ pcl_reasoner_v1/config
32
+ ├── data_process_handling.yaml # 数据格式转换配置文件
33
+ ├── data_process_packing.yaml # 数据拼接配置文件
34
+ └── finetune_pcl_reasoner_v1_32k.yaml # 模型微调配置文件
35
+ ```
36
+
37
+ 任务启动脚本:
38
+
39
+ ```
40
+ pcl_reasoner_v1
41
+ └── run_pcl_reasoner_v1_finetune.sh # 模型微调启动脚本
42
+ ```
43
+
44
+ ### 2.环境及数据准备
45
+
46
+ #### 2.1 安装环境:
47
+
48
+ | 软件| 版本 |
49
+ | --- | --- |
50
+ | 固件&驱动| 24.1.rc3.5 |
51
+ | CANN| 7.7.T9.0.B057:8.1.RC1 |
52
+ | Python | 3.10 |
53
+ | MindSpore | 2.6.0 |
54
+ | MindSpore TransFormers | r1.5.0 |
55
+
56
+ #### 2.2 数据处理
57
+
58
+ ##### 2.2.1 数据集下载
59
+
60
+ 用户可以从HuggingFace官方下载原始数据集:
61
+
62
+ | 数据集名称 | 数据集链接 |
63
+ | ------------------------------- | ----------------------------------------------------------------------------------------------------------------------------- |
64
+ | AM-DeepSeek-R1-0528-Distilled | [https://huggingface.co/a-m-team/AM-DeepSeek-R1-0528-Distilled](https://huggingface.co/a-m-team/AM-DeepSeek-R1-0528-Distilled) |
65
+
66
+ ##### 2.2.2 数据预处理
67
+
68
+ 首先,我们对源数据进行检测和筛选,操作分为两个步骤,验证集污染检测与数据筛选。
69
+
70
+ * 验证集污染检测:我们采用基于all-MiniLM-L6-v2模型计算文本余弦相似度的方法,对数学部分原始数据针对AIME24/25评测集进行污染检测。该脚本执行后会在终端打印检测结果,并在指定的输出路径中保存相似度大于阈值的题目及其匹配的评测集题目。
71
+
72
+ ```
73
+ python PCL-Reasoner-V1/pcl_reasoner_v1/data_preprocess/decontaminate.py \
74
+ --target_data /path/to/target_data \
75
+ --contaminant_source PCL-Reasoner-V1/pcl_reasoner_v1/data_preprocess/aime2425_questions.json \
76
+ --model_path /path/to/distilled/model_path \
77
+ --output_file_prefix /path/to/output_file_prefix
78
+ --threshold 0.7
79
+
80
+ # 参数说明
81
+ target_data:需要被检测的数据
82
+ contaminant_source:污染源,即评测集数据
83
+ model_path:计算文本嵌入的模型
84
+ output_file_prefix:检测结果输出的路径
85
+ threshold:相似度阈值
86
+ ```
87
+ * 数据筛选及处理:运行数据处理脚本,进行数据长度筛选,选取问题加思维链长度小于32K tokens的数据,并将提示词添加到数据中。
88
+
89
+ ```
90
+ python PCL-Reasoner-V1/pcl_reasoner_v1/data_preprocess/convert_and_split_dataset.py \
91
+ --json_file_paths /path/to/AM-DeepSeek-R1-0528-Distilled/math.jsonl
92
+
93
+ # 参数说明
94
+ json_file_paths:需要处理的数据集,支持传入多个路径,用空格分隔
95
+ ```
96
+
97
+ 其次,我们将数据转换成packing格式,操作分为两个步骤,格式转换与数据拼接。
98
+
99
+ * 格式转换:在配置文件`pcl_reasoner_v1/config/data_process_handling.yaml`中指定`data_files`、`vocab_file`、`merges_file`等文件路径,指定`pcl_reasoner_v1/packing_handler.py`文件中自定义的`AMDeepSeekDataHandler`为数据handler:
100
+
101
+ ```
102
+ train_dataset:
103
+ ...
104
+ path: "json" # 原始数据集文件格式
105
+ data_files:
106
+ ["/path/to/data.jsonl"] # 原始数据集路径
107
+ input_columns: *input_columns
108
+ handler:
109
+ - type: AMDeepSeekDataHandler # 指定自定义的数据处理类
110
+ ...
111
+ tokenizer:
112
+ auto_register: qwen2_5_tokenizer.Qwen2Tokenizer
113
+ ...
114
+ vocab_file: "/path/to/vocab.json" # Qwen2_5默认tokenizer文件
115
+ merges_file: "/path/to/merges.txt" # Qwen2_5默认tokenizer文件
116
+ ...
117
+ ```
118
+
119
+ *(注意事项:以上模型配置为示例,仅列出用户高频修改的配置项,完整配置文件见代码仓)*
120
+
121
+ 运行数据处理脚本,生成Arrow格式数据文件:
122
+
123
+ ```
124
+ export PYTHONPATH=/path/to/mindformers/:PYTHONPATH
125
+ python /path/to/mindformers/toolkit/data_preprocess/huggingface/datasets_preprocess.py
126
+ --config ./pcl_reasoner_v1/config/data_process_handling.yaml
127
+ --save_path /path/to/handled_data/
128
+ --register_path ./pcl_reasoner_v1/
129
+
130
+ # 参数说明
131
+ config:数据格式转换的配置文件路径
132
+ save_path:转换后数据集的保存文件夹路径
133
+ register_path:自定义数据Handler注册目录路径
134
+ ```
135
+ * 数据拼接:
136
+
137
+ 在配置文件pcl_reasoner_v1/config/data_process_packing.yaml指定packing后数据的存储路径:
138
+
139
+ ```
140
+ # dataset
141
+ train_dataset:
142
+ data_loader:
143
+ ...
144
+ path: /path/to/handled_data #预处理后数据集的路径
145
+ ...
146
+ ```
147
+
148
+ *(注意事项:以上模型配置为示例,仅列出用户高频修改的配置项,完整配置文件见代码仓)*
149
+
150
+ 运行数据packing脚本,生成packing后数据文件:
151
+
152
+ ```
153
+ export PYTHONPATH=/path/to/mindformers/:PYTHONPATH
154
+ python /path/to/mindformers/toolkit/data_preprocess/huggingface/datasets_preprocess.py
155
+ --config ./pcl_reasoner_v1/config/data_process_packing.yaml
156
+ --save_path /path/to/packed_data/
157
+ --register_path ./pcl_reasoner_v1/
158
+
159
+ # 参数说明
160
+ config:数据拼接的配置文件路径
161
+ save_path:拼接后数据集的保存文件夹路径
162
+ register_path:自定义数据Handler注册目录路径
163
+ ```
164
+ ### 3 训练流程
165
+ #### 3.1 权重准备
166
+
167
+ 用户可以从HuggingFace官方下载预训练权重
168
+
169
+ | 模型名称 | 权重链接 |
170
+ | ------------------- | --------------------------------------------------------------------------------- |
171
+ | Qwen2.5-32B-Base | [https://huggingface.co/Qwen/Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B) |
172
+
173
+ MindFormers 1.5.0及以上版本已支持safetensor格式的权重直接加载及保存,无需转换成ckpt,下文中微调将使用safetensors格式权重运行。
174
+
175
+ #### 3.2 训练配置:
176
+
177
+ 下面仅列出用户高频修改的配置项,完整配置文件见`pcl_reasoner_v1/config/finetune_pcl_reasoner_v1_32k.yaml`
178
+
179
+ 基本配置:
180
+
181
+ ```
182
+ run_mode: 'finetune' # 设置训练模式为“finetune”
183
+ load_checkpoint: '/path/to/Qwen-32B-base/' # 权重文件路径
184
+ load_ckpt_format: 'safetensors' # 设置权重格式为“safetensors”
185
+ auto_trans_ckpt: True # 设置在线权重切分至分布式权重
186
+ ```
187
+
188
+ 数据集配置:
189
+
190
+ ```
191
+ train_dataset: &train_dataset
192
+  
193
+   data_loader:
194
+     type: CommonDataLoader
195
+  
196
+     # offline
197
+     path: "/path/to/dataset/pack_data_lt_32K_full" # 数据文件路径
198
+     load_func: 'load_from_disk' # 设置数据加载方式为“load_from_disk”
199
+    
200
+     shuffle: True # 数据打乱功能使能
201
+     packing: pack # 数据格式为pack
202
+     adaptor_config:
203
+       compress_mask: True
204
+     mock_config:
205
+       seq_length: 32768 # 数据pack后长度为32k
206
+       size: 25909 # 数据集大小/数据并行切分
207
+ ```
208
+
209
+ 并行配置:
210
+
211
+ ```
212
+ parallel_config:
213
+   data_parallel: &dp 8 # 数据并行切分为8
214
+   model_parallel: 8 # 模型并行切分为8
215
+   pipeline_stage: 2 # 流水线并行切分为2
216
+   use_seq_parallel: True # 序列并行使能
217
+   optimizer_shard: True  # 优化器并行使能
218
+   micro_batch_num: 16 # micro batch数量设置为16
219
+ ```
220
+
221
+ > *(注意事项:以上模型配置为示例,仅列出用户高频修改的配置项,完整配置文件见代码仓)*
222
+
223
+ #### 3.3 启动微调
224
+
225
+ 在启动脚本`run_pcl_reasoner_v1_finetune.sh`指定配置文件`pcl_reasoner_v1/config/finetune_pcl_reasoner_v1_32k.yaml`,并根据用户的实际情况对卡数、服务器IP等配置进行修改:
226
+
227
+ ```
228
+ noderank=$1
229
+
230
+ bash /path/to/mindformers/scripts/msrun_launcher.sh "run_mindformer.py \
231
+ --config /path/to/finetune_pcl_reasoner_v1_32k.yaml \
232
+ --run_mode finetune" \
233
+ --worker_num 128 \
234
+ --local_worker_num 8 \
235
+ --master_addr XX.XX.XX.XX \
236
+ --master_port XXXX \
237
+ --node_rank $noderank \
238
+ --log_dir /path/to/log \
239
+ --join False \
240
+ --cluster_time_out 1200 \
241
+ > run.log 2>&1
242
+
243
+ # 参数说明
244
+ config:配置文件路径
245
+ run_mode:运行模式(预训练/微调/推理)
246
+ worker_num: 总卡数
247
+ local_worker_num: 单机的卡数
248
+ master_addr:主节点地址
249
+ master_port: 主节点端口
250
+ log_dir: 日志路径
251
+ join:是否等待所有worker退出
252
+ cluster_time_out:集群等待时间
253
+ ```
254
+
255
+ 然后,使用`bash run_pcl_reasoner_v1_finetune.sh`指令启动微调训练,在多个节点上启动时,需指定`node_rank`(以下指令以0节点为示例):
256
+
257
+ ```
258
+ bash run_pcl_reasoner_v1_finetune.sh 0
259
+ ```
260
+
261
+ 在拉起任务后,通过以下指令查看运行日志:
262
+
263
+ ```
264
+ tail -f /path/to/log/worker_127.log
265
+ ```
266
+
267
+ ### 4. 评测流程:
268
+
269
+ 为了保障评测结果的公平性,我们采用了QwQ开源的评测代码(QwQ/eval at main · QwenLM/QwQ),可以根据代码仓中README.md指导进行环境安装及模型评测。
270
+ 我们采用的评测超参如下所示:
271
+
272
+ | 采样超参 | 取值 |
273
+ | ---------------- | --------------------------------------------- |
274
+ | temperature | 0.6 |
275
+ | top\_k | 40 |
276
+ | top\_p | 0.95 |
277
+ | max\_tokens | 129024 |
278
+ | chat\_template |`./pcl_reasoner_v1/eval/am_thinking.jinja` |
279
+
280
+ 我们在AIME24/AIME25评测结果详见下表数据。为确保评估准确性,我们采用Avg@32指标(平均32次采样)进行了评测:
281
+
282
+
283
+ <!-- 表格基础样式(可选添加) -->
284
+
285
+ <style>
286
+ table { border-collapse: collapse; width: 100%; margin-left: auto;margin-right: auto;}
287
+ th, td { border: 1px solid #ddd; padding: 8px; text-align: center; }
288
+ </style>
289
+
290
+ <!-- 表格主体 -->
291
+
292
+ <table>
293
+ <tr>
294
+ <th>模型规格</th>
295
+ <th>模型</th>
296
+ <th>AIME 24</th>
297
+ <th>AIME 25</th>
298
+ </tr>
299
+ <!-- 合并行表头 >100B -->
300
+ <tr>
301
+ <th rowspan="6">&gt;100B</th>
302
+ </tr>
303
+ <!-- >100B 组数据行 -->
304
+ <tr>
305
+ <td>DeepSeek-R1</td>
306
+ <td><span style="color:grey">79.8</span></td>
307
+ <td><span style="color:grey">70</span></td>
308
+ </tr>
309
+ <tr>
310
+ <td>DeepSeek-R1-0528</td>
311
+ <td><span style="color:red">91.4</span></td>
312
+ <td><span style="color:red">87.5</span></td>
313
+ </tr>
314
+ <tr>
315
+ <td>Qwen3-235B-A22B</td>
316
+ <td><span style="color:grey">85.7</span></td>
317
+ <td><span style="color:grey">81.5</span></td>
318
+ </tr>
319
+ <tr>
320
+ <td>OpenAI-o3</td>
321
+ <td><span style="color:red">91.6</span></td>
322
+ <td><span style="color:red">88.9</span></td>
323
+ </tr>
324
+ <tr>
325
+ <td>Gemini-2.5-Pro-0506</td>
326
+ <td><span style="color:red">90.8</span></td>
327
+ <td><span style="color:grey">83</span></td>
328
+ </tr>
329
+ <!-- 分隔行 -->
330
+ <tr>
331
+ <td colspan="4"></td>
332
+ </tr>
333
+ <!-- 合并行表头 32B -->
334
+ <tr>
335
+ <th rowspan="7">32B</th>
336
+ </tr>
337
+ <!-- 32B 组数据行 -->
338
+ <tr>
339
+ <td>Qwen3-32B</td>
340
+ <td><span style="color:grey">81.4</span></td>
341
+ <td><span style="color:grey">72.9</span></td>
342
+ </tr>
343
+ <tr>
344
+ <td>QwQ-32B</td>
345
+ <td><span style="color:grey">79.5</span></td>
346
+ <td><span style="color:grey">69.5</span></td>
347
+ </tr>
348
+ <tr>
349
+ <td>DeepSeek-R1-Distill-Qwen-32B</td>
350
+ <td><span style="color:grey">72.6</span></td>
351
+ <td><span style="color:grey">49.6</span></td>
352
+ </tr>
353
+ <tr>
354
+ <td>Skywork-OR1-32B</td>
355
+ <td><span style="color:grey">82.2</span></td>
356
+ <td><span style="color:grey">73.3</span></td>
357
+ </tr>
358
+ <tr>
359
+ <td>AM-Thinking-v1</td>
360
+ <td><span style="color:grey">85.3</span></td>
361
+ <td><span style="color:grey">74.4</span></td>
362
+ </tr>
363
+ <tr>
364
+ <td>PCL-Reasoner-v1</td>
365
+ <td><p style="font-weight: bold;">85.7</p></td>
366
+ <td><p style="font-weight: bold;">84.2</p></td>
367
+ </tr>
368
+ </table>
369
+
370
+ > *(注:模型在AIME24/25评测集上的生成结果文件已同步上传至 `pcl_reasoner_v1/eval/eval_res`目录,供开发者用于模型验证与效果比对参考)*
371
+
372
+
373
+ 另外,我们也针对评测时不同模型回答长度统计正确率,可以看出AIME24/25评测集对回答长度要求较高,而且较为简单的AIME24上,64K tokens的回答长度可以满足,而较为难的AIME25上则需要回答长度长达128K tokens:
374
+
375
+ <style>
376
+ table { border-collapse: collapse; width: 100%; margin-left: auto;margin-right: auto;}
377
+ th, td { border: 1px solid #ddd; padding: 8px; text-align: center; }
378
+ </style>
379
+
380
+ <table>
381
+ <tr>
382
+ <th>回答长度</th>
383
+ <th>16k</th>
384
+ <th>32k</th>
385
+ <th>64k</th>
386
+ <th>128k</th>
387
+ </tr>
388
+ <tr>
389
+ <td>AIME24</td>
390
+ <td>42.0</td>
391
+ <td>77.9</td>
392
+ <td>85.7</td>
393
+ <td>85.7</td>
394
+ </tr>
395
+ <tr>
396
+ <td>AIME25</td>
397
+ <td>33.4</td>
398
+ <td>75.6</td>
399
+ <td>83.9</td>
400
+ <td>84.2</td>
401
+ </tr>
402
+ </table>
403
+