Update README.md
Browse files

README.md CHANGED

@@ -23,7 +23,7 @@ tags:
 
 ## Introduction
 
-PathFinder-PRM-7B is a hierarchical discriminative Process Reward Model(PRM) designed to identify errors and reward correct math reasoning in multi-step outputs from large language models (LLMs). Instead of treating evaluation as a single correct-or-wrong decision, PathFinder-PRM-7B breaks down its error judgment into 2 parts: whether the reasoning is mathematically correct, and logically consistent. It predicts these aspects separately and then combines them to decide if the current reasoning steps leads to a correct final solution. PathFinder-PRM-7B is trained on a combination of high-quality human annotated data (PRM800K) and additional automatically annotated samples, enabling robustness to common failure patterns and strong generalization across diverse benchmarks such as ProcessBench and PRMBench.
+PathFinder-PRM-7B is a hierarchical discriminative Process Reward Model (PRM) designed to identify errors and reward correct math reasoning in multi-step outputs from large language models (LLMs). Instead of treating evaluation as a single correct-or-wrong decision, PathFinder-PRM-7B breaks down its error judgment into 2 parts: whether the reasoning is mathematically correct, and logically consistent. It predicts these aspects separately and then combines them to decide if the current reasoning steps leads to a correct final solution. PathFinder-PRM-7B is trained on a combination of high-quality human annotated data (PRM800K) and additional automatically annotated samples, enabling robustness to common failure patterns and strong generalization across diverse benchmarks such as ProcessBench and PRMBench.
 
 ## Model Details
 

@@ -243,9 +243,9 @@ Note: All results are computed using reward-guided greedy search with Qwen2.5‑
       title={Error Typing for Smarter Rewards: Improving Process Reward Models with Error-Aware Hierarchical Supervision},
       author={Tej Deep Pala and Panshul Sharma and Amir Zadeh and Chuan Li and Soujanya Poria},
       year={2025},
-      eprint={2505.
+      eprint={2505.19706},
       archivePrefix={arXiv},
-      primaryClass={cs.
-      url={https://arxiv.org/abs/2505.
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2505.19706},
 }
 ```
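The introduction added in this change describes a discriminative PRM that judges each reasoning step along two dimensions (mathematical correctness and logical consistency) and combines them into a step-level decision. As a rough illustration only, the sketch below shows how a step score from such a model might be obtained with the `transformers` library; the repository id, prompt layout, and `+`/`-` label tokens are assumptions made for this example, not the documented PathFinder-PRM-7B interface, so the model card's own usage instructions take precedence.

```python
# Minimal sketch of querying a discriminative PRM for one reasoning step.
# ASSUMPTIONS: the repo id, prompt format, and label tokens below are
# illustrative placeholders, not the model's documented protocol.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "declare-lab/PathFinder-PRM-7B"  # assumed Hub repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

question = "If 3x + 2 = 11, what is x?"
step = "Subtract 2 from both sides: 3x = 9."

# Assumed prompt layout: the problem, the step to judge, and a query
# covering the two error dimensions described in the introduction.
prompt = (
    f"Question: {question}\n"
    f"Step: {step}\n"
    "Is this step mathematically correct and logically consistent?"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # next-token logits

# Assumed label tokens: compare the probability mass on a "correct"
# token versus an "erroneous" token to get a step-level score.
good_id = tokenizer.convert_tokens_to_ids("+")
bad_id = tokenizer.convert_tokens_to_ids("-")
probs = torch.softmax(logits[[good_id, bad_id]], dim=-1)
print(f"P(step is correct) = {probs[0].item():.3f}")
```

In practice the two sub-judgments would be queried and then combined (e.g., a step counts as correct only if both dimensions are judged positive), mirroring the hierarchical scheme the introduction describes.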