PRIME-RL/EurusPRM-Stage2


yuchenFan committed on Jan 16

Commit fe98fad · verified · 1 Parent(s): 7f2b61d

Update README.md

Files changed (1)

README.md +3 -1


@@ -130,7 +130,9 @@ We use codes in [Implicit PRM](https://github.com/PRIME-RL/ImplicitPRM/tree/main
 ### Evaluation Base Model
-We adopt **Eurus-2-7B-SFT**, **Qwen2.5-7B-Instruct** and **Llama-3.1-70B-Instruct** as generation models to evaluate the performance of our implicit PRM. For all models, we set the sampling temperature as 0.5, *p* of the top-*p* sampling as 1.
+For **Best-of N Sampling**, we adopt **Eurus-2-7B-SFT**, **Qwen2.5-7B-Instruct** and **Llama-3.1-70B-Instruct** as generation models to evaluate the performance of our implicit PRM. For all models, we set the sampling temperature as 0.5, *p* of the top-*p* sampling as 1.
+For **ProcessBench**, we adopt **Math-Shepherd-PRM-7B**, **RLHFlow-PRM-Mistral-8B**, **RLHFlow-PRM-Deepseek-8B**, **Skywork-PRM-7B**, **EurusPRM-Stage 1**, and **EurusPRM-Stage 2**.
 ### Best-of-N Sampling