Update README.md
README.md
CHANGED
@@ -130,7 +130,9 @@ We use codes in [Implicit PRM](https://github.com/PRIME-RL/ImplicitPRM/tree/main
 
 ### Evaluation Base Model
 
-
+For **Best-of-N Sampling**, we adopt **Eurus-2-7B-SFT**, **Qwen2.5-7B-Instruct** and **Llama-3.1-70B-Instruct** as generation models to evaluate the performance of our implicit PRM. For all models, we set the sampling temperature to 0.5 and *p* for top-*p* sampling to 1.
+
+For **ProcessBench**, we adopt **Math-Shepherd-PRM-7B**, **RLHFlow-PRM-Mistral-8B**, **RLHFlow-PRM-Deepseek-8B**, **Skywork-PRM-7B**, **EurusPRM-Stage 1**, and **EurusPRM-Stage 2**.
 
 ### Best-of-N Sampling
 
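
The sampling settings in the added README lines (temperature 0.5, top-*p* 1) can be illustrated with a minimal best-of-N sketch. This is not the repository's evaluation code: `SamplingConfig`, `best_of_n`, `toy_generate`, and `toy_score` are hypothetical names, and the toy generator and scorer merely stand in for a real generation model and the implicit PRM.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class SamplingConfig:
    # Values taken from the README text; the field names are assumptions.
    temperature: float = 0.5
    top_p: float = 1.0  # p = 1 keeps the full distribution (no nucleus truncation)


def best_of_n(
    prompt: str,
    n: int,
    generate: Callable[[str, SamplingConfig], str],
    score: Callable[[str, str], float],
    config: SamplingConfig = SamplingConfig(),
) -> str:
    """Sample n candidates and return the one the scorer rates highest."""
    candidates: List[str] = [generate(prompt, config) for _ in range(n)]
    return max(candidates, key=lambda response: score(prompt, response))


# Hypothetical stand-ins so the sketch runs without loading any model.
def toy_generate(prompt: str, config: SamplingConfig) -> str:
    return f"{prompt} -> answer {random.randint(0, 9)}"


def toy_score(prompt: str, response: str) -> float:
    # Prefer the response whose trailing number is largest.
    return float(response.rsplit(" ", 1)[-1])


best = best_of_n("2+2", n=4, generate=toy_generate, score=toy_score)
```

In a real run, `generate` would wrap one of the listed generation models and `score` would wrap the reward model; only the selection logic is shown here.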