This model was converted to GGUF format from [`internlm/OREAL-7B`](https://huggingface.co/internlm/OREAL-7B) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.

Refer to the [original model card](https://huggingface.co/internlm/OREAL-7B) for more details on the model.

---

## Introduction

We introduce OREAL-7B and OREAL-32B, a series of mathematical reasoning models trained with Outcome REwArd-based reinforcement Learning (OREAL), a novel RL framework designed for tasks where only binary outcome rewards are available.

With OREAL, a 7B model achieves 94.0 pass@1 accuracy on MATH-500, matching the performance of previous 32B models. OREAL-32B further surpasses previous distillation-trained 32B models, reaching 95.0 pass@1 accuracy on MATH-500.

Our method leverages best-of-N (BoN) sampling for behavior cloning and reshapes negative sample rewards to ensure gradient consistency. Also, to address the challenge of sparse rewards in long chain-of-thought reasoning, we incorporate an on-policy token-level reward model that identifies key tokens in reasoning trajectories for importance sampling. For more details, please refer to our paper.

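The reward reshaping can be pictured schematically. The sketch below is a generic policy-gradient form consistent with the description above, not the exact objective from the paper (please consult the paper for the precise formulation): positive best-of-N samples are imitated directly, while negative samples receive a reshaped weight $w^{-}$ chosen so that the combined gradient stays consistent with the BoN sampling distribution.

$$
\nabla_\theta \mathcal{J}(\theta) \;\approx\;
\mathbb{E}_{y^{+}}\!\big[\nabla_\theta \log \pi_\theta(y^{+}\mid x)\big]
\;+\;
\mathbb{E}_{y^{-}}\!\big[w^{-}(y^{-})\,\nabla_\theta \log \pi_\theta(y^{-}\mid x)\big]
$$
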
---
## Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux)
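For example, a minimal sketch of the install-and-run flow is shown below. It assumes the standard Homebrew formula and the `llama-cli` binary shipped with recent llama.cpp releases; the `--hf-repo` and `--hf-file` values are placeholders, so substitute the actual GGUF repo name and the quantized file you want to use.

```bash
# Install llama.cpp (the Homebrew formula works on macOS and Linux)
brew install llama.cpp

# Run the model with llama-cli; the repo and file names below are placeholders --
# replace them with this GGUF repo and the quant file you downloaded.
llama-cli --hf-repo your-username/OREAL-7B-GGUF \
  --hf-file oreal-7b-q4_k_m.gguf \
  -p "Prove that the sum of two even integers is even."
```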