Triangle104 committed on
Commit 673bd0e · verified · 1 Parent(s): 53c0a24

Update README.md

Files changed (1)
  1. README.md +24 -0
README.md CHANGED
@@ -10,6 +10,30 @@ base_model: internlm/OREAL-7B
  This model was converted to GGUF format from [`internlm/OREAL-7B`](https://huggingface.co/internlm/OREAL-7B) using llama.cpp via the ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
  Refer to the [original model card](https://huggingface.co/internlm/OREAL-7B) for more details on the model.
 
+ ---
+ Introduction
+
+
+
+
+ We introduce OREAL-7B and OREAL-32B, a mathematical reasoning model series trained using Outcome REwArd-based reinforcement Learning, a novel RL framework designed for tasks where only binary outcome rewards are available.
+
+
+ With OREAL, a 7B model achieves 94.0 pass@1 accuracy on MATH-500, matching the performance of previous 32B models. OREAL-32B further surpasses previous distillation-trained 32B models, reaching 95.0 pass@1 accuracy on MATH-500.
+
+
+
+
+
+ Our method leverages best-of-N (BoN) sampling for behavior cloning
+ and reshapes negative sample rewards to ensure gradient consistency.
+ Also, to address the challenge of sparse rewards in long
+ chain-of-thought reasoning, we incorporate an on-policy token-level
+ reward model that identifies key tokens in reasoning trajectories for
+ importance sampling. For more details, please refer to our paper.
+
+ ---
+
  ## Use with llama.cpp
  Install llama.cpp through brew (works on Mac and Linux)
 
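
The added introduction mentions best-of-N (BoN) sampling with binary outcome rewards. As a rough illustration only, the sketch below draws N candidate answers for one prompt and splits them by a pass/fail check. The names `split_by_outcome`, `sample`, `verify`, and `make_toy_sampler` are hypothetical and are not taken from the OREAL code or paper; the reward reshaping for negative samples and the token-level reward model are not shown.

```python
from typing import Callable, Iterable, List, Tuple


def split_by_outcome(
    prompt: str,
    sample: Callable[[str], str],
    verify: Callable[[str, str], bool],
    n: int,
) -> Tuple[List[str], List[str]]:
    """Draw n candidates for a prompt and split them by a binary outcome reward.

    `sample` and `verify` are placeholders (assumptions) for a model's
    generation call and an answer checker that returns True or False.
    """
    positives: List[str] = []
    negatives: List[str] = []
    for _ in range(n):
        candidate = sample(prompt)
        if verify(prompt, candidate):
            positives.append(candidate)  # reward 1: candidate passed the check
        else:
            negatives.append(candidate)  # reward 0: candidate failed the check
    return positives, negatives


def make_toy_sampler(answers: Iterable[str]) -> Callable[[str], str]:
    """Return a stand-in sampler that replays a fixed list of answers."""
    iterator = iter(answers)
    return lambda prompt: next(iterator)


if __name__ == "__main__":
    sampler = make_toy_sampler(["4", "5", "4", "3"])
    pos, neg = split_by_outcome("2 + 2 = ?", sampler, lambda p, a: a == "4", n=4)
    print(len(pos), "correct,", len(neg), "incorrect")
```

In a behavior-cloning setup, the candidates that pass the check would typically serve as the imitation targets; how the failing candidates are used is specific to the method described in the paper.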