Update README.md
README.md
@@ -26,7 +26,7 @@ Please see our [blog post](https://novasky-ai.github.io/posts/reduce-overthinkin
 
 ### Training Data
 
-
+10K preference pairs in math and coding domains, generated by Sky-T1-32B-Preview.
 
 ### Training Procedure
 We perform Simple Preference Optimization (SimPO) with a batch size of 96, learning rate of 5e-7, gamma of 0.3, and beta of 2.0.
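For readers unfamiliar with SimPO, the objective the training procedure refers to can be sketched as follows. This is a minimal, framework-free illustration, not the code used for this model. It assumes "gamma" is the target reward margin γ used directly in the loss and "beta" is the reward scale β. Some implementations instead take a γ/β ratio, in which case the margin would differ. The `PreferencePair` and `simpo_loss` names are hypothetical. SimPO's reward is the length-normalized log-likelihood of a response scaled by β, so no reference model is needed.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PreferencePair:
    """Hypothetical container: summed policy log-probs and lengths for one pair."""
    chosen_logp: float    # sum over response tokens of log pi(y_w | x)
    chosen_len: int       # number of response tokens in the chosen answer y_w
    rejected_logp: float  # sum over response tokens of log pi(y_l | x)
    rejected_len: int     # number of response tokens in the rejected answer y_l


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def simpo_loss(pairs: Sequence[PreferencePair], beta: float = 2.0, gamma: float = 0.3) -> float:
    """Mean SimPO loss over a batch of preference pairs.

    Assumption: gamma is the target reward margin itself (not a gamma/beta ratio).
    """
    total = 0.0
    for p in pairs:
        # Length-normalized, beta-scaled implicit rewards (reference-free).
        r_chosen = beta * p.chosen_logp / p.chosen_len
        r_rejected = beta * p.rejected_logp / p.rejected_len
        # Push the chosen reward above the rejected one by at least gamma.
        total += -log_sigmoid(r_chosen - r_rejected - gamma)
    return total / len(pairs)


if __name__ == "__main__":
    batch = [PreferencePair(-12.0, 40, -30.0, 60), PreferencePair(-8.0, 20, -9.0, 15)]
    print(simpo_loss(batch, beta=2.0, gamma=0.3))
```

In an actual run, the summed log-probabilities would come from the policy model's forward pass. The batch size (96) and learning rate (5e-7) configure the trainer and optimizer, which this sketch does not model.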