Update README.md
README.md
CHANGED
@@ -2,4 +2,6 @@
 license: apache-2.0
 datasets:
 - argilla/dpo-mix-7k
----
+---
+This model uses the [OpenRLHF codebase](https://github.com/OpenRLHF/OpenRLHF) with the average loss and the [Regularized-Preference-Optimization](https://github.com/YSLIU627/Regularized-Preference-Optimization) method.
+The SFT loss coefficient is `0.2`. The relevant paper is [arXiv:2405.16436](https://arxiv.org/abs/2405.16436).
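To illustrate how an SFT loss coefficient of `0.2` could enter a regularized preference objective, here is a minimal sketch in plain Python. It is not taken from the OpenRLHF or Regularized-Preference-Optimization code. It assumes the total loss is a DPO-style preference term plus the coefficient times an SFT term, and it assumes "average loss" means the mean per-token negative log-likelihood of the chosen response. The function name `rpo_loss` and the value `beta=0.1` are hypothetical.

```python
import math


def rpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps,
             beta=0.1, sft_coef=0.2):
    """Illustrative loss for one preference pair.

    Each argument is a list of per-token log-probabilities of a response
    under the policy or the frozen reference model. `beta` is a
    hypothetical value; `sft_coef=0.2` is the coefficient from the README.
    """
    pi_c, pi_r = sum(policy_chosen_logps), sum(policy_rejected_logps)
    ref_c, ref_r = sum(ref_chosen_logps), sum(ref_rejected_logps)

    # DPO-style margin: how much more the policy prefers the chosen
    # response than the reference model does.
    margin = beta * ((pi_c - ref_c) - (pi_r - ref_r))

    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        preference_loss = math.log1p(math.exp(-margin))
    else:
        preference_loss = -margin + math.log1p(math.exp(margin))

    # Assumed SFT term: mean per-token negative log-likelihood of the
    # chosen response under the policy.
    sft_loss = -pi_c / len(policy_chosen_logps)

    return preference_loss + sft_coef * sft_loss


# Example: the policy equals the reference, so the margin is zero and the
# preference term is log(2); the SFT term is the mean NLL of the chosen tokens.
loss = rpo_loss([-1.0, -3.0], [-2.0, -2.0], [-1.0, -3.0], [-2.0, -2.0])
```

In this sketch the SFT term acts as a regularizer on the preference term; a larger `sft_coef` pulls the policy harder toward assigning high likelihood to the chosen responses.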