ZHLiu627/zephyr-7b-gemma-rpo-avg


ZHLiu627 committed 15 days ago

Commit d66c14a · verified · 1 parent: 6860de7

Update README.md

Files changed (1)

README.md +3 -1

README.md CHANGED

@@ -2,4 +2,6 @@
 license: apache-2.0
 datasets:
 - argilla/dpo-mix-7k
----
+---
+This models uses [OpenRLHF Codebase](https://github.com/OpenRLHF/OpenRLHF) for the average loss with the method [Regularized-Preference-Optimization
+](https://github.com/YSLIU627/Regularized-Preference-Optimization). The SFT loss coefficient is `0.2`. The relevant paper is (https://arxiv.org/abs/2405.16436).
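The README describes the training objective only in words. What follows is a minimal sketch under two assumptions. First, it assumes that Regularized Preference Optimization combines the standard DPO loss with an SFT (negative log-likelihood) term on the chosen response, weighted by the stated coefficient `0.2`. Second, it assumes that "average loss" means per-token averaged log-probabilities rather than summed ones. The function names `avg_logprob`, `log_sigmoid`, and `rpo_loss` and the value `beta=0.1` are hypothetical. This is not the OpenRLHF or RPO repository implementation.

```python
import math
from typing import Sequence


def avg_logprob(token_logprobs: Sequence[float]) -> float:
    """Per-token average log-probability of one response (assumed meaning of "average")."""
    return sum(token_logprobs) / len(token_logprobs)


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def rpo_loss(
    policy_chosen: Sequence[float],
    policy_rejected: Sequence[float],
    ref_chosen: Sequence[float],
    ref_rejected: Sequence[float],
    beta: float = 0.1,      # hypothetical DPO temperature, not stated in the README
    sft_coef: float = 0.2,  # SFT loss coefficient stated in the README
) -> float:
    """Hypothetical RPO-style loss: DPO loss + sft_coef * NLL of the chosen response."""
    pc = avg_logprob(policy_chosen)
    pr = avg_logprob(policy_rejected)
    rc = avg_logprob(ref_chosen)
    rr = avg_logprob(ref_rejected)

    # DPO: -log sigmoid(beta * (chosen log-ratio minus rejected log-ratio))
    margin = beta * ((pc - rc) - (pr - rr))
    dpo = -log_sigmoid(margin)

    # SFT regularizer: negative (averaged) log-likelihood of the chosen response
    sft = -pc
    return dpo + sft_coef * sft


if __name__ == "__main__":
    chosen = [-1.0, -2.0, -3.0]
    rejected = [-2.0, -2.0]
    # Policy identical to the reference: DPO term is log 2, SFT term is 0.2 * 2.0
    print(rpo_loss(chosen, rejected, chosen, rejected))
```

When the policy equals the reference model, the DPO margin is zero, so that term reduces to `log 2`. Any change in the total then comes from the SFT term, which keeps pushing up the likelihood of the chosen response. Under the assumption above, that is the role the `0.2` coefficient plays as a regularizer.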