license: apache-2.0
datasets:
  - argilla/dpo-mix-7k

This model was trained with the OpenRLHF codebase using the Regularized Preference Optimization (RPO) method with the averaged loss. The SFT loss coefficient is 0.2. The relevant paper is https://arxiv.org/abs/2405.16436.
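To illustrate how an SFT coefficient of 0.2 enters the objective, here is a minimal sketch of an RPO-style loss for a single preference pair: a DPO term plus a weighted SFT term. It is not taken from the OpenRLHF source. The function name `rpo_loss`, the `beta` value of 0.1, the choice to compute the SFT term on the chosen response, and the reading of "average" as a per-token mean are all assumptions made for illustration.

```python
import math


def rpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp,
             chosen_num_tokens, beta=0.1, sft_coef=0.2):
    """Hypothetical RPO-style loss for one preference pair.

    Inputs are summed sequence log-probabilities. beta=0.1 is an assumed
    value; sft_coef=0.2 is the coefficient stated in this model card.
    """
    # DPO margin: scaled difference of policy-vs-reference log-ratios.
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(margin)), written as a numerically stable softplus(-margin).
    dpo = max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    # SFT term: negative log-likelihood of the chosen response,
    # averaged over its tokens (assumed interpretation of "average loss").
    sft = -policy_chosen_logp / chosen_num_tokens
    return dpo + sft_coef * sft


# Example: policy equals the reference, so the DPO term is log(2),
# and the SFT term is 10 / 5 = 2 nats per token.
loss = rpo_loss(-10.0, -12.0, -10.0, -12.0, chosen_num_tokens=5)
```

In this sketch the SFT term pulls the policy toward the chosen responses while the DPO term separates chosen from rejected; a larger `sft_coef` strengthens the regularization.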