zephyr-7b-gemma-rpo-avg / README.md


license: apache-2.0
datasets:
  - argilla/dpo-mix-7k

This model was trained with the OpenRLHF codebase, using the average loss with the Regularized Preference Optimization (RPO) method. The SFT loss coefficient is 0.2. The relevant paper is https://arxiv.org/abs/2405.16436.
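
As a rough illustration of how an SFT coefficient of 0.2 enters such an objective, the sketch below combines a DPO-style preference term with an SFT (negative log-likelihood) term on the chosen response. It is not taken from OpenRLHF: the function names, the `beta` value, and the choice to average the SFT term per token are all assumptions made for this example.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def rpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    chosen_len: int,
    beta: float = 0.1,      # assumption: not stated in this model card
    sft_coef: float = 0.2,  # SFT loss coefficient stated in this model card
) -> float:
    """Hypothetical single-pair loss: preference term + sft_coef * SFT term."""
    # DPO-style margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the reference model.
    margin = beta * (
        (policy_chosen_logp - ref_chosen_logp)
        - (policy_rejected_logp - ref_rejected_logp)
    )
    preference_term = -log_sigmoid(margin)

    # SFT term: negative log-likelihood of the chosen response,
    # averaged per token here (an assumption for this sketch).
    sft_term = -policy_chosen_logp / chosen_len

    return preference_term + sft_coef * sft_term


# Example with made-up sequence log-probabilities.
loss = rpo_loss(-10.0, -12.0, -10.0, -12.0, chosen_len=5)
print(loss)
```

In this form the SFT term acts as a regularizer: it keeps the likelihood of the chosen responses from drifting down while the preference term widens the margin between chosen and rejected responses.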