---
license: apache-2.0
datasets:
- argilla/dpo-mix-7k
---
This model was trained with the [OpenRLHF codebase](https://github.com/OpenRLHF/OpenRLHF), using the average loss and the [Regularized-Preference-Optimization](https://github.com/YSLIU627/Regularized-Preference-Optimization) method. The SFT loss coefficient is `0.2`. The relevant paper is [arXiv:2405.16436](https://arxiv.org/abs/2405.16436).
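
As a rough illustration of how an SFT coefficient of `0.2` enters this kind of objective, the sketch below combines a DPO-style preference loss with an SFT term on the chosen response. It is not the OpenRLHF or Regularized-Preference-Optimization implementation: the function name `rpo_loss`, its argument names, and the value `beta=0.1` are assumptions made for illustration, and treating the inputs as length-averaged log-probabilities is only one reading of "average loss".

```python
import math


def rpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp,
             beta=0.1, sft_coef=0.2):
    """Per-example sketch: DPO preference loss plus a weighted SFT term.

    Inputs are sequence log-probabilities (assumed here to be averaged
    over tokens). `beta` is an assumed value; `sft_coef` is the 0.2
    stated in this model card.
    """
    # Implicit reward margin between the chosen and rejected responses.
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(margin)), computed stably as softplus(-margin).
    dpo = math.log1p(math.exp(-abs(margin))) + max(-margin, 0.0)
    # SFT term: negative log-likelihood of the chosen response.
    sft = -policy_chosen_logp
    return dpo + sft_coef * sft


if __name__ == "__main__":
    # Hypothetical log-probabilities, for illustration only.
    print(rpo_loss(-1.2, -2.0, -1.5, -1.8))
```

With `sft_coef=0.0` the function reduces to the plain DPO-style term, which makes the contribution of the regularizer easy to inspect in isolation.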