ZHLiu627/zephyr-7b-gemma-rpo-avg

---
license: apache-2.0
datasets:
- argilla/dpo-mix-7k
---
This model was trained using the [OpenRLHF codebase](https://github.com/OpenRLHF/OpenRLHF) with the average loss of the [Regularized Preference Optimization](https://github.com/YSLIU627/Regularized-Preference-Optimization) (RPO) method. The SFT loss coefficient is `0.2`. The relevant paper is [arXiv:2405.16436](https://arxiv.org/abs/2405.16436).
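As a rough illustration, here is a minimal sketch of how an RPO-style objective can combine a DPO preference loss with an SFT loss weighted by the coefficient `0.2` for a single preference pair. It is not the OpenRLHF or paper implementation. It assumes that "average loss" means token-averaged log-probabilities, that the SFT term is the negative log-likelihood of the chosen response, and it uses a hypothetical DPO temperature `beta = 0.1`. The function names (`sequence_logprob`, `log_sigmoid`, `rpo_loss`) are illustrative only.

```python
import math


def sequence_logprob(token_logprobs: list[float], average: bool = True) -> float:
    """Combine per-token log-probabilities of one response into a sequence score.

    Assumption: average=True mirrors the "average loss" setting (token-averaged).
    """
    total = sum(token_logprobs)
    return total / len(token_logprobs) if average else total


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def rpo_loss(
    policy_chosen: list[float],
    policy_rejected: list[float],
    ref_chosen: list[float],
    ref_rejected: list[float],
    beta: float = 0.1,      # hypothetical DPO temperature, not stated in this card
    sft_coef: float = 0.2,  # SFT loss coefficient stated in this card
    average: bool = True,   # assumption: "average loss" = token-averaged log-probs
) -> float:
    """Sketch of a DPO loss plus a weighted SFT (NLL) loss for one preference pair."""
    pi_c = sequence_logprob(policy_chosen, average)
    pi_r = sequence_logprob(policy_rejected, average)
    ref_c = sequence_logprob(ref_chosen, average)
    ref_r = sequence_logprob(ref_rejected, average)

    # DPO term: -log sigmoid(beta * [(log-ratio of chosen) - (log-ratio of rejected)])
    margin = beta * ((pi_c - ref_c) - (pi_r - ref_r))
    dpo = -log_sigmoid(margin)

    # SFT term (assumption): negative log-likelihood of the chosen response
    sft = -pi_c
    return dpo + sft_coef * sft


if __name__ == "__main__":
    loss = rpo_loss([-0.5, -0.7], [-2.0, -1.5], [-1.0, -0.9], [-1.1, -1.2])
    print(f"RPO-style loss: {loss:.4f}")
```

Setting `sft_coef=0` reduces the sketch to a plain DPO loss; with a positive coefficient, the SFT term keeps pushing up the likelihood of the chosen response even when the preference margin is already large.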