This model is aligned on the AlpacaFarm dataset using the Direct Preference Optimization (DPO) loss. Alignment starts from the Supervised Fine-Tuned (SFT) version of LLaMA 2 7B and trains for a single epoch with the beta parameter set to 0.01. For more information on the dataset and methodology, see the AlpacaFarm repository (https://github.com/tatsu-lab/alpaca_farm) and the DPO paper (https://arxiv.org/abs/2305.18290).
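A minimal sketch of this training setup is shown below, assuming the Hugging Face TRL library (the card does not state which framework was actually used). Only the single epoch and beta = 0.01 come from the description above; the SFT checkpoint path, AlpacaFarm field mapping, batch size, and learning rate are illustrative, and the exact TRL argument names may vary between versions.

```python
# Sketch only: DPO fine-tuning from an SFT checkpoint on AlpacaFarm preference data.
# Assumes the TRL library; framework, paths, and most hyperparameters are illustrative.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Hypothetical path to the SFT LLaMA 2 7B checkpoint (not published on this card).
sft_model_name = "path/to/llama2-7b-sft"
model = AutoModelForCausalLM.from_pretrained(sft_model_name)
tokenizer = AutoTokenizer.from_pretrained(sft_model_name)

# AlpacaFarm human preference pairs; field names assumed from the AlpacaFarm repo.
raw = load_dataset("tatsu-lab/alpaca_farm", "alpaca_human_preference", split="preference")

def to_dpo_format(ex):
    # Map AlpacaFarm fields to the prompt/chosen/rejected columns DPOTrainer expects.
    prompt = ex["instruction"] + ("\n" + ex["input"] if ex["input"] else "")
    chosen = ex["output_1"] if ex["preference"] == 1 else ex["output_2"]
    rejected = ex["output_2"] if ex["preference"] == 1 else ex["output_1"]
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}

train_dataset = raw.map(to_dpo_format, remove_columns=raw.column_names)

config = DPOConfig(
    output_dir="llama2-7b-dpo",
    beta=0.01,                     # beta from the description above
    num_train_epochs=1,            # single epoch, as stated above
    per_device_train_batch_size=2, # illustrative
    learning_rate=5e-7,            # illustrative; not reported on this card
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```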
Model: sabersaleh/Llama2-7B-DPO
Base model: meta-llama/Llama-2-7b
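Assuming the weights are stored in the standard Hugging Face transformers (LLaMA) format, the checkpoint could be loaded as sketched below; the prompt is purely illustrative.

```python
# Sketch of loading this checkpoint for generation, assuming standard transformers format.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sabersaleh/Llama2-7B-DPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Explain the difference between supervised fine-tuning and DPO."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```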