YAML Metadata Warning: empty or missing yaml metadata in repo card (https://huggingface.co/docs/hub/model-cards#model-card-metadata)

Quantization made by Richard Erkhov.

Github

Discord

Request more models

Llama-3-Instruct-8B-SPPO-Iter1 - bnb 4bits

Original model description:

license: apache-2.0 datasets: - openbmb/UltraFeedback language: - en pipeline_tag: text-generation

Self-Play Preference Optimization for Language Model Alignment (https://arxiv.org/abs/2405.00675)

Llama-3-Instruct-8B-SPPO-Iter1

This model was developed using Self-Play Preference Optimization at iteration 1, based on the meta-llama/Meta-Llama-3-8B-Instruct architecture as starting point. We utilized the prompt sets from the openbmb/UltraFeedback dataset, splited to 3 parts for 3 iterations by snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset. All responses used are synthetic.

Links to Other Models

Model Description

  • Model type: A 8B parameter GPT-like model fine-tuned on synthetic datasets.
  • Language(s) (NLP): Primarily English
  • License: Apache-2.0
  • Finetuned from model: meta-llama/Meta-Llama-3-8B-Instruct

AlpacaEval Leaderboard Evaluation Results

Model LC. Win Rate Win Rate Avg. Length
Llama-3-8B-SPPO Iter1 31.73 31.74 1962
Llama-3-8B-SPPO Iter2 35.15 35.98 2021
Llama-3-8B-SPPO Iter3 38.77 39.85 2066

Open LLM Leaderboard Evaluation Results

Results are reported by using lm-evaluation-harness v0.4.1

arc_challenge truthfulqa_mc2 winogrande gsm8k hellaswag mmlu average
Llama-3-8B-SPPO Iter1 63.82 54.96 76.40 75.44 79.80 65.65 69.35
Llama-3-8B-SPPO Iter2 64.93 56.48 76.87 75.13 80.39 65.67 69.91
Llama-3-8B-SPPO Iter3 65.19 58.04 77.11 74.91 80.86 65.60 70.29

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • eta: 1000
  • per_device_train_batch_size: 8
  • gradient_accumulation_steps: 1
  • seed: 42
  • distributed_type: deepspeed_zero3
  • num_devices: 8
  • optimizer: RMSProp
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_train_epochs: 6.0 (stop at epoch=1.0)

Citation

@misc{wu2024self,
      title={Self-Play Preference Optimization for Language Model Alignment}, 
      author={Wu, Yue and Sun, Zhiqing and Yuan, Huizhuo and Ji, Kaixuan and Yang, Yiming and Gu, Quanquan},
      year={2024},
      eprint={2405.00675},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
Downloads last month
5
Safetensors
Model size
4.65B params
Tensor type
FP16
F32
U8
Inference Providers NEW
This model is not currently available via any of the supported Inference Providers.
The model cannot be deployed to the HF Inference API: The model has no library tag.