Magpie-Align
/

Llama-3-8B-Magpie-Align-v0.1

@@ -16,36 +16,139 @@ model-index:
   results: []
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-# Llama-3-8B-Magpie-Pro-MT-UltraDPO2
-This model is a fine-tuned version of [Magpie-Align/Llama-3-8B-Magpie-Pro-MT-SFT-v0.1](https://huggingface.co/Magpie-Align/Llama-3-8B-Magpie-Pro-MT-SFT-v0.1) on the princeton-nlp/llama3-ultrafeedback dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.6084
-- Rewards/chosen: -1.6265
-- Rewards/rejected: -1.9735
-- Rewards/accuracies: 0.6809
-- Rewards/margins: 0.3470
-- Logps/rejected: -458.6070
-- Logps/chosen: -418.2021
-- Logits/rejected: -0.6447
-- Logits/chosen: -0.6439
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
 ### Training hyperparameters
@@ -73,6 +176,16 @@ The following hyperparameters were used during training:
 | 0.6376        | 0.6413 | 300  | 0.6178          | -1.3533        | -1.6413          | 0.6748             | 0.2880          | -425.3859      | -390.8818    | -0.6753         | -0.6758       |
 | 0.5888        | 0.8550 | 400  | 0.6088          | -1.6321        | -1.9785          | 0.6829             | 0.3464          | -459.1051      | -418.7560    | -0.6440         | -0.6435       |
 ### Framework versions
@@ -80,3 +193,81 @@ The following hyperparameters were used during training:
 - Pytorch 2.3.1+cu121
 - Datasets 2.20.0
 - Tokenizers 0.19.1

   results: []
 ---
+# 🐦 Llama-3-8B-Magpie-OpenAlign
+Project Web: [https://magpie-align.github.io/](https://magpie-align.github.io/)
+Arxiv Technical Report: [https://arxiv.org/abs/2406.08464](https://arxiv.org/abs/2406.08464)
+Codes: [https://github.com/magpie-align/magpie](https://github.com/magpie-align/magpie)
+## About This Model
+This model is an aligned version of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B).
+- We first use [Magpie-Align/Magpie-Pro-MT-300K-v0.1](https://huggingface.co/datasets/Magpie-Align/Magpie-Pro-MT-300K-v0.1) dataset and perform SFT -> [Magpie-Align/Llama-3-8B-Magpie-Pro-MT-SFT-v0.1](https://huggingface.co/Magpie-Align/Llama-3-8B-Magpie-Pro-MT-SFT-v0.1)
+- We then perform DPO on the [princeton-nlp/llama3-ultrafeedback](https://huggingface.co/datasets/princeton-nlp/llama3-ultrafeedback) dataset.
+The performance is better than the official Llama-3-8B-Instruct Model!
+- **Alpaca Eval 2 (vs GPT-4-Turbo-1106): 38.52 (LC), 38.47 (WR)**
+- **Alpaca Eval 2 (vs Llama-3-8B-Instruct): 69.37 (LC), 70.05 (WR)**
+- **Arena Hard: 32.4**
+## Other Information
+**License**: Please follow [Meta Llama 3 Community License](https://llama.meta.com/llama3/license).
+**Conversation Template**: Please use Llama 3 **official chat template** for the best performance.
+## Stage 1: Supervised Fine-tuning
+We use [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) for SFT.
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 2e-05
+- train_batch_size: 1
+- eval_batch_size: 1
+- seed: 42
+- distributed_type: multi-GPU
+- num_devices: 4
+- gradient_accumulation_steps: 8
+- total_train_batch_size: 32
+- total_eval_batch_size: 4
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_steps: 100
+- num_epochs: 2
+### Training results
+| Training Loss | Epoch  | Step | Validation Loss |
+|:-------------:|:------:|:----:|:---------------:|
+| 0.8807        | 0.0007 | 1    | 0.9001          |
+| 0.5113        | 0.3337 | 464  | 0.5178          |
+| 0.4668        | 0.6673 | 928  | 0.4792          |
+| 0.4492        | 1.0010 | 1392 | 0.4582          |
+| 0.3498        | 1.3205 | 1856 | 0.4575          |
+| 0.3525        | 1.6542 | 2320 | 0.4555          |
+### Framework versions
+- Transformers 4.40.2
+- Pytorch 2.3.0+cu121
+- Datasets 2.19.1
+- Tokenizers 0.19.1
+[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
+<details><summary>See axolotl config</summary>
+axolotl version: `0.4.0`
+```yaml
+base_model: meta-llama/Meta-Llama-3-8B
+model_type: LlamaForCausalLM
+tokenizer_type: AutoTokenizer
+load_in_8bit: false
+load_in_4bit: false
+strict: false
+datasets:
+  - path: Magpie-Align/Magpie-Pro-MT-300K-v0.1
+    type: sharegpt
+    conversation: llama3
+dataset_prepared_path: last_run_prepared
+val_set_size: 0.001
+output_dir: ./out_Llama-3-8B-Magpie-Pro-300K-MT
+sequence_len: 8192
+sample_packing: true
+eval_sample_packing: false
+pad_to_sequence_len: true
+gradient_accumulation_steps: 8
+micro_batch_size: 1
+num_epochs: 2
+optimizer: paged_adamw_8bit
+lr_scheduler: cosine
+learning_rate: 2e-5
+train_on_inputs: false
+group_by_length: false
+bf16: auto
+fp16:
+tf32: false
+gradient_checkpointing: true
+gradient_checkpointing_kwargs:
+  use_reentrant: false
+early_stopping_patience:
+resume_from_checkpoint:
+logging_steps: 1
+xformers_attention:
+flash_attention: true
+warmup_steps: 100
+evals_per_epoch: 3
+eval_table_size:
+saves_per_epoch: 3
+debug:
+deepspeed:
+weight_decay: 0.0
+fsdp:
+fsdp_config:
+special_tokens:
+  pad_token: <|end_of_text|>
+```
+</details><be>
+## Stage 2: Direct Preference Optimization
+We use [alignment handbook](https://github.com/huggingface/alignment-handbook) for DPO.
 ### Training hyperparameters
 | 0.6376        | 0.6413 | 300  | 0.6178          | -1.3533        | -1.6413          | 0.6748             | 0.2880          | -425.3859      | -390.8818    | -0.6753         | -0.6758       |
 | 0.5888        | 0.8550 | 400  | 0.6088          | -1.6321        | -1.9785          | 0.6829             | 0.3464          | -459.1051      | -418.7560    | -0.6440         | -0.6435       |
+It achieves the following results on the evaluation set:
+- Loss: 0.6084
+- Rewards/chosen: -1.6265
+- Rewards/rejected: -1.9735
+- Rewards/accuracies: 0.6809
+- Rewards/margins: 0.3470
+- Logps/rejected: -458.6070
+- Logps/chosen: -418.2021
+- Logits/rejected: -0.6447
+- Logits/chosen: -0.6439
 ### Framework versions
 - Pytorch 2.3.1+cu121
 - Datasets 2.20.0
 - Tokenizers 0.19.1
+## Citation
+If you find the model, data, or code useful, please cite our paper:
+```
+@misc{xu2024magpie,
+	title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
+	author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
+	year={2024},
+	eprint={2406.08464},
+	archivePrefix={arXiv},
+	primaryClass={cs.CL}
+}
+```
+<details><summary>See alignment handbook config</summary>
+```yaml
+# Model arguments
+model_name_or_path: Magpie-Align/Llama-3-8B-Magpie-Pro-MT-SFT-v0.1
+torch_dtype: null
+# Data training arguments
+# For definitions, see: src/h4/training/config.py
+dataset_mixer:
+  princeton-nlp/llama3-ultrafeedback: 1.0
+dataset_splits:
+- train
+- test
+preprocessing_num_workers: 12
+# DPOTrainer arguments
+bf16: true
+beta: 0.01
+do_eval: true
+evaluation_strategy: steps
+eval_steps: 100
+gradient_accumulation_steps: 16
+gradient_checkpointing: true
+gradient_checkpointing_kwargs:
+  use_reentrant: False
+hub_model_id: Magpie-Align/Llama-3-8B-Magpie-Pro-MT-UltraDPO2
+learning_rate: 1.0e-6
+log_level: info
+logging_steps: 1
+lr_scheduler_type: cosine
+max_length: 2048
+max_prompt_length: 1800
+num_train_epochs: 1
+optim: adamw_torch
+output_dir: data/magpie-pro-mt-ultradpo-1e-6
+per_device_train_batch_size: 2
+per_device_eval_batch_size: 4
+push_to_hub: true
+save_strategy: "steps"
+save_steps: 100
+save_total_limit: 1
+seed: 42
+warmup_ratio: 0.1
+```
+</details><be>
+## Citation
+If you find the model, data, or code useful, please cite our paper:
+```
+@misc{xu2024magpie,
+	title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
+	author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
+	year={2024},
+	eprint={2406.08464},
+	archivePrefix={arXiv},
+	primaryClass={cs.CL}
+}
+```