flydust commited on
Commit
15fa6c7
·
verified ·
1 Parent(s): 26b5458

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +212 -21
README.md CHANGED
@@ -16,36 +16,139 @@ model-index:
16
  results: []
17
  ---
18
 
19
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
20
- should probably proofread and complete it, then remove this comment. -->
21
 
22
- # Llama-3-8B-Magpie-Pro-MT-UltraDPO2
23
 
24
- This model is a fine-tuned version of [Magpie-Align/Llama-3-8B-Magpie-Pro-MT-SFT-v0.1](https://huggingface.co/Magpie-Align/Llama-3-8B-Magpie-Pro-MT-SFT-v0.1) on the princeton-nlp/llama3-ultrafeedback dataset.
25
- It achieves the following results on the evaluation set:
26
- - Loss: 0.6084
27
- - Rewards/chosen: -1.6265
28
- - Rewards/rejected: -1.9735
29
- - Rewards/accuracies: 0.6809
30
- - Rewards/margins: 0.3470
31
- - Logps/rejected: -458.6070
32
- - Logps/chosen: -418.2021
33
- - Logits/rejected: -0.6447
34
- - Logits/chosen: -0.6439
35
 
36
- ## Model description
37
 
38
- More information needed
 
 
39
 
40
- ## Intended uses & limitations
41
 
42
- More information needed
43
 
44
- ## Training and evaluation data
45
 
46
- More information needed
47
 
48
- ## Training procedure
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
49
 
50
  ### Training hyperparameters
51
 
@@ -73,6 +176,16 @@ The following hyperparameters were used during training:
73
  | 0.6376 | 0.6413 | 300 | 0.6178 | -1.3533 | -1.6413 | 0.6748 | 0.2880 | -425.3859 | -390.8818 | -0.6753 | -0.6758 |
74
  | 0.5888 | 0.8550 | 400 | 0.6088 | -1.6321 | -1.9785 | 0.6829 | 0.3464 | -459.1051 | -418.7560 | -0.6440 | -0.6435 |
75
 
 
 
 
 
 
 
 
 
 
 
76
 
77
  ### Framework versions
78
 
@@ -80,3 +193,81 @@ The following hyperparameters were used during training:
80
  - Pytorch 2.3.1+cu121
81
  - Datasets 2.20.0
82
  - Tokenizers 0.19.1
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
16
  results: []
17
  ---
18
 
19
+ # 🐦 Llama-3-8B-Magpie-OpenAlign
 
20
 
21
+ Project Web: [https://magpie-align.github.io/](https://magpie-align.github.io/)
22
 
23
+ Arxiv Technical Report: [https://arxiv.org/abs/2406.08464](https://arxiv.org/abs/2406.08464)
24
+
25
+ Codes: [https://github.com/magpie-align/magpie](https://github.com/magpie-align/magpie)
26
+
27
+ ## About This Model
28
+
29
+ This model is an aligned version of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B).
30
+ - We first use [Magpie-Align/Magpie-Pro-MT-300K-v0.1](https://huggingface.co/datasets/Magpie-Align/Magpie-Pro-MT-300K-v0.1) dataset and perform SFT -> [Magpie-Align/Llama-3-8B-Magpie-Pro-MT-SFT-v0.1](https://huggingface.co/Magpie-Align/Llama-3-8B-Magpie-Pro-MT-SFT-v0.1)
31
+ - We then perform DPO on the [princeton-nlp/llama3-ultrafeedback](https://huggingface.co/datasets/princeton-nlp/llama3-ultrafeedback) dataset.
 
 
32
 
33
+ The performance is better than the official Llama-3-8B-Instruct Model!
34
 
35
+ - **Alpaca Eval 2 (vs GPT-4-Turbo-1106): 38.52 (LC), 38.47 (WR)**
36
+ - **Alpaca Eval 2 (vs Llama-3-8B-Instruct): 69.37 (LC), 70.05 (WR)**
37
+ - **Arena Hard: 32.4**
38
 
39
+ ## Other Information
40
 
41
+ **License**: Please follow [Meta Llama 3 Community License](https://llama.meta.com/llama3/license).
42
 
43
+ **Conversation Template**: Please use Llama 3 **official chat template** for the best performance.
44
 
45
+ ## Stage 1: Supervised Fine-tuning
46
 
47
+ We use [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) for SFT.
48
+
49
+ ### Training hyperparameters
50
+
51
+ The following hyperparameters were used during training:
52
+ - learning_rate: 2e-05
53
+ - train_batch_size: 1
54
+ - eval_batch_size: 1
55
+ - seed: 42
56
+ - distributed_type: multi-GPU
57
+ - num_devices: 4
58
+ - gradient_accumulation_steps: 8
59
+ - total_train_batch_size: 32
60
+ - total_eval_batch_size: 4
61
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
62
+ - lr_scheduler_type: cosine
63
+ - lr_scheduler_warmup_steps: 100
64
+ - num_epochs: 2
65
+
66
+ ### Training results
67
+
68
+ | Training Loss | Epoch | Step | Validation Loss |
69
+ |:-------------:|:------:|:----:|:---------------:|
70
+ | 0.8807 | 0.0007 | 1 | 0.9001 |
71
+ | 0.5113 | 0.3337 | 464 | 0.5178 |
72
+ | 0.4668 | 0.6673 | 928 | 0.4792 |
73
+ | 0.4492 | 1.0010 | 1392 | 0.4582 |
74
+ | 0.3498 | 1.3205 | 1856 | 0.4575 |
75
+ | 0.3525 | 1.6542 | 2320 | 0.4555 |
76
+
77
+ ### Framework versions
78
+
79
+ - Transformers 4.40.2
80
+ - Pytorch 2.3.0+cu121
81
+ - Datasets 2.19.1
82
+ - Tokenizers 0.19.1
83
+
84
+ [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
85
+ <details><summary>See axolotl config</summary>
86
+
87
+ axolotl version: `0.4.0`
88
+ ```yaml
89
+
90
+ base_model: meta-llama/Meta-Llama-3-8B
91
+ model_type: LlamaForCausalLM
92
+ tokenizer_type: AutoTokenizer
93
+
94
+ load_in_8bit: false
95
+ load_in_4bit: false
96
+ strict: false
97
+
98
+ datasets:
99
+ - path: Magpie-Align/Magpie-Pro-MT-300K-v0.1
100
+ type: sharegpt
101
+ conversation: llama3
102
+ dataset_prepared_path: last_run_prepared
103
+ val_set_size: 0.001
104
+ output_dir: ./out_Llama-3-8B-Magpie-Pro-300K-MT
105
+
106
+ sequence_len: 8192
107
+ sample_packing: true
108
+ eval_sample_packing: false
109
+ pad_to_sequence_len: true
110
+
111
+ gradient_accumulation_steps: 8
112
+ micro_batch_size: 1
113
+ num_epochs: 2
114
+ optimizer: paged_adamw_8bit
115
+ lr_scheduler: cosine
116
+ learning_rate: 2e-5
117
+
118
+ train_on_inputs: false
119
+ group_by_length: false
120
+ bf16: auto
121
+ fp16:
122
+ tf32: false
123
+
124
+ gradient_checkpointing: true
125
+ gradient_checkpointing_kwargs:
126
+ use_reentrant: false
127
+ early_stopping_patience:
128
+ resume_from_checkpoint:
129
+ logging_steps: 1
130
+ xformers_attention:
131
+ flash_attention: true
132
+
133
+ warmup_steps: 100
134
+ evals_per_epoch: 3
135
+ eval_table_size:
136
+ saves_per_epoch: 3
137
+ debug:
138
+ deepspeed:
139
+ weight_decay: 0.0
140
+ fsdp:
141
+ fsdp_config:
142
+ special_tokens:
143
+ pad_token: <|end_of_text|>
144
+
145
+ ```
146
+
147
+ </details><be>
148
+
149
+ ## Stage 2: Direct Preference Optimization
150
+
151
+ We use [alignment handbook](https://github.com/huggingface/alignment-handbook) for DPO.
152
 
153
  ### Training hyperparameters
154
 
 
176
  | 0.6376 | 0.6413 | 300 | 0.6178 | -1.3533 | -1.6413 | 0.6748 | 0.2880 | -425.3859 | -390.8818 | -0.6753 | -0.6758 |
177
  | 0.5888 | 0.8550 | 400 | 0.6088 | -1.6321 | -1.9785 | 0.6829 | 0.3464 | -459.1051 | -418.7560 | -0.6440 | -0.6435 |
178
 
179
+ It achieves the following results on the evaluation set:
180
+ - Loss: 0.6084
181
+ - Rewards/chosen: -1.6265
182
+ - Rewards/rejected: -1.9735
183
+ - Rewards/accuracies: 0.6809
184
+ - Rewards/margins: 0.3470
185
+ - Logps/rejected: -458.6070
186
+ - Logps/chosen: -418.2021
187
+ - Logits/rejected: -0.6447
188
+ - Logits/chosen: -0.6439
189
 
190
  ### Framework versions
191
 
 
193
  - Pytorch 2.3.1+cu121
194
  - Datasets 2.20.0
195
  - Tokenizers 0.19.1
196
+
197
+ ## Citation
198
+
199
+ If you find the model, data, or code useful, please cite our paper:
200
+ ```
201
+ @misc{xu2024magpie,
202
+ title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
203
+ author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
204
+ year={2024},
205
+ eprint={2406.08464},
206
+ archivePrefix={arXiv},
207
+ primaryClass={cs.CL}
208
+ }
209
+ ```
210
+
211
+ <details><summary>See alignment handbook config</summary>
212
+
213
+ ```yaml
214
+
215
+ # Model arguments
216
+ model_name_or_path: Magpie-Align/Llama-3-8B-Magpie-Pro-MT-SFT-v0.1
217
+ torch_dtype: null
218
+
219
+ # Data training arguments
220
+ # For definitions, see: src/h4/training/config.py
221
+ dataset_mixer:
222
+ princeton-nlp/llama3-ultrafeedback: 1.0
223
+ dataset_splits:
224
+ - train
225
+ - test
226
+ preprocessing_num_workers: 12
227
+
228
+ # DPOTrainer arguments
229
+ bf16: true
230
+ beta: 0.01
231
+ do_eval: true
232
+ evaluation_strategy: steps
233
+ eval_steps: 100
234
+ gradient_accumulation_steps: 16
235
+ gradient_checkpointing: true
236
+ gradient_checkpointing_kwargs:
237
+ use_reentrant: False
238
+ hub_model_id: Magpie-Align/Llama-3-8B-Magpie-Pro-MT-UltraDPO2
239
+ learning_rate: 1.0e-6
240
+ log_level: info
241
+ logging_steps: 1
242
+ lr_scheduler_type: cosine
243
+ max_length: 2048
244
+ max_prompt_length: 1800
245
+ num_train_epochs: 1
246
+ optim: adamw_torch
247
+ output_dir: data/magpie-pro-mt-ultradpo-1e-6
248
+ per_device_train_batch_size: 2
249
+ per_device_eval_batch_size: 4
250
+ push_to_hub: true
251
+ save_strategy: "steps"
252
+ save_steps: 100
253
+ save_total_limit: 1
254
+ seed: 42
255
+ warmup_ratio: 0.1
256
+
257
+ ```
258
+
259
+ </details><be>
260
+
261
+ ## Citation
262
+
263
+ If you find the model, data, or code useful, please cite our paper:
264
+ ```
265
+ @misc{xu2024magpie,
266
+ title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
267
+ author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
268
+ year={2024},
269
+ eprint={2406.08464},
270
+ archivePrefix={arXiv},
271
+ primaryClass={cs.CL}
272
+ }
273
+ ```