---
library_name: transformers
license: other
base_model: Magpie-Align/MagpieLM-4B-SFT-v0.1
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
datasets:
- Magpie-Align/MagpieLM-SFT-Data-v0.1
- Magpie-Align/MagpieLM-DPO-Data-v0.1
model-index:
- name: MagpieLM-4B-Chat-v0.1
  results: []
---

[![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)

# QuantFactory/MagpieLM-4B-Chat-v0.1-GGUF
This is a quantized version of [Magpie-Align/MagpieLM-4B-Chat-v0.1](https://huggingface.co/Magpie-Align/MagpieLM-4B-Chat-v0.1), created using llama.cpp.

# Original Model Card

![Magpie](https://cdn-uploads.huggingface.co/production/uploads/653df1323479e9ebbe3eb6cc/FWWILXrAGNwWr52aghV0S.png)

# 🐦 MagpieLM-4B-Chat-v0.1

[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://api.wandb.ai/links/uw-nsl/ilv83ciw)

## 🧐 About This Model

*Model full name: Llama3.1-MagpieLM-4B-Chat-v0.1*

This model is an aligned version of [Llama-3.1-Minitron-4B-Width](https://huggingface.co/nvidia/Llama-3.1-Minitron-4B-Width-Base) that achieves state-of-the-art performance among open aligned SLMs. It even outperforms larger open-weight models, including Llama-3-8B-Instruct, Llama-3.1-8B-Instruct, and Qwen-2-7B-Instruct.

We apply the standard alignment pipeline below with two carefully crafted synthetic datasets. Feel free to use these datasets to reproduce our model, or to build your own friendly chatbots :)

We first perform SFT using [Magpie-Align/MagpieLM-SFT-Data-v0.1](https://huggingface.co/datasets/Magpie-Align/MagpieLM-SFT-Data-v0.1).
* **SFT Model Checkpoint:** [Magpie-Align/MagpieLM-4B-SFT-v0.1](https://huggingface.co/Magpie-Align/MagpieLM-4B-SFT-v0.1)

We then perform DPO on the [Magpie-Align/MagpieLM-DPO-Data-v0.1](https://huggingface.co/datasets/Magpie-Align/MagpieLM-DPO-Data-v0.1) dataset.

[*See the more powerful 8B version here!*](https://huggingface.co/Magpie-Align/MagpieLM-8B-Chat-v0.1)

## 🔥 Benchmark Performance

All results are obtained with greedy decoding.

- **Alpaca Eval 2: 40.99 (LC), 45.19 (WR)**
- **Arena Hard: 24.6**
- **WildBench WB Score (v2.0625): 32.37**

**Benchmark Performance Compared to Other SOTA SLMs**

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/653df1323479e9ebbe3eb6cc/cNigvzqznKWRy1YfktZ6J.jpeg)

## 👀 Other Information

**License**: Please follow the [NVIDIA Open Model License Agreement](https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf).

**Conversation Template**: Please use the **Llama 3 chat template** for the best performance.
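
For reference, the Llama 3 template wraps each turn in header tokens and closes it with `<|eot_id|>`. Below is a minimal hand-rolled sketch (the `build_llama3_prompt` helper is ours, for illustration only); in practice, let `tokenizer.apply_chat_template` apply the template that ships with the model:

```python
def build_llama3_prompt(messages):
    """Sketch of the Llama 3 chat format for a list of role/content dicts."""
    prompt = "<|begin_of_text|>"
    for m in messages:
        prompt += (f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
                   f"{m['content']}<|eot_id|>")
    # Open an assistant header so the model generates the reply.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

prompt = build_llama3_prompt([
    {"role": "system", "content": "You are Magpie, a friendly AI assistant."},
    {"role": "user", "content": "Who are you?"},
])
```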

**Limitations**: This model primarily understands and generates content in English. Its outputs may contain factual errors, logical inconsistencies, or biases present in the training data. While the model aims to improve instruction following and helpfulness, it is not specifically designed for complex reasoning tasks, which may lead to suboptimal performance in those areas. Additionally, the model may produce unsafe or inappropriate content, as no specific safety training was implemented during the alignment process.

## 🧐 How to use it?

[![Spaces](https://img.shields.io/badge/🤗-Open%20in%20Spaces-blue)](https://huggingface.co/spaces/flydust/MagpieLM-4B)

Please update transformers to the latest version with `pip install git+https://github.com/huggingface/transformers`.

You can then run conversational inference using the Transformers `pipeline` abstraction or by leveraging the Auto classes with the `generate()` function.

```python
import transformers
import torch

model_id = "Magpie-Align/MagpieLM-4B-Chat-v0.1"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are Magpie, a friendly AI assistant."},
    {"role": "user", "content": "Who are you?"},
]

outputs = pipeline(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])
```

---
# Alignment Pipeline

The detailed alignment pipeline is as follows.

## Stage 1: Supervised Fine-tuning

We use [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) for SFT. Please refer to the model card of the [SFT checkpoint](https://huggingface.co/Magpie-Align/MagpieLM-4B-SFT-v0.1) and the config below for detailed configurations.

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`
```yaml
base_model: nvidia/Llama-3.1-Minitron-4B-Width-Base
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
chat_template: llama3

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: Magpie-Align/MagpieLM-SFT-Data-v0.1
    type: sharegpt
    conversation: llama3
dataset_prepared_path: last_run_prepared
val_set_size: 0.001
output_dir: axolotl_out/MagpieLM-4B-SFT-v0.1

sequence_len: 8192
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

wandb_project: SynDa
wandb_entity:
wandb_watch:
wandb_name: Llama3.1-MagpieLM-4B-SFT-v0.1
wandb_log_model:
hub_model_id: Magpie-Align/MagpieLM-4B-SFT-v0.1

gradient_accumulation_steps: 32
micro_batch_size: 1
num_epochs: 2
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 2e-5

train_on_inputs: false
group_by_length: false
bf16: true
fp16:
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_ratio: 0.1
evals_per_epoch: 5
eval_table_size:
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  pad_token: <|end_of_text|>
```
</details><br>

## Stage 2: Direct Preference Optimization

We use the [alignment handbook](https://github.com/huggingface/alignment-handbook) for DPO.
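
DPO trains the policy to widen the log-probability margin between chosen and rejected responses relative to the frozen reference (SFT) model. As a minimal sketch, assuming precomputed sequence log-probabilities (the `dpo_loss` helper and the example numbers are illustrative, not part of the training code):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.01):
    """Per-example DPO loss: -log sigmoid(beta * reward margin)."""
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When the policy prefers the chosen response more than the reference does,
# the margin is positive and the loss drops below log(2) ~= 0.693.
loss = dpo_loss(-510.0, -530.0, -512.0, -522.0, beta=0.01)
```

The small `beta` of 0.01 (matching the DPO config in this card) keeps the implicit rewards, and hence the KL pull toward the reference model, gentle.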

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1.5e-07
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
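
The total batch sizes above follow directly from the per-device batch sizes, the device count, and gradient accumulation:

```python
per_device_train_batch_size = 2
per_device_eval_batch_size = 4
num_devices = 4
gradient_accumulation_steps = 16

# Gradients accumulate over 16 micro-steps across 4 GPUs.
total_train_batch_size = (per_device_train_batch_size
                          * num_devices
                          * gradient_accumulation_steps)
# Evaluation does not accumulate gradients.
total_eval_batch_size = per_device_eval_batch_size * num_devices

print(total_train_batch_size, total_eval_batch_size)  # 128 16
```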

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6911 | 0.0653 | 100 | 0.6912 | -0.0026 | -0.0066 | 0.5640 | 0.0041 | -502.9037 | -510.6042 | -1.7834 | -1.7781 |
| 0.6703 | 0.1306 | 200 | 0.6713 | -0.1429 | -0.1981 | 0.6380 | 0.0552 | -522.0521 | -524.6394 | -1.7686 | -1.7593 |
| 0.6306 | 0.1959 | 300 | 0.6347 | -0.6439 | -0.8210 | 0.6840 | 0.1770 | -584.3356 | -574.7375 | -1.7536 | -1.7436 |
| 0.5831 | 0.2612 | 400 | 0.5932 | -1.5155 | -1.8774 | 0.7070 | 0.3619 | -689.9788 | -661.8920 | -1.6963 | -1.6877 |
| 0.5447 | 0.3266 | 500 | 0.5645 | -2.1858 | -2.7052 | 0.7110 | 0.5195 | -772.7636 | -728.9221 | -1.6249 | -1.6207 |
| 0.5896 | 0.3919 | 600 | 0.5453 | -2.3771 | -2.9747 | 0.7180 | 0.5976 | -799.7122 | -748.0584 | -1.5836 | -1.5847 |
| 0.5342 | 0.4572 | 700 | 0.5305 | -2.6231 | -3.3063 | 0.7350 | 0.6832 | -832.8744 | -772.6592 | -1.5454 | -1.5524 |
| 0.511 | 0.5225 | 800 | 0.5177 | -3.0517 | -3.8393 | 0.7400 | 0.7876 | -886.1714 | -815.5145 | -1.5160 | -1.5273 |
| 0.5007 | 0.5878 | 900 | 0.5088 | -3.0925 | -3.9197 | 0.7540 | 0.8273 | -894.2120 | -819.5908 | -1.5007 | -1.5144 |
| 0.485 | 0.6531 | 1000 | 0.5033 | -3.1305 | -3.9863 | 0.7630 | 0.8558 | -900.8680 | -823.3940 | -1.4834 | -1.4997 |
| 0.4307 | 0.7184 | 1100 | 0.4989 | -3.1387 | -4.0097 | 0.7610 | 0.8710 | -903.2113 | -824.2159 | -1.4728 | -1.4911 |
| 0.5403 | 0.7837 | 1200 | 0.4964 | -3.3418 | -4.2574 | 0.7620 | 0.9156 | -927.9747 | -844.5242 | -1.4641 | -1.4822 |
| 0.5182 | 0.8490 | 1300 | 0.4952 | -3.3255 | -4.2430 | 0.7600 | 0.9175 | -926.5396 | -842.8945 | -1.4601 | -1.4788 |
| 0.5165 | 0.9144 | 1400 | 0.4943 | -3.3308 | -4.2525 | 0.7600 | 0.9217 | -927.4913 | -843.4282 | -1.4610 | -1.4799 |
| 0.5192 | 0.9797 | 1500 | 0.4942 | -3.3377 | -4.2603 | 0.7620 | 0.9226 | -928.2655 | -844.1144 | -1.4591 | -1.4783 |

### Framework versions

- Transformers 4.45.0.dev0
- Pytorch 2.3.1+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1

<details><summary>See alignment handbook configs</summary>

```yaml
# Customized Configs
model_name_or_path: Magpie-Align/MagpieLM-4B-SFT-v0.1
hub_model_id: Magpie-Align/MagpieLM-4B-Chat-v0.1
output_dir: alignment_handbook_out/MagpieLM-4B-Chat-v0.1
run_name: MagpieLM-4B-Chat-v0.1

dataset_mixer:
  Magpie-Align/MagpieLM-DPO-Data-v0.1: 1.0
dataset_splits:
- train
- test
preprocessing_num_workers: 24

# DPOTrainer arguments
bf16: true
beta: 0.01
learning_rate: 1.5e-7
gradient_accumulation_steps: 16
per_device_train_batch_size: 2
per_device_eval_batch_size: 4
num_train_epochs: 1
max_length: 2048
max_prompt_length: 1800
warmup_ratio: 0.1
logging_steps: 1
lr_scheduler_type: cosine
optim: adamw_torch

torch_dtype: null
# use_flash_attention_2: true
do_eval: true
evaluation_strategy: steps
eval_steps: 100
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: False
log_level: info
push_to_hub: true
save_total_limit: 0
seed: 42
report_to:
- wandb
```
</details><br>

## 📚 Citation

If you find the model, data, or code useful, please cite:
```
@article{xu2024magpie,
  title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
  author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
  year={2024},
  eprint={2406.08464},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```

**Contact**

Questions? Contact:
- [Zhangchen Xu](https://zhangchenxu.com/) [zxu9 at uw dot edu], and
- [Bill Yuchen Lin](https://yuchenlin.xyz/) [yuchenlin1995 at gmail dot com]