---
library_name: transformers
license: other
base_model: Magpie-Align/MagpieLM-4B-SFT-v0.1
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
datasets:
- Magpie-Align/MagpieLM-SFT-Data-v0.1
- Magpie-Align/MagpieLM-DPO-Data-v0.1
model-index:
- name: MagpieLM-4B-Chat-v0.1
  results: []
---

[![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)

# QuantFactory/MagpieLM-4B-Chat-v0.1-GGUF
This is a quantized version of [Magpie-Align/MagpieLM-4B-Chat-v0.1](https://huggingface.co/Magpie-Align/MagpieLM-4B-Chat-v0.1), created using llama.cpp.

# Original Model Card

![Magpie](https://cdn-uploads.huggingface.co/production/uploads/653df1323479e9ebbe3eb6cc/FWWILXrAGNwWr52aghV0S.png)

# 🐦 MagpieLM-4B-Chat-v0.1

[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://api.wandb.ai/links/uw-nsl/ilv83ciw)

## 🧐 About This Model

*Model full name: Llama3.1-MagpieLM-4B-Chat-v0.1*

This model is an aligned version of [Llama-3.1-Minitron-4B-Width](https://huggingface.co/nvidia/Llama-3.1-Minitron-4B-Width-Base) that achieves state-of-the-art performance among open aligned SLMs. It even outperforms larger open-weight models, including Llama-3-8B-Instruct, Llama-3.1-8B-Instruct, and Qwen-2-7B-Instruct.

We apply the standard alignment pipeline below with two carefully crafted synthetic datasets. Feel free to use these datasets to reproduce our model, or to build your own friendly chatbots :)

We first perform SFT using [Magpie-Align/MagpieLM-SFT-Data-v0.1](https://huggingface.co/datasets/Magpie-Align/MagpieLM-SFT-Data-v0.1).
* **SFT Model Checkpoint:** [Magpie-Align/MagpieLM-4B-SFT-v0.1](https://huggingface.co/Magpie-Align/MagpieLM-4B-SFT-v0.1)

We then perform DPO on the [Magpie-Align/MagpieLM-DPO-Data-v0.1](https://huggingface.co/datasets/Magpie-Align/MagpieLM-DPO-Data-v0.1) dataset.

[*See the more powerful 8B version here!*](https://huggingface.co/Magpie-Align/MagpieLM-8B-Chat-v0.1)

## 🔥 Benchmark Performance

All results are obtained with greedy decoding.

- **Alpaca Eval 2: 40.99 (LC), 45.19 (WR)**
- **Arena Hard: 24.6**
- **WildBench WB Score (v2.0625): 32.37**

**Benchmark Performance Compared to Other SOTA SLMs**

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/653df1323479e9ebbe3eb6cc/cNigvzqznKWRy1YfktZ6J.jpeg)

## 👀 Other Information

**License**: Please follow the [NVIDIA Open Model License Agreement](https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf).

**Conversation Template**: Please use the **Llama 3 chat template** for the best performance.
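
For reference, the Llama 3 template wraps each turn in header tokens and closes it with `<|eot_id|>`. Below is a minimal hand-rolled sketch (the `build_llama3_prompt` helper is ours, for illustration only); in practice, let `tokenizer.apply_chat_template` apply the template that ships with the model:

```python
def build_llama3_prompt(messages):
    """Sketch of the Llama 3 chat format for a list of role/content dicts."""
    prompt = "<|begin_of_text|>"
    for m in messages:
        prompt += (f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
                   f"{m['content']}<|eot_id|>")
    # Open an assistant header so the model generates the reply.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

prompt = build_llama3_prompt([
    {"role": "system", "content": "You are Magpie, a friendly AI assistant."},
    {"role": "user", "content": "Who are you?"},
])
```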

**Limitations**: This model primarily understands and generates content in English. Its outputs may contain factual errors, logical inconsistencies, or biases present in the training data. While the model aims to improve instruction following and helpfulness, it is not specifically designed for complex reasoning tasks, which may lead to suboptimal performance in those areas. Additionally, the model may produce unsafe or inappropriate content, as no specific safety training was implemented during the alignment process.

## 🧐 How to use it?

[![Spaces](https://img.shields.io/badge/🤗-Open%20in%20Spaces-blue)](https://huggingface.co/spaces/flydust/MagpieLM-4B)

Please update transformers to the latest version with `pip install git+https://github.com/huggingface/transformers`.

You can then run conversational inference using the Transformers `pipeline` abstraction or by leveraging the Auto classes with the `generate()` function.

```python
import transformers
import torch

model_id = "Magpie-Align/MagpieLM-4B-Chat-v0.1"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are Magpie, a friendly AI assistant."},
    {"role": "user", "content": "Who are you?"},
]

outputs = pipeline(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])
```

---
# Alignment Pipeline

The detailed alignment pipeline is as follows.

## Stage 1: Supervised Fine-tuning

We use [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) for SFT. Please refer to the model card of the [SFT checkpoint](https://huggingface.co/Magpie-Align/MagpieLM-4B-SFT-v0.1) and the config below for detailed configurations.

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`
```yaml
base_model: nvidia/Llama-3.1-Minitron-4B-Width-Base
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
chat_template: llama3

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: Magpie-Align/MagpieLM-SFT-Data-v0.1
    type: sharegpt
    conversation: llama3
dataset_prepared_path: last_run_prepared
val_set_size: 0.001
output_dir: axolotl_out/MagpieLM-4B-SFT-v0.1

sequence_len: 8192
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

wandb_project: SynDa
wandb_entity:
wandb_watch:
wandb_name: Llama3.1-MagpieLM-4B-SFT-v0.1
wandb_log_model:
hub_model_id: Magpie-Align/MagpieLM-4B-SFT-v0.1

gradient_accumulation_steps: 32
micro_batch_size: 1
num_epochs: 2
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 2e-5

train_on_inputs: false
group_by_length: false
bf16: true
fp16:
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_ratio: 0.1
evals_per_epoch: 5
eval_table_size:
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  pad_token: <|end_of_text|>
```
</details><br>

## Stage 2: Direct Preference Optimization

We use the [alignment handbook](https://github.com/huggingface/alignment-handbook) for DPO.
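
DPO trains the policy to widen the log-probability margin between chosen and rejected responses relative to the frozen reference (SFT) model. As a minimal sketch, assuming precomputed sequence log-probabilities (the `dpo_loss` helper and the example numbers are illustrative, not part of the training code):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.01):
    """Per-example DPO loss: -log sigmoid(beta * reward margin)."""
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When the policy prefers the chosen response more than the reference does,
# the margin is positive and the loss drops below log(2) ~= 0.693.
loss = dpo_loss(-510.0, -530.0, -512.0, -522.0, beta=0.01)
```

The small `beta` of 0.01 (matching the DPO config in this card) keeps the implicit rewards, and hence the KL pull toward the reference model, gentle.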

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1.5e-07
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
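
The total batch sizes above follow directly from the per-device batch sizes, the device count, and gradient accumulation:

```python
per_device_train_batch_size = 2
per_device_eval_batch_size = 4
num_devices = 4
gradient_accumulation_steps = 16

# Gradients accumulate over 16 micro-steps across 4 GPUs.
total_train_batch_size = (per_device_train_batch_size
                          * num_devices
                          * gradient_accumulation_steps)
# Evaluation does not accumulate gradients.
total_eval_batch_size = per_device_eval_batch_size * num_devices

print(total_train_batch_size, total_eval_batch_size)  # 128 16
```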

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6911 | 0.0653 | 100 | 0.6912 | -0.0026 | -0.0066 | 0.5640 | 0.0041 | -502.9037 | -510.6042 | -1.7834 | -1.7781 |
| 0.6703 | 0.1306 | 200 | 0.6713 | -0.1429 | -0.1981 | 0.6380 | 0.0552 | -522.0521 | -524.6394 | -1.7686 | -1.7593 |
| 0.6306 | 0.1959 | 300 | 0.6347 | -0.6439 | -0.8210 | 0.6840 | 0.1770 | -584.3356 | -574.7375 | -1.7536 | -1.7436 |
| 0.5831 | 0.2612 | 400 | 0.5932 | -1.5155 | -1.8774 | 0.7070 | 0.3619 | -689.9788 | -661.8920 | -1.6963 | -1.6877 |
| 0.5447 | 0.3266 | 500 | 0.5645 | -2.1858 | -2.7052 | 0.7110 | 0.5195 | -772.7636 | -728.9221 | -1.6249 | -1.6207 |
| 0.5896 | 0.3919 | 600 | 0.5453 | -2.3771 | -2.9747 | 0.7180 | 0.5976 | -799.7122 | -748.0584 | -1.5836 | -1.5847 |
| 0.5342 | 0.4572 | 700 | 0.5305 | -2.6231 | -3.3063 | 0.7350 | 0.6832 | -832.8744 | -772.6592 | -1.5454 | -1.5524 |
| 0.511 | 0.5225 | 800 | 0.5177 | -3.0517 | -3.8393 | 0.7400 | 0.7876 | -886.1714 | -815.5145 | -1.5160 | -1.5273 |
| 0.5007 | 0.5878 | 900 | 0.5088 | -3.0925 | -3.9197 | 0.7540 | 0.8273 | -894.2120 | -819.5908 | -1.5007 | -1.5144 |
| 0.485 | 0.6531 | 1000 | 0.5033 | -3.1305 | -3.9863 | 0.7630 | 0.8558 | -900.8680 | -823.3940 | -1.4834 | -1.4997 |
| 0.4307 | 0.7184 | 1100 | 0.4989 | -3.1387 | -4.0097 | 0.7610 | 0.8710 | -903.2113 | -824.2159 | -1.4728 | -1.4911 |
| 0.5403 | 0.7837 | 1200 | 0.4964 | -3.3418 | -4.2574 | 0.7620 | 0.9156 | -927.9747 | -844.5242 | -1.4641 | -1.4822 |
| 0.5182 | 0.8490 | 1300 | 0.4952 | -3.3255 | -4.2430 | 0.7600 | 0.9175 | -926.5396 | -842.8945 | -1.4601 | -1.4788 |
| 0.5165 | 0.9144 | 1400 | 0.4943 | -3.3308 | -4.2525 | 0.7600 | 0.9217 | -927.4913 | -843.4282 | -1.4610 | -1.4799 |
| 0.5192 | 0.9797 | 1500 | 0.4942 | -3.3377 | -4.2603 | 0.7620 | 0.9226 | -928.2655 | -844.1144 | -1.4591 | -1.4783 |

### Framework versions

- Transformers 4.45.0.dev0
- Pytorch 2.3.1+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1

<details><summary>See alignment handbook configs</summary>

```yaml
# Customized Configs
model_name_or_path: Magpie-Align/MagpieLM-4B-SFT-v0.1
hub_model_id: Magpie-Align/MagpieLM-4B-Chat-v0.1
output_dir: alignment_handbook_out/MagpieLM-4B-Chat-v0.1
run_name: MagpieLM-4B-Chat-v0.1

dataset_mixer:
  Magpie-Align/MagpieLM-DPO-Data-v0.1: 1.0
dataset_splits:
- train
- test
preprocessing_num_workers: 24

# DPOTrainer arguments
bf16: true
beta: 0.01
learning_rate: 1.5e-7
gradient_accumulation_steps: 16
per_device_train_batch_size: 2
per_device_eval_batch_size: 4
num_train_epochs: 1
max_length: 2048
max_prompt_length: 1800
warmup_ratio: 0.1
logging_steps: 1
lr_scheduler_type: cosine
optim: adamw_torch

torch_dtype: null
# use_flash_attention_2: true
do_eval: true
evaluation_strategy: steps
eval_steps: 100
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: False
log_level: info
push_to_hub: true
save_total_limit: 0
seed: 42
report_to:
- wandb
```
</details><br>

## 📚 Citation

If you find the model, data, or code useful, please cite:
```
@article{xu2024magpie,
  title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
  author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
  year={2024},
  eprint={2406.08464},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```

**Contact**

Questions? Contact:
- [Zhangchen Xu](https://zhangchenxu.com/) [zxu9 at uw dot edu], and
- [Bill Yuchen Lin](https://yuchenlin.xyz/) [yuchenlin1995 at gmail dot com]