Update README.md
Browse files
README.md
CHANGED
@@ -1,5 +1,63 @@
|
|
1 |
<img src="https://huggingface.co/datasets/allenai/blog-images/resolve/main/tulu3/Tulu3-logo.png" alt="Tulu 3 banner" width="800" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
|
2 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
3 |
# Llama-3.1-Tulu-3-405B
|
4 |
|
5 |
Tülu 3 is a leading instruction-following model family, offering fully open-source data, code, and recipes designed to serve as a comprehensive guide for modern post-training techniques.
|
|
|
1 |
<img src="https://huggingface.co/datasets/allenai/blog-images/resolve/main/tulu3/Tulu3-logo.png" alt="Tulu 3 banner" width="800" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
|
2 |
|
3 |
+
### Dynamic FP8 quantization using llmcompressor
|
4 |
+
|
5 |
+
```python
from transformers import AutoTokenizer

from llmcompressor.transformers import SparseAutoModelForCausalLM, oneshot
from llmcompressor.transformers.compression.helpers import (
    calculate_offload_device_map,
)

# Quantization recipe: FP8 weights (static, per-channel) plus FP8 activations
# (dynamic, per-token). The lm_head is excluded and kept in full precision.
recipe = """
quant_stage:
    quant_modifiers:
        QuantizationModifier:
            ignore: ["lm_head"]
            config_groups:
                group_0:
                    weights:
                        num_bits: 8
                        type: float
                        strategy: channel
                        dynamic: false
                        symmetric: true
                    input_activations:
                        num_bits: 8
                        type: float
                        strategy: token
                        dynamic: true
                        symmetric: true
                    targets: ["Linear"]
"""

model_stub = "allenai/Llama-3.1-Tulu-3-405B"
model_name = model_stub.split("/")[-1]

# Spread the 405B checkpoint across 8 GPUs (with CPU offload as needed).
# No Hessians are required for this data-free dynamic-activation recipe.
device_map = calculate_offload_device_map(
    model_stub, reserve_for_hessians=False, num_gpus=8, torch_dtype="auto"
)

model = SparseAutoModelForCausalLM.from_pretrained(
    model_stub, torch_dtype="auto", device_map=device_map
)

output_dir = f"./{model_name}-FP8-dynamic"

# One-shot (data-free) application of the recipe; saves the compressed
# checkpoint and tokenizer to output_dir.
oneshot(
    model=model,
    recipe=recipe,
    output_dir=output_dir,
    save_compressed=True,
    tokenizer=AutoTokenizer.from_pretrained(model_stub),
)
```
|
60 |
+
|
61 |
# Llama-3.1-Tulu-3-405B
|
62 |
|
63 |
Tülu 3 is a leading instruction-following model family, offering fully open-source data, code, and recipes designed to serve as a comprehensive guide for modern post-training techniques.
|