abhishekchohan/Qwen3-8B-AWQ

4-bit precision

abhishekchohan committed on Apr 30

Commit f2ec800 · verified · 1 parent: 507dce6

Create README.md

Files changed (1)

README.md +69 -0 (added)

	@@ -0,0 +1,69 @@

+---
+base_model:
+- Qwen/Qwen3-8B
+---
+# Qwen3 AWQ Quantized Model Collection
+This repository provides AWQ (Activation-aware Weight Quantization) versions of Qwen3 models, optimized for efficient deployment on consumer hardware while maintaining strong performance.
+## Models Available
+- **Qwen3-32B-AWQ** &nbsp;-&nbsp; 4-bit quantized, 32B parameters
+- **Qwen3-14B-AWQ** &nbsp;-&nbsp; 4-bit quantized, 14B parameters
+- **Qwen3-8B-AWQ** &nbsp;-&nbsp; 4-bit quantized, 8B parameters
+- **Qwen3-4B-AWQ** &nbsp;-&nbsp; 4-bit quantized, 4B parameters
+## Quantization Details
+- **Weights:** 4-bit precision (AWQ)
+- **Activations:** 16-bit precision
+- **Benefits:**
+  - Up to 3x memory reduction vs FP16
+  - Up to 3x inference speedup on supported hardware
+  - Minimal loss in model quality
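The memory figure above can be sanity-checked with some back-of-the-envelope arithmetic. The sketch below estimates weight storage only for the 8B model. The group size of 128 and the per-group FP16 scale plus 4-bit zero point are assumptions about a typical AWQ layout, not values stated in this README.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate storage for model weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


# Nominal parameter count taken from the model name (8B).
N_PARAMS = 8e9

# Assumed AWQ layout (hypothetical for this checkpoint): each group of 128
# weights shares one 16-bit scale and one 4-bit zero point.
GROUP_SIZE = 128
overhead_bits = (16 + 4) / GROUP_SIZE
awq_bits_per_weight = 4 + overhead_bits

fp16_gib = weight_memory_gib(N_PARAMS, 16)
awq_gib = weight_memory_gib(N_PARAMS, awq_bits_per_weight)

print(f"FP16 weights: {fp16_gib:.1f} GiB")
print(f"AWQ weights:  {awq_gib:.1f} GiB")
print(f"Ratio:        {fp16_gib / awq_gib:.2f}x")
```

This counts quantized weights only. Layers that stay in 16-bit, the KV cache, and activations all add to real usage, so an end-to-end reduction below the ideal weight-only ratio, such as the "up to 3x" quoted above, is plausible.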
+## Features
+- **Multilingual:** Supports 100+ languages
+- **Long Context:** Native 32K context, extendable with YaRN to 131K tokens
+- **Efficient Inference:** Optimized for NVIDIA GPUs with Tensor Core support
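The long-context figures above imply a YaRN scaling factor of 4 (131,072 / 32,768). A minimal sketch of how that setting might be derived is shown below. The key names follow the `rope_scaling` convention used in Transformers-style `config.json` files and are an assumption here; check the upstream Qwen3 model card before relying on them.

```python
NATIVE_CONTEXT = 32_768    # "Native 32K context"
TARGET_CONTEXT = 131_072   # "extendable with YaRN to 131K tokens"


def yarn_rope_scaling(native_ctx: int, target_ctx: int) -> dict:
    """Build a rope_scaling entry for YaRN (key names are assumed)."""
    if target_ctx < native_ctx:
        raise ValueError("target context must not be below native context")
    return {
        "rope_type": "yarn",
        "factor": target_ctx / native_ctx,
        "original_max_position_embeddings": native_ctx,
    }


rope_scaling = yarn_rope_scaling(NATIVE_CONTEXT, TARGET_CONTEXT)
print(rope_scaling)
```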
+## Usage
+### With Hugging Face Transformers
+```python
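+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model_id = "abhishekchohan/Qwen3-8B-AWQ"
+model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+messages = [{"role": "user", "content": "Explain quantum computing."}]
+# add_generation_prompt=True appends the assistant turn header so the model answers.
+text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+inputs = tokenizer(text, return_tensors="pt").to(model.device)
+output_ids = model.generate(**inputs, max_new_tokens=512)
+# Decode only the newly generated tokens, skipping the prompt.
+print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))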
+```
+### With vLLM
+```bash
+vllm serve abhishekchohan/Qwen3-8B-AWQ \
+    --chat-template templates/chat_template.jinja \
+    --tensor-parallel-size 4
+```
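Once the server is running, it can be queried through vLLM's OpenAI-compatible HTTP API. The following is a minimal client sketch using only the standard library. It assumes the server listens on vLLM's default `http://localhost:8000`, and the helper names `build_chat_request` and `send` are hypothetical, introduced only for illustration.

```python
import json
import urllib.request

# Assumption: vLLM's default host and port; adjust if the server was
# started with different --host/--port values.
BASE_URL = "http://localhost:8000"
MODEL = "abhishekchohan/Qwen3-8B-AWQ"


def build_chat_request(prompt: str, model: str = MODEL,
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for the /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Usage, with the server running:
#   print(send(build_chat_request("Explain quantum computing.")))
```

Building the request separately from sending it keeps the payload easy to inspect without a live server.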
+## Citation
+If you use these models, please cite:
+```bibtex
+@misc{qwen3,
+    title = {Qwen3 Technical Report},
+    author = {Qwen Team},
+    year = {2025},
+    url = {https://github.com/QwenLM/Qwen3}
+}
+```