---
base_model:
- Qwen/Qwen3-8B
---

# Qwen3 AWQ Quantized Model Collection

This repository provides AWQ (Activation-aware Weight Quantization) versions of Qwen3 models, optimized for efficient deployment on consumer hardware while maintaining strong performance.

## Models Available

- **Qwen3-32B-AWQ** - 4-bit quantized, 32B parameters
- **Qwen3-14B-AWQ** - 4-bit quantized, 14B parameters
- **Qwen3-8B-AWQ** - 4-bit quantized, 8B parameters
- **Qwen3-4B-AWQ** - 4-bit quantized, 4B parameters

## Quantization Details

- **Weights:** 4-bit precision (AWQ)
- **Activations:** 16-bit precision
- **Benefits:**
  - Up to 3x memory reduction vs FP16 (see the estimate below)
  - Up to 3x inference speedup on supported hardware
  - Minimal loss in model quality
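
As a rough illustration of the memory figure, here is a back-of-envelope sketch. The ~4.5 effective bits per weight is an assumption covering AWQ's per-group scales and zero-points; real checkpoints run somewhat larger because embeddings typically stay in FP16, and the KV cache adds on top at inference time.

```python
def approx_weight_memory_gib(params_billions: float, bits_per_weight: float) -> float:
    """Weight-only memory footprint in GiB; ignores KV cache and runtime overhead."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1024**3

# Qwen3-8B weights: FP16 vs 4-bit AWQ
print(f"FP16: {approx_weight_memory_gib(8, 16):.1f} GiB")   # ~14.9 GiB
print(f"AWQ:  {approx_weight_memory_gib(8, 4.5):.1f} GiB")  # ~4.2 GiB at an assumed ~4.5 effective bits
```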

## Features

- **Multilingual:** Supports 100+ languages
- **Long Context:** Native 32K context, extendable with YaRN to 131K tokens (see the configuration sketch below)
- **Efficient Inference:** Optimized for NVIDIA GPUs with Tensor Core support
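
To go beyond the native 32K window, the upstream Qwen3 model cards document a YaRN `rope_scaling` entry for the model config (factor 4.0 gives roughly 131K tokens). A minimal sketch in Transformers, assuming this repository preserves the upstream config; note that static YaRN can degrade quality on short inputs, so enable it only when you need the length:

```python
from transformers import AutoConfig, AutoModelForCausalLM

# YaRN rope scaling as documented in the upstream Qwen3 model cards;
# factor 4.0 extends the native 32K context to roughly 131K tokens.
config = AutoConfig.from_pretrained("abhishekchohan/Qwen3-8B-AWQ")
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}
model = AutoModelForCausalLM.from_pretrained(
    "abhishekchohan/Qwen3-8B-AWQ", config=config, device_map="auto"
)
```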

## Usage

### With Hugging Face Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("abhishekchohan/Qwen3-8B-AWQ", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("abhishekchohan/Qwen3-8B-AWQ")

messages = [{"role": "user", "content": "Explain quantum computing."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Generate a response and decode only the newly produced tokens
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
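
Qwen3 models support a hybrid thinking mode; per the upstream Qwen3 model cards, the chat template accepts an `enable_thinking` flag (default `True`). A minimal variation, assuming this repository preserves the upstream template:

```python
# Disable the step-by-step "thinking" block when a direct answer is preferred.
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
```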

### With vLLM

```bash
vllm serve abhishekchohan/Qwen3-8B-AWQ \
    --tensor-parallel-size 4
```

Qwen3 tokenizers ship with a built-in chat template, so a separate `--chat-template` file is typically unnecessary; set `--tensor-parallel-size` to the number of available GPUs (1 for a single GPU).
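
Once the server is up, vLLM exposes an OpenAI-compatible API (port 8000 by default). A minimal client sketch using the `openai` package:

```python
from openai import OpenAI

# Point the OpenAI client at the local vLLM server; the API key is unused but required.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="abhishekchohan/Qwen3-8B-AWQ",
    messages=[{"role": "user", "content": "Explain quantum computing."}],
    max_tokens=512,
)
print(response.choices[0].message.content)
```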

## Citation

If you use these models, please cite:

```bibtex
@misc{qwen3,
  title  = {Qwen3 Technical Report},
  author = {Qwen Team},
  year   = {2025},
  url    = {https://github.com/QwenLM/Qwen3}
}
```