liusong123 committed
Commit 5f9e743 · verified · 1 Parent(s): 675aeb7

Update README.md

Files changed (1):
  1. README.md +34 -130

README.md CHANGED
@@ -1,7 +1,14 @@
 <p align="center">
 <picture>
- <source media="(prefers-color-scheme: dark)" srcset="./docs/source/assets/logos/angelslim_logo_light.png">
- <img alt="AngelSlim" src="./docs/source/assets/logos/angelslim_logo.png" width=55%>
 </picture>
 </p>
@@ -10,10 +17,11 @@ Dedicated to building a more intuitive, comprehensive, and efficient LLMs compre
 </h3>

 <p align="center">
- 📖 <a href="https://angelslim.readthedocs.io/">Documentation</a>&nbsp;&nbsp; | &nbsp;&nbsp;🤗 <a href="https://huggingface.co/AngelSlim">Hugging Face</a>&nbsp;&nbsp; | &nbsp;&nbsp;🤖 <a href="https://modelscope.cn/organization/AngelSlim">ModelScope</a>&nbsp;&nbsp; | &nbsp;&nbsp;💬 <a href="./docs/source/assets/angel_slim_wechat.png">WeChat</a>&nbsp;&nbsp; | &nbsp;&nbsp;🫨 <a href="https://discord.com/invite/dHVNeuNdFt">Discord</a>
 <br>
 </p>

 ## Table of Contents

 - [Latest Updates](#latest-updates)
@@ -29,11 +37,14 @@ Dedicated to building a more intuitive, comprehensive, and efficient LLMs compre
 - [Technical Discussion](#technical-discussion)

 ## 📣Latest Updates
- - [25/08/06] We now support quantization for `Hunyuan 0.5B/1.8B/4B/7B` and the multimodal model `Qwen2.5VL 3B/7B/32B/72B` with `FP8/INT4` algorithms, as well as quantization for `DeepSeek-R1/V3` and `Kimi-K2` with the `FP8-Static` and `W4A8-FP8` algorithms. We have also open-sourced the `Hunyuan 1.8B/4B/7B` series Eagle3 model weights.
- - [25/07/04] We now support quantization for `Hunyuan/Qwen2.5/Qwen3/DeepSeek-R1-Distill-Qwen` and other models, including `INT8/FP8/INT4` algorithms. We have also open-sourced the `Qwen3` series Eagle3 model weights.

 Coming soon:
- - [ ] Diffusion model compression support.
 - [ ] Release of a new algorithm for speculative sampling.

 ## 🌟Key Features
@@ -49,7 +60,7 @@ Currently supports the following LLMs, including Hunyuan-Dense, Hunyuan-MoE, Qwe
 | Model | FP8-Dynamic | FP8-Static | INT8-Dynamic | INT4-GPTQ | INT4-AWQ |
 | --------------------------------------------------------------------------------------------------------------------------- | ----------- | ---------- | ------------ | --------- | -------- |
- | [Hunyuan-Dense](https://huggingface.co/collections/tencent/hunyuan-dense-model-6890632cda26b19119c9c5e7) | ✅ | ✅ | ✅ | ✅ | ✅ |
 | [Hunyuan-MoE](https://huggingface.co/collections/tencent/hunyuan-a13b-685ec38e5b46321e3ea7c4be) | ✅ | ✅ | ✅ | ✅ | ✅ |
 | [Qwen3-Dense](https://huggingface.co/collections/AngelSlim/qwen3-quant-68652e26da31740739d154f8) | ✅ | ✅ | ✅ | ✅ | ✅ |
 | [Qwen3-MoE](https://huggingface.co/collections/AngelSlim/qwen3-quant-68652e26da31740739d154f8) | ✅ | ✅ | ✅ | ✅ | ✅ |
@@ -58,8 +69,6 @@ Currently supports the following LLMs, including Hunyuan-Dense, Hunyuan-MoE, Qwe
 | [QwQ](https://huggingface.co/collections/AngelSlim/qwen3-quant-68652e26da31740739d154f8) | ✅ | ✅ | ✅ | ✅ | ✅ |

 ### Speculative Decoding
-
- #### Eagle3
 The Eagle3 weights for the Qwen3 series models are now available.

 | Qwen3 Models | Hunyuan Models |
@@ -121,21 +130,9 @@ After installing `AngelSlim`, you can quickly start by running the following scr

 For more details, please refer to the [Quick Start Documentation](https://angelslim.readthedocs.io/zh-cn/latest/getting_started/quickstrat.html).

- ### Deployment and Testing
-
- #### 1. Offline Inference
-
- To load a quantized model via `transformers`, set `deploy_backend: huggingface` in the `global` configuration before quantizing the model, or manually change the `ignored_layers` field in the `config.json` file (in the quantized model's output directory) to `ignore`.
-
- To test offline inference with a quantized model loaded via `transformers`, run the following command:
-
- ```shell
- python deploy/offline.py $MODEL_PATH
- ```
-
- Here, `MODEL_PATH` is the path to the quantized model output.

- #### 2. API Service Deployment

 After specifying the quantized model path `MODEL_PATH`, you can deploy an OpenAI-compatible API service using the following LLM inference frameworks:
@@ -157,7 +154,7 @@ Use the following script to launch a [SGLang](https://github.com/sgl-project/sgl
 bash deploy/run_sglang.sh $MODEL_PATH
 ```

- #### 3. Service Invocation

 Invoke requests via [OpenAI's API format](https://platform.openai.com/docs/api-reference/introduction):
@@ -165,7 +162,7 @@ Invoke requests via [OpenAI's API format](https://platform.openai.com/docs/api-r
 bash deploy/openai.sh $MODEL_PATH
 ```

- #### 4. Performance Evaluation

 Evaluate the performance of the quantized model using [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness), recommended version `lm-eval>=0.4.8`:
@@ -184,48 +181,14 @@ The performance test results for selected models are shown below. For the comple

 #### Hunyuan Series Models

- Benchmark results for the `Hunyuan-Instruct` models with the `FP8`, `INT4-AWQ`, and `INT4-GPTQ` quantization algorithms on datasets including `OlympiadBench`, `AIME 2024`, `DROP`, and `GPQA-Diamond`:

- <table>
- <thead>
- <tr><th>Model</th><th>Quantization</th><th>OlympiadBench</th><th>AIME 2024</th><th>DROP</th><th>GPQA-Diamond</th></tr>
- </thead>
- <tbody>
- <tr><td rowspan="4">Hunyuan-A13B-Instruct</td>
- <td>BF16</td><td>82.7</td><td>87.3</td><td>91.1</td><td>71.2</td></tr>
- <tr><td>FP8-Static</td><td>83.0</td><td>86.7</td><td>91.1</td><td>-</td></tr>
- <tr><td>Int4-GPTQ</td><td>82.7</td><td>86.7</td><td>91.1</td><td>-</td></tr>
- <tr><td>Int4-AWQ</td><td>82.6</td><td>85.6</td><td>91.0</td><td>-</td></tr>
- </tbody>
- <tbody>
- <tr><td rowspan="4">Hunyuan-7B-Instruct</td>
- <td>BF16</td><td>76.5</td><td>81.1</td><td>85.9</td><td>60.1</td></tr>
- <tr><td>FP8-Static</td><td>76.6</td><td>80.9</td><td>86.0</td><td>60.1</td></tr>
- <tr><td>Int4-GPTQ</td><td>76.2</td><td>81.0</td><td>85.7</td><td>60.0</td></tr>
- <tr><td>Int4-AWQ</td><td>76.4</td><td>80.9</td><td>85.9</td><td>60.1</td></tr>
- </tbody>
- <tbody>
- <tr><td rowspan="4">Hunyuan-4B-Instruct</td>
- <td>BF16</td><td>73.1</td><td>78.3</td><td>78.2</td><td>61.1</td></tr>
- <tr><td>FP8-Static</td><td>73.1</td><td>76.6</td><td>78.3</td><td>60.2</td></tr>
- <tr><td>Int4-GPTQ</td><td>72.9</td><td>-</td><td>78.1</td><td>58.1</td></tr>
- <tr><td>Int4-AWQ</td><td>72.8</td><td>-</td><td>78.2</td><td>-</td></tr>
- </tbody>
- <tbody>
- <tr><td rowspan="4">Hunyuan-1.8B-Instruct</td>
- <td>BF16</td><td>63.4</td><td>56.7</td><td>76.7</td><td>47.2</td></tr>
- <tr><td>FP8-Static</td><td>62.5</td><td>55.2</td><td>75.1</td><td>47.7</td></tr>
- <tr><td>Int4-GPTQ</td><td>60.9</td><td>-</td><td>73.0</td><td>44.4</td></tr>
- <tr><td>Int4-AWQ</td><td>61.7</td><td>-</td><td>71.7</td><td>43.6</td></tr>
- </tbody>
- <tbody>
- <tr><td rowspan="4">Hunyuan-0.5B-Instruct</td>
- <td>BF16</td><td>29.6</td><td>17.2</td><td>52.8</td><td>23.3</td></tr>
- <tr><td>FP8-Static</td><td>29.6</td><td>17.2</td><td>51.6</td><td>22.5</td></tr>
- <tr><td>Int4-GPTQ</td><td>26.8</td><td>-</td><td>50.9</td><td>23.3</td></tr>
- <tr><td>Int4-AWQ</td><td>26.3</td><td>-</td><td>48.9</td><td>23.3</td></tr>
- </tbody>
- </table>

 #### Qwen3 Series Models
@@ -273,65 +236,6 @@ Benchmark results for Qwen3 series models with `FP8-Static`, `FP8-Dynamic`, `INT
 </tbody>
 </table>

- #### Qwen2.5VL Series Models
-
- Benchmark results for Qwen2.5VL series models with the `BF16`, `FP8-Static`, `FP8-Dynamic`, `INT4-GPTQ`, and `INT4-AWQ` quantization algorithms on datasets including `MMMU_VAL`, `DocVQA_VAL`, and `ChartQA_TEST`:
-
- <table>
- <thead>
- <tr><th>Model</th><th>Quantization</th><th>MMMU_VAL</th><th>DocVQA_VAL</th><th>ChartQA_TEST</th></tr>
- </thead>
- <tbody>
- <tr><td rowspan="5">Qwen2.5VL-3B</td><td>BF16</td><td>47.11</td><td>78.57</td><td>80.32</td></tr>
- <tr><td>FP8-Static</td><td>47.33</td><td>79.34</td><td>79.68</td></tr>
- <tr><td>FP8-Dynamic</td><td>45.99</td><td>46.93</td><td>38.29</td></tr>
- <tr><td>INT4-GPTQ</td><td>46.56</td><td>77.20</td><td>78.96</td></tr>
- <tr><td>INT4-AWQ</td><td>45.78</td><td>-</td><td>79.60</td></tr>
- <tr><td rowspan="5">Qwen2.5VL-7B</td><td>BF16</td><td>45.44</td><td>89.71</td><td>84.64</td></tr>
- <tr><td>FP8-Static</td><td>47.00</td><td>89.83</td><td>85.92</td></tr>
- <tr><td>FP8-Dynamic</td><td>47.22</td><td>89.80</td><td>88.64</td></tr>
- <tr><td>INT4-GPTQ</td><td>46.67</td><td>90.45</td><td>-</td></tr>
- <tr><td>INT4-AWQ</td><td>45.67</td><td>89.28</td><td>-</td></tr>
- <tr><td rowspan="5">Qwen2.5VL-32B</td><td>BF16</td><td>57.00</td><td>90.03</td><td>-</td></tr>
- <tr><td>FP8-Static</td><td>57.00</td><td>89.88</td><td>-</td></tr>
- <tr><td>FP8-Dynamic</td><td>56.44</td><td>89.88</td><td>-</td></tr>
- <tr><td>INT4-GPTQ</td><td>55.22</td><td>89.80</td><td>-</td></tr>
- <tr><td>INT4-AWQ</td><td>55.22</td><td>90.30</td><td>-</td></tr>
- <tr><td rowspan="5">Qwen2.5VL-72B</td><td>BF16</td><td>58.78</td><td>94.39</td><td>85.60</td></tr>
- <tr><td>FP8-Static</td><td>57.89</td><td>94.41</td><td>85.84</td></tr>
- <tr><td>FP8-Dynamic</td><td>58.67</td><td>94.38</td><td>85.60</td></tr>
- <tr><td>INT4-GPTQ</td><td>57.56</td><td>94.46</td><td>86.48</td></tr>
- <tr><td>INT4-AWQ</td><td>58.78</td><td>94.19</td><td>87.28</td></tr>
- </tbody>
- </table>
-
- #### DeepSeek Series Models
-
- Benchmark results for DeepSeek-R1-0528 models with the `FP8-Block-Wise` and `W4A8-FP8` quantization algorithms on datasets including `GPQA Diamond`, `AIME 2024`, `SimpleQA`, and `LiveCodeBench`:
-
- <table>
- <thead>
- <tr><th>Model</th><th>Quantization</th><th>GPQA Diamond</th><th>AIME 2024</th><th>SimpleQA</th><th>LiveCodeBench</th></tr>
- </thead>
- <tbody>
- <tr><td rowspan="2">DeepSeek-R1-0528</td><td>FP8-Block-Wise</td><td>78.28</td><td>88.67</td><td>27.8</td><td>77.1</td></tr>
- <tr><td>W4A8-FP8</td><td>77.37</td><td>88.67</td><td>26.83</td><td>78.86</td></tr>
- </tbody>
- </table>
-
- > **Note**:
- > - The above results are averages over 5 test runs, with models deployed via TRT-LLM.
- > - The hyperparameters used during evaluation are as follows:
- > ```json
- > {
- >   "top_k": 20,
- >   "top_p": 0.6,
- >   "temperature": 0.7,
- >   "output_seq_len": 32768,
- >   "max_input_seq_len": 16384
- > }
- > ```

 #### Other Models

 Benchmark results for other models with `FP8-Static`, `FP8-Dynamic`, `INT4-GPTQ`, and `INT4-AWQ` quantization algorithms on datasets including `CEVAL`, `MMLU` and `GSM8K`:
@@ -393,16 +297,16 @@ Benchmark results for Qwen3 series models with `Eagle3` speculative decoding alg
 <tr><td rowspan="6"><strong>T=0</strong></td>
 <td>Qwen3-1.7B</td><td>2.05x</td><td>2.81</td><td>2.07x</td><td>2.93</td><td>2.11x</td><td>2.98</td><td>1.93x</td><td>2.69</td><td>2.04x</td><td>2.85</td></tr>
 <tr><td>Qwen3-4B</td><td>2.21x</td><td>3.01</td><td>2.36x</td><td>3.24</td><td>2.42x</td><td>3.13</td><td>2.32x</td><td>2.75</td><td>2.33x</td><td>3.03</td></tr>
- <tr><td>Qwen3-8B</td><td>2.63x</td><td>3.65</td><td>2.76x</td><td>3.85</td><td>2.82x</td><td>3.90</td><td>2.62x</td><td>3.48</td><td>2.70x</td><td>3.72</td></tr>
- <tr><td>Qwen3-14B</td><td>2.23x</td><td>3.30</td><td>2.53x</td><td>3.74</td><td>2.56x</td><td>3.79</td><td>2.16x</td><td>3.13</td><td>2.37x</td><td>3.49</td></tr>
 <tr><td>Qwen3-32B</td><td>2.39x</td><td>2.78</td><td>2.37x</td><td>2.81</td><td>2.47x</td><td>2.92</td><td>2.42x</td><td>2.53</td><td>2.41x</td><td>2.76</td></tr>
 <tr><td>Qwen3-30B-A3B</td><td>2.84x</td><td>3.63</td><td>2.27x</td><td>3.09</td><td>2.64x</td><td>3.42</td><td>2.83x</td><td>3.56</td><td>2.64x</td><td>3.42</td></tr>
 <!-- <tr><td colspan="12" style="text-align: center; vertical-align: middle;"><strong>Temperature=1</strong></td></tr> -->
 <tr><td rowspan="6"><strong>T=1</strong></td>
 <td>Qwen3-1.7B</td><td>1.74x</td><td>2.53</td><td>1.86x</td><td>2.70</td><td>1.82x</td><td>2.69</td><td>1.72x</td><td>2.46</td><td>1.93x</td><td>2.60</td></tr>
 <tr><td>Qwen3-4B</td><td>1.93x</td><td>2.60</td><td>2.00x</td><td>2.84</td><td>2.11x</td><td>2.82</td><td>2.34x</td><td>2.50</td><td>1.75x</td><td>2.69</td></tr>
- <tr><td>Qwen3-8B</td><td>1.98x</td><td>2.75</td><td>2.25x</td><td>3.11</td><td>2.31x</td><td>3.15</td><td>2.10x</td><td>2.76</td><td>2.90x</td><td>2.94</td></tr>
- <tr><td>Qwen3-14B</td><td>1.71x</td><td>2.61</td><td>1.95x</td><td>2.87</td><td>2.04x</td><td>3.08</td><td>1.68x</td><td>2.55</td><td>2.90x</td><td>2.78</td></tr>
 <tr><td>Qwen3-32B</td><td>1.62x</td><td>1.91</td><td>1.71x</td><td>2.05</td><td>1.78x</td><td>2.10</td><td>1.80x</td><td>1.95</td><td>1.62x</td><td>2.00</td></tr>
 <tr><td>Qwen3-30B-A3B</td><td>1.91x</td><td>2.46</td><td>2.00x</td><td>2.64</td><td>1.90x</td><td>2.53</td><td>1.80x</td><td>2.32</td><td>1.90x</td><td>2.48</td></tr>
 </tbody>
@@ -454,4 +358,4 @@ The code for this project is open-sourced under the [License for AngelSlim](LICE

 ## 💬 Technical Discussion

- * AngelSlim is continuously iterating, and new features will be released soon. If you have any questions or suggestions, please open an issue on [GitHub Issues](https://github.com/Tencent/AngelSlim/issues) or join our [WeChat technical discussion group](./docs/source/assets/angel_slim_wechat.png).
+ ---
+ tags:
+ - hunyuan
+ - eagle3
+ - eagle
+ ---
+
 <p align="center">
 <picture>
+ <source media="(prefers-color-scheme: dark)" srcset="https://github.com/Tencent/AngelSlim/blob/main/docs/source/assets/logos/angelslim_logo_light.png?raw=true">
+ <img alt="AngelSlim" src="https://github.com/Tencent/AngelSlim/blob/main/docs/source/assets/logos/angelslim_logo.png?raw=true" width=55%>
 </picture>
 </p>

 </h3>

 <p align="center">
+ 📖 <a href="https://angelslim.readthedocs.io/">Documentation</a>&nbsp;&nbsp; | &nbsp;&nbsp;🤗 <a href="https://huggingface.co/AngelSlim">Hugging Face</a>&nbsp;&nbsp; | &nbsp;&nbsp;🤖 <a href="https://modelscope.cn/organization/AngelSlim">ModelScope</a>&nbsp;&nbsp; | &nbsp;&nbsp;💬 <a href="./docs/source/assets/angel_slim_wechat.png">WeChat</a>
 <br>
 </p>

+
 ## Table of Contents

 - [Latest Updates](#latest-updates)
 - [Technical Discussion](#technical-discussion)

 ## 📣Latest Updates
+
+ - [25/07/04] We now support quantization for Hunyuan/Qwen2.5/Qwen3/DeepSeek-R1-Distill-Qwen and other models, including INT8/FP8/INT4 algorithms. We have also open-sourced the Qwen3-8B Eagle3 model weights.

 Coming soon:
+
+ - [ ] Support W4A8 quantization for DeepSeek-R1.
+ - [ ] Support quantization for multimodal models like Qwen-VL.
 - [ ] Release of a new algorithm for speculative sampling.

 ## 🌟Key Features
 | Model | FP8-Dynamic | FP8-Static | INT8-Dynamic | INT4-GPTQ | INT4-AWQ |
 | --------------------------------------------------------------------------------------------------------------------------- | ----------- | ---------- | ------------ | --------- | -------- |
+ | [Hunyuan-Dense](https://huggingface.co/tencent/Hunyuan-7B-Instruct) | ✅ | ✅ | ✅ | ✅ | ✅ |
 | [Hunyuan-MoE](https://huggingface.co/collections/tencent/hunyuan-a13b-685ec38e5b46321e3ea7c4be) | ✅ | ✅ | ✅ | ✅ | ✅ |
 | [Qwen3-Dense](https://huggingface.co/collections/AngelSlim/qwen3-quant-68652e26da31740739d154f8) | ✅ | ✅ | ✅ | ✅ | ✅ |
 | [Qwen3-MoE](https://huggingface.co/collections/AngelSlim/qwen3-quant-68652e26da31740739d154f8) | ✅ | ✅ | ✅ | ✅ | ✅ |
 | [QwQ](https://huggingface.co/collections/AngelSlim/qwen3-quant-68652e26da31740739d154f8) | ✅ | ✅ | ✅ | ✅ | ✅ |
 ### Speculative Decoding

 The Eagle3 weights for the Qwen3 series models are now available.

 | Qwen3 Models | Hunyuan Models |
 For more details, please refer to the [Quick Start Documentation](https://angelslim.readthedocs.io/zh-cn/latest/getting_started/quickstrat.html).

+ ### 🖥️ Deployment and Testing

+ #### 1. API Service Deployment

 After specifying the quantized model path `MODEL_PATH`, you can deploy an OpenAI-compatible API service using the following LLM inference frameworks:

 Use the following script to launch a [SGLang](https://github.com/sgl-project/sglang) service:

 ```shell
 bash deploy/run_sglang.sh $MODEL_PATH
 ```
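The launcher scripts wrap standard engine commands. If you would rather start a vLLM OpenAI-compatible server directly, a minimal sketch looks like this (the port and any extra engine flags are assumptions, not values taken from this repo's scripts):

```shell
# Minimal sketch: serve the quantized checkpoint through vLLM's
# OpenAI-compatible server. Port and engine flags are assumptions;
# adjust them to your deployment.
python -m vllm.entrypoints.openai.api_server \
  --model "$MODEL_PATH" \
  --port 8000
```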
+ #### 2. Service Invocation

 Invoke requests via [OpenAI's API format](https://platform.openai.com/docs/api-reference/introduction):

 ```shell
 bash deploy/openai.sh $MODEL_PATH
 ```
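For a quick smoke test without the helper script, a plain `curl` request against the deployed endpoint works too; the host, port, and served model name below are assumptions that depend on how the service was launched:

```shell
# Illustrative request to an OpenAI-compatible endpoint; adjust the URL
# and the "model" field to match your deployment.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "quantized-model",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 64
  }'
```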
+ #### 3. Performance Evaluation

 Evaluate the performance of the quantized model using [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness), recommended version `lm-eval>=0.4.8`:
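The repo's own evaluation command is not shown in this diff; purely as an illustration, a typical harness invocation looks like the following (the task list, backend, and batch size are assumptions, not the repo's recommended settings):

```shell
# Illustrative lm-evaluation-harness run over the quantized checkpoint.
# Task choice, backend, and batch size are assumptions; adjust as needed.
pip install "lm-eval>=0.4.8"
lm_eval --model hf \
  --model_args pretrained="$MODEL_PATH" \
  --tasks gsm8k \
  --batch_size 8
```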
 #### Hunyuan Series Models

+ Benchmark results for the `Hunyuan-A13B-Instruct` model with `FP8` and `INT4-GPTQ` quantization algorithms on datasets including `AIME 2024`, `GSM8K`, `BBH`, and `DROP`:

+ | Bench | Hunyuan-A13B-Instruct | Hunyuan-A13B-Instruct-FP8 | Hunyuan-A13B-Instruct-Int4-GPTQ |
+ |:---------:|:---------------------:|:-------------------------:|:-------------------------------:|
+ | AIME 2024 | 87.3 | 86.7 | 86.7 |
+ | GSM8K | 94.39 | 94.01 | 94.24 |
+ | BBH | 89.1 | 88.34 | 87.91 |
+ | DROP | 91.1 | 91.1 | 91.05 |

 #### Qwen3 Series Models
 </tbody>
 </table>

 #### Other Models

 Benchmark results for other models with `FP8-Static`, `FP8-Dynamic`, `INT4-GPTQ`, and `INT4-AWQ` quantization algorithms on datasets including `CEVAL`, `MMLU` and `GSM8K`:
 <tr><td rowspan="6"><strong>T=0</strong></td>
 <td>Qwen3-1.7B</td><td>2.05x</td><td>2.81</td><td>2.07x</td><td>2.93</td><td>2.11x</td><td>2.98</td><td>1.93x</td><td>2.69</td><td>2.04x</td><td>2.85</td></tr>
 <tr><td>Qwen3-4B</td><td>2.21x</td><td>3.01</td><td>2.36x</td><td>3.24</td><td>2.42x</td><td>3.13</td><td>2.32x</td><td>2.75</td><td>2.33x</td><td>3.03</td></tr>
+ <tr><td>Qwen3-8B</td><td>2.65x</td><td>3.87</td><td>2.64x</td><td>3.82</td><td>2.86x</td><td>4.10</td><td>2.58x</td><td>3.55</td><td>2.68x</td><td>3.83</td></tr>
+ <tr><td>Qwen3-14B</td><td>2.42x</td><td>3.38</td><td>2.57x</td><td>3.58</td><td>2.75x</td><td>3.77</td><td>2.27x</td><td>3.11</td><td>2.50x</td><td>3.46</td></tr>
 <tr><td>Qwen3-32B</td><td>2.39x</td><td>2.78</td><td>2.37x</td><td>2.81</td><td>2.47x</td><td>2.92</td><td>2.42x</td><td>2.53</td><td>2.41x</td><td>2.76</td></tr>
 <tr><td>Qwen3-30B-A3B</td><td>2.84x</td><td>3.63</td><td>2.27x</td><td>3.09</td><td>2.64x</td><td>3.42</td><td>2.83x</td><td>3.56</td><td>2.64x</td><td>3.42</td></tr>
 <!-- <tr><td colspan="12" style="text-align: center; vertical-align: middle;"><strong>Temperature=1</strong></td></tr> -->
 <tr><td rowspan="6"><strong>T=1</strong></td>
 <td>Qwen3-1.7B</td><td>1.74x</td><td>2.53</td><td>1.86x</td><td>2.70</td><td>1.82x</td><td>2.69</td><td>1.72x</td><td>2.46</td><td>1.93x</td><td>2.60</td></tr>
 <tr><td>Qwen3-4B</td><td>1.93x</td><td>2.60</td><td>2.00x</td><td>2.84</td><td>2.11x</td><td>2.82</td><td>2.34x</td><td>2.50</td><td>1.75x</td><td>2.69</td></tr>
+ <tr><td>Qwen3-8B</td><td>1.91x</td><td>2.84</td><td>2.07x</td><td>3.05</td><td>2.34x</td><td>3.26</td><td>2.09x</td><td>2.92</td><td>2.10x</td><td>3.02</td></tr>
+ <tr><td>Qwen3-14B</td><td>1.81x</td><td>2.58</td><td>1.96x</td><td>2.81</td><td>2.16x</td><td>3.09</td><td>1.76x</td><td>2.49</td><td>1.92x</td><td>2.74</td></tr>
 <tr><td>Qwen3-32B</td><td>1.62x</td><td>1.91</td><td>1.71x</td><td>2.05</td><td>1.78x</td><td>2.10</td><td>1.80x</td><td>1.95</td><td>1.62x</td><td>2.00</td></tr>
 <tr><td>Qwen3-30B-A3B</td><td>1.91x</td><td>2.46</td><td>2.00x</td><td>2.64</td><td>1.90x</td><td>2.53</td><td>1.80x</td><td>2.32</td><td>1.90x</td><td>2.48</td></tr>
 </tbody>
 ## 💬 Technical Discussion

+ * AngelSlim is continuously iterating, and new features will be released soon. If you have any questions or suggestions, please open an issue on GitHub or join our [WeChat technical discussion group](https://github.com/Tencent/AngelSlim/blob/main/docs/source/assets/angel_slim_wechat.png?raw=true).