pkumc and HandH1998 committed · verified
Commit ce60f3d · 1 Parent(s): c6bab3f

Update README.md (#2)


- Update README.md (95642c7919dbf9b4e500cd508d311e3909306f6c)


Co-authored-by: HandH1998 <[email protected]>

Files changed (1)
  1. README.md +34 -0
README.md CHANGED
@@ -2,6 +2,40 @@
license: mit
library_name: transformers
---
+
+ # Channel-wise INT8 DeepSeek-R1
+
+ The INT8 data type is widely supported and efficient on most hardware platforms.
+
+ **We provide channel-wise INT8 weights for DeepSeek-R1.**
+
+ In our benchmarks, we observe **no accuracy loss** and up to a **50%** throughput improvement.
+
+ [SGLang](https://github.com/sgl-project/sglang/tree/main) will support channel-wise INT8 quantization once our [PULL REQUEST](https://github.com/sgl-project/sglang/pull/3888) is merged.
+
+ ## 1. Benchmarking Results (detailed in [PULL REQUEST](https://github.com/sgl-project/sglang/pull/3888))
+
+ | Model | Config | Accuracy (GSM8K) | Accuracy (MMLU) | Output Throughput (qps=128) |
+ |---------|--------------|------------------|-----------------|-----------------------------|
+ | BF16 R1 | A100\*32 | 95.5 | 87.1 | 3342.29 |
+ | INT8 R1 | (A100\*16)x2 | **95.6** | **87.2** | **5035.82 (+50%)** |
+
+ ## 2. Quantization Process
+
+ We apply INT8 quantization to the BF16 checkpoints.
+
+ The quantization scales are determined by dividing the channel-wise maximum of element values by the INT8 type maximum.
+
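For illustration, here is a minimal PyTorch sketch of this channel-wise scheme. It assumes symmetric quantization based on the per-channel absolute maximum; the function name and shapes are illustrative, not the actual `bf16_cast_channel_int8.py` implementation.

```python
import torch

def quantize_channel_int8(weight: torch.Tensor):
    """Quantize a 2-D weight to INT8 with one scale per output channel."""
    # Channel-wise maximum of the element magnitudes, shape (out_features, 1)
    abs_max = weight.abs().amax(dim=1, keepdim=True)
    # Divide by the INT8 type maximum (127) to obtain the per-channel scale
    scale = (abs_max / 127.0).clamp(min=1e-12)
    # Scale, round, and clamp each element into the INT8 range
    q = torch.clamp(torch.round(weight / scale), -127, 127).to(torch.int8)
    return q, scale

# Round trip: dequantized weights should closely match the BF16 originals
w = torch.randn(128, 256, dtype=torch.bfloat16).float()
q, scale = quantize_channel_int8(w)
print((w - q.float() * scale).abs().max())  # small quantization error
```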
+ To generate these weights, run the provided script in the `./inference` directory:
+
+ ```bash
+ python3 bf16_cast_channel_int8.py --input-bf16-hf-path /path/to/bf16-weights/ --output-int8-hf-path /path/to/save-int8-weight/
+ ```
+
+ ## 3. Troubleshooting
+
+ Before running inference, confirm that there is no `quantization_config` attribute in `config.json`.
+
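A small convenience snippet (not part of the released scripts) can check for, and if necessary drop, the attribute; the checkpoint path below is a placeholder:

```python
import json

config_path = "/path/to/save-int8-weight/config.json"  # placeholder path

with open(config_path) as f:
    config = json.load(f)

# The INT8 checkpoint's config.json should not carry "quantization_config"
if "quantization_config" in config:
    del config["quantization_config"]
    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)
    print("removed quantization_config")
else:
    print("config.json is clean")
```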
+ ---
+
+
# DeepSeek-R1
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->