---
license: mit
base_model:
- deepseek-ai/DeepSeek-R1-0528
---

# Model Overview

- **Model Architecture:** DeepSeek-R1-0528
- **Input:** Text
- **Output:** Text
- **Supported Hardware Microarchitecture:** AMD MI350/MI355
- **ROCm:** 7.0
- **Operating System(s):** Linux
- **Inference Engine:** [SGLang](https://docs.sglang.ai/)
- **Model Optimizer:** [AMD-Quark](https://quark.docs.amd.com/latest/index.html)
- **Weight quantization:** OCP MXFP4, Static
- **Activation quantization:** OCP MXFP4, Dynamic
- **Calibration Dataset:** [Pile](https://huggingface.co/datasets/mit-han-lab/pile-val-backup)

This model was built from the deepseek-ai/DeepSeek-R1-0528 model by applying [AMD-Quark](https://quark.docs.amd.com/latest/index.html) for MXFP4 quantization.

# Model Quantization

The model was quantized from [deepseek-ai/DeepSeek-R1-0528](https://huggingface.co/deepseek-ai/DeepSeek-R1-0528) using [AMD-Quark](https://quark.docs.amd.com/latest/index.html). Both weights and activations were quantized to MXFP4 format.
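
In the OCP Microscaling (MX) FP4 format, each block of 32 elements stores 4-bit FP4 (E2M1) values together with a shared power-of-two (E8M0) scale, which is why the quantization script below passes `--group_size 32`.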

**Preprocessing requirement:**

Before executing the quantization script below, the original FP8 model must first be dequantized to BFloat16.
You can either perform the dequantization manually using this [conversion script](https://github.com/deepseek-ai/DeepSeek-V3/blob/main/inference/fp8_cast_bf16.py), or use the pre-converted BFloat16 model available at [unsloth/DeepSeek-R1-0528-BF16](https://huggingface.co/unsloth/DeepSeek-R1-0528-BF16).
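
For the manual route, a minimal sketch is shown below. `$FP8_MODEL_DIR` and `$BF16_MODEL_DIR` are placeholder paths of your choosing, and the flag names follow the conversion script as published; verify them against the current version of the script.

```bash
# Fetch the repository that ships the FP8 -> BF16 conversion script
git clone https://github.com/deepseek-ai/DeepSeek-V3.git
cd DeepSeek-V3/inference

# Dequantize the original FP8 checkpoint to BFloat16
# ($FP8_MODEL_DIR and $BF16_MODEL_DIR are placeholder paths, not fixed names)
python3 fp8_cast_bf16.py \
    --input-fp8-hf-path $FP8_MODEL_DIR \
    --output-bf16-hf-path $BF16_MODEL_DIR
```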

**Quantization script:**
```bash
cd Quark/examples/torch/language_modeling/llm_ptq/
exclude_layers="*self_attn* *mlp.gate.* *lm_head"
python3 quantize_quark.py --model_dir $MODEL_DIR \
    --quant_scheme w_mxfp4_a_mxfp4 \
    --group_size 32 \
    --num_calib_data 128 \
    --exclude_layers $exclude_layers \
    --multi_gpu \
    --model_export hf_format \
    --output_dir amd/DeepSeek-R1-0528-MXFP4-Preview
```
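
Note that the `--exclude_layers` patterns keep the self-attention projections, the MoE routing gates (`mlp.gate`), and the `lm_head` output projection in their original precision; only the remaining linear layers are quantized to MXFP4.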

# Deployment

## Use with SGLang

This model can be deployed efficiently using the [SGLang](https://docs.sglang.ai/) backend.
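
A minimal launch sketch follows; the tensor-parallel size is an illustrative assumption, not a validated setting for this checkpoint, so size it to your node.

```bash
# Start an OpenAI-compatible SGLang server for the quantized model
# (--tp 8 is an illustrative assumption; adjust to your MI350/MI355 node)
python3 -m sglang.launch_server \
    --model-path amd/DeepSeek-R1-0528-MXFP4-Preview \
    --tp 8 \
    --trust-remote-code
```

Once running, the server exposes an OpenAI-compatible API (on port 30000 by default).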

# License

Modifications Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.