This model was built from deepseek-ai/DeepSeek-R1-0528 by applying AMD Quark for MXFP4 quantization: both weights and activations are quantized to the MXFP4 format. MXFP4 is an OCP microscaling format that stores 4-bit floating-point (E2M1) elements with a shared scale per block of elements (here, blocks of 32, matching the `--group_size 32` setting below).
Preprocessing requirement:
Before running the quantization script below, the original FP8 model must be dequantized to BFloat16. You can either perform the dequantization manually using this conversion script, or use the pre-converted BFloat16 model available at unsloth/DeepSeek-R1-0528-BF16.
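If you use the pre-converted checkpoint, a minimal sketch of fetching it with huggingface-cli follows; the local directory path is illustrative, and it feeds the `$MODEL_DIR` variable used by the quantization script below.

```bash
# Fetch the pre-converted BFloat16 checkpoint (assumes huggingface_hub is installed).
# The local directory is illustrative; adjust to your storage layout.
huggingface-cli download unsloth/DeepSeek-R1-0528-BF16 --local-dir ./DeepSeek-R1-0528-BF16

# The quantization script below reads the source model from $MODEL_DIR.
export MODEL_DIR=./DeepSeek-R1-0528-BF16
```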
Quantization script:

```bash
cd Quark/examples/torch/language_modeling/llm_ptq/

# Skip quantization for precision-sensitive layers: the attention blocks,
# the MoE router gates, and the output head. Left unquoted on purpose below
# so the three patterns expand into separate arguments.
exclude_layers="*self_attn* *mlp.gate.* *lm_head"

# w_mxfp4_a_mxfp4 quantizes both weights and activations to MXFP4;
# --group_size 32 is the MX block size (32 elements share one scale).
python3 quantize_quark.py --model_dir $MODEL_DIR \
    --quant_scheme w_mxfp4_a_mxfp4 \
    --group_size 32 \
    --num_calib_data 128 \
    --exclude_layers $exclude_layers \
    --multi_gpu \
    --model_export hf_format \
    --output_dir amd/DeepSeek-R1-0528-MXFP4-Preview
```
This model can be deployed efficiently using the SGLang backend.
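For example, a minimal launch sketch, assuming an SGLang build with MXFP4 support; the port and tensor-parallel degree (`--tp 8`) are illustrative and should match your hardware.

```bash
# Serve the quantized model with SGLang (exposes an OpenAI-compatible API).
python3 -m sglang.launch_server \
    --model-path amd/DeepSeek-R1-0528-MXFP4-Preview \
    --tp 8 \
    --trust-remote-code \
    --port 30000

# Query the server once it is up.
curl http://localhost:30000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "amd/DeepSeek-R1-0528-MXFP4-Preview", "messages": [{"role": "user", "content": "Hello"}]}'
```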
Modifications Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.