---
license: apache-2.0
language:
- zh
pipeline_tag: text-generation
---
# Qwen2-7B-Instruct Quantized with AutoFP8, with FP8 KV Cache

A statically quantized version of [Qwen/Qwen2-7B-Instruct](https://huggingface.co/Qwen/Qwen2-7B-Instruct), calibrated on larryvrh/belle_resampled_78K_CN, with support for an fp8 KV cache.

It mainly targets general Chinese language and reasoning tasks and is prepared for use with vLLM.

## Usage

To enable the fp8 KV cache, pass the argument `kv_cache_dtype="fp8"`.
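
The argument above can be passed when constructing a vLLM engine from Python. The following is a minimal sketch, not a definitive setup: the model path `path/to/this-model`, the sampling settings, and the helper names `build_engine_kwargs` and `generate_example` are placeholders introduced for illustration and do not come from this model card.

```python
def build_engine_kwargs(model_path: str) -> dict:
    """Collect the keyword arguments for vllm.LLM, enabling the fp8 KV cache."""
    return {
        "model": model_path,       # placeholder: local path or Hub ID of this model
        "kv_cache_dtype": "fp8",   # the setting described in this section
    }


def generate_example(model_path: str, prompt: str) -> str:
    """Load the model with vLLM and return one completion (requires a GPU)."""
    # Imported here so the helper above can be used without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(**build_engine_kwargs(model_path))
    # Sampling values are illustrative assumptions, not recommendations.
    params = SamplingParams(temperature=0.7, max_tokens=128)
    outputs = llm.generate([prompt], params)
    return outputs[0].outputs[0].text
```

Calling `generate_example("path/to/this-model", "你好")` would load the model and generate text. When serving with `vllm serve`, the corresponding command-line flag is `--kv-cache-dtype fp8`.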

## Evaluation

Evaluated with [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/7ad7c5b9d0f1c35c048af0ce8b197ebc2021dbd3) + vLLM serve:

|Task|Qwen2-7B-Instruct|Qwen2-7B-Instruct-FP8-CN|Recovery|This model|Recovery|
|---|---|---|---|---|---|
|ceval-valid|81.87|**81.65**|**99.73%**|81.35|99.36%|
|cmmlu|81.78|**81.26**|**99.36%**|81.19|99.28%|
|agieval_logiqa_zh (5 shots)|47.63|48.54|101.91%|46.54|97.71%|
|**Average**|**70.43**|**70.48**|**100.07%**|69.69|98.95%|
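
The Recovery columns are not defined in this card. The sketch below assumes Recovery is the quantized score divided by the Qwen2-7B-Instruct baseline score, expressed as a percentage, which is consistent with the numbers in the table. It also assumes the average row is computed from averages rounded to two decimals. The function and variable names are illustrative.

```python
def recovery(quantized: float, baseline: float) -> float:
    """Quantized score as a percentage of the baseline score (assumed definition)."""
    return round(quantized / baseline * 100, 2)


# Scores copied from the table above.
baseline = {"ceval-valid": 81.87, "cmmlu": 81.78, "agieval_logiqa_zh": 47.63}
this_model = {"ceval-valid": 81.35, "cmmlu": 81.19, "agieval_logiqa_zh": 46.54}

for task, base_score in baseline.items():
    print(f"{task}: {recovery(this_model[task], base_score)}%")

# Average row: round each average to two decimals first, then take the ratio.
avg_baseline = round(sum(baseline.values()) / len(baseline), 2)
avg_this_model = round(sum(this_model.values()) / len(this_model), 2)
print(f"average: {recovery(avg_this_model, avg_baseline)}%")
```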