Qwen/QwQ-32B-AWQ
Text Generation · Safetensors · English · qwen2 · chat · conversational · 4-bit precision · awq · arxiv:2309.00071 · arxiv:2412.15115 · License: apache-2.0
Has anyone deployed this AWQ version on a 3090? The speed is only 6 tokens/s; is that normal?

#4 · opened 3 days ago by Jsoooooo

Discussion
Jsoooooo · 3 days ago

Is there an official speed benchmark I can reference? This feels too slow.
ShiyuZhu · 2 days ago

It could also be a bottleneck caused by other hardware in your setup.
Jsoooooo · about 24 hours ago

> It could also be a bottleneck caused by other hardware in your setup.

I tried deploying with both vLLM and Ollama:
vLLM is fast, reaching about 37 tokens/s, but the KV cache is too large, so it can only support at most 20k total context tokens;
Ollama has no such limit, possibly thanks to internal optimizations, and still reaches about 28 tokens/s.
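For reference, a minimal vLLM launch along these lines might look like the sketch below. It is not the exact setup from this thread: the max_model_len of 20480 and the gpu_memory_utilization value are illustrative choices aimed at fitting the 4-bit weights plus KV cache on a single 24 GB RTX 3090.

```python
# Minimal sketch (not the poster's exact configuration): serving Qwen/QwQ-32B-AWQ
# with vLLM on one 24 GB RTX 3090, capping the context length so the KV cache
# fits alongside the quantized weights.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/QwQ-32B-AWQ",
    quantization="awq",           # load the 4-bit AWQ checkpoint
    max_model_len=20480,          # roughly the 20k total context tokens mentioned above
    gpu_memory_utilization=0.95,  # leave a little headroom for activations
)

sampling = SamplingParams(temperature=0.6, max_tokens=512)
outputs = llm.generate(["How fast can a 3090 generate with this model?"], sampling)
print(outputs[0].outputs[0].text)
```

Lowering max_model_len (or gpu_memory_utilization) trades maximum context length for KV-cache headroom, which is exactly the constraint described in the reply above.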