---
license: apache-2.0
base_model:
- Qwen/QwQ-32B
---
Example run (serves the model with vLLM's OpenAI-compatible server):
```bash
docker run --rm --runtime nvidia --gpus all \
  -e VLLM_WORKER_MULTIPROC_METHOD=spawn \
  -e HF_TOKEN \
  -v /root/.cache/huggingface:/root/.cache/huggingface \
  -p 127.0.0.1:8000:8000 \
  vllm/vllm-openai:v0.7.3 \
  --model ig1/QwQ-32B-FP8-Dynamic \
  --served-model-name QwQ-32B \
  --enable-reasoning --reasoning-parser deepseek_r1 \
  --override-generation-config '{"temperature":0.6,"top_p":0.95}'
```
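Once the container is up, it exposes an OpenAI-compatible API on `127.0.0.1:8000`. A minimal sketch of a chat request using only the Python standard library (the helper name `build_chat_request` is illustrative, not part of this repo; the model name must match `--served-model-name` above):

```python
import json
import urllib.request


def build_chat_request(prompt: str, model: str = "QwQ-32B") -> urllib.request.Request:
    """Build a /v1/chat/completions request for the vLLM server started above."""
    body = json.dumps({
        "model": model,  # must match --served-model-name
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        "http://127.0.0.1:8000/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )


# Requires the docker container from the command above to be running:
# with urllib.request.urlopen(build_chat_request("Hello")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

With `--reasoning-parser deepseek_r1` enabled, vLLM splits the model's chain-of-thought out of `content` into a separate `reasoning_content` field of the response message.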