Doesn't Generate `<think>` tags
#25
by bingw5 - opened
The response doesn't contain `<think>` and `</think>` tags, only `</details>`. Is this by design?
Sure it does. Use llama.cpp. A command line along these lines, depending on your system/OS: `.\llama-cli --model QwQ-32B-Q8_0.gguf --temp 0.0 --color --threads 36 --ctx-size 128000`
Are you using Open-WebUI by any chance? When using it with SillyTavern, it produces `<think>` tags for me just fine. I suggest trying the QwQ-32B 8bpw EXL2 quant with TabbyAPI, using DeepSeek-R1-Distill-Qwen-1.5B-4bpw-exl2 as a draft model for speculative decoding, for the best speed and quality.
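For anyone post-processing responses where the tags do appear, here's a minimal sketch (function name and tag handling are illustrative, not part of any of the tools above) for splitting the reasoning block from the final answer:

```python
import re

def split_think(text: str):
    """Split a model response into (reasoning, answer).

    If no <think>...</think> block is found, reasoning is None
    and the whole text is treated as the answer.
    """
    match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    if not match:
        return None, text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

# Example: a response with a reasoning block followed by the answer.
reasoning, answer = split_think("<think>2+2 is 4</think>The answer is 4.")
```

Note that some frontends strip or rewrite these tags before displaying the response (which may be what's happening here), so it's worth checking the raw API output rather than the rendered chat.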