A pure C++ high-performance OpenAI LLM service powered by TensorRT-LLM and GRPS, with support for QWQ.
#22 opened by zhaocc1106
grps-trtllm now supports QWQ-32B. Give it a try if you are interested:
https://github.com/NetEase-Media/grps_trtllm/blob/master/docs%2Fqwq.md
zhaocc1106 changed discussion title from "A pure C++ high-performance OpenAI LLM service by TensorRT-llm + GRPS." to "A pure C++ high-performance OpenAI LLM service powered by TensorRT-LLM and GRPS, with support for QWQ."