Article • Introducing multi-backends (TRT-LLM, vLLM) support for Text Generation Inference • 22 days ago • 63
Article • Accelerate StarCoder with 🤗 Optimum Intel on Xeon: Q8/Q4 and Speculative Decoding • Jan 30, 2024 • 9
Distributed Speculative Inference of Large Language Models Paper • 2405.14105 • Published May 23, 2024 • 16