view article Article What is test-time compute and how to scale it? By Kseniase and 1 other • Feb 6 • 54
Llama 3.1 GPTQ, AWQ, and BNB Quants Collection Optimised Quants for high-throughput deployments! Compatible with Transformers, TGI & VLLM 🤗 • 9 items • Updated Sep 26, 2024 • 56
FP8 LLMs for vLLM Collection Accurate FP8 quantized models by Neural Magic, ready for use with vLLM! • 44 items • Updated Oct 17, 2024 • 67
INT8 LLMs for vLLM Collection Accurate INT8 quantized models by Neural Magic, ready for use with vLLM! • 50 items • Updated Sep 26, 2024 • 15