Squeezed Attention: Accelerating Long Context Length LLM Inference Paper • 2411.09688 • Published Nov 14, 2024 • 1
Arbitrage: Efficient Reasoning via Advantage-Aware Speculation Paper • 2512.05033 • Published 25 days ago • 15
AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders Paper • 2510.19779 • Published Oct 22 • 60