LeanK: Learnable K Cache Channel Pruning for Efficient Decoding • arXiv:2508.02215 • Published Aug 2025
FocusLLM: Scaling LLM's Context by Parallel Decoding • arXiv:2408.11745 • Published Aug 21, 2024
Efficient Attention Mechanisms for Large Language Models: A Survey • arXiv:2507.19595 • Published Jul 25, 2025