From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models (arXiv:2406.16838, published Jun 24, 2024)
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention (arXiv:2407.02490, published Jul 2, 2024)