MiniMax-01: Scaling Foundation Models with Lightning Attention • Paper • 2501.08313 • Published 24 days ago • 273
FastKV: KV Cache Compression for Fast Long-Context Processing with Token-Selective Propagation • Paper • 2502.01068 • Published 4 days ago • 14