Speculative Streaming: Fast LLM Inference without Auxiliary Models Paper • 2402.11131 • Published Feb 16, 2024
DarwinLM: Evolutionary Structured Pruning of Large Language Models Paper • 2502.07780 • Published Feb 11, 2025