HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading Paper • 2502.12574 • Published Feb 2025 • 10
COMCAT: Towards Efficient Compression and Customization of Attention-Based Vision Models Paper • 2305.17235 • Published May 26, 2023 • 2