Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation Paper • 2507.10524 • Published Jul 14, 2025
Coreset Sampling from Open-Set for Fine-Grained Self-Supervised Learning Paper • 2303.11101 • Published Mar 20, 2023
Recycle-and-Distill: Universal Compression Strategy for Transformer-based Speech SSL Models with Attention Map Reusing and Masking Distillation Paper • 2305.11685 • Published May 19, 2023
DiffBlender: Scalable and Composable Multimodal Text-to-Image Diffusion Models Paper • 2305.15194 • Published May 24, 2023
DistiLLM: Towards Streamlined Distillation for Large Language Models Paper • 2402.03898 • Published Feb 6, 2024
DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs Paper • 2503.07067 • Published Mar 10, 2025