Learning Few-Step Diffusion Models by Trajectory Distribution Matching
Abstract
Accelerating diffusion model sampling is crucial for efficient AIGC deployment. While diffusion distillation methods -- based on distribution matching and trajectory matching -- reduce sampling to as few as one step, they fall short on complex tasks like text-to-image generation. Few-step generation offers a better balance between speed and quality, but existing approaches face a persistent trade-off: distribution matching lacks flexibility for multi-step sampling, while trajectory matching often yields suboptimal image quality. To bridge this gap, we propose learning few-step diffusion models by Trajectory Distribution Matching (TDM), a unified distillation paradigm that combines the strengths of distribution and trajectory matching. Our method introduces a data-free score distillation objective, aligning the student's trajectory with the teacher's at the distribution level. Further, we develop a sampling-steps-aware objective that decouples learning targets across different steps, enabling more adjustable sampling. This approach supports both deterministic sampling for superior image quality and flexible multi-step adaptation, achieving state-of-the-art performance with remarkable efficiency. Our model, TDM, outperforms existing methods on various backbones, such as SDXL and PixArt-alpha, delivering superior quality and significantly reduced training costs. In particular, our method distills PixArt-alpha into a 4-step generator that outperforms its teacher on real user preference at 1024 resolution. This is accomplished with 500 iterations and 2 A800 hours -- a mere 0.01% of the teacher's training cost. In addition, our proposed TDM can be extended to accelerate text-to-video diffusion. Notably, TDM can outperform its teacher model (CogVideoX-2B) by using only 4 NFE on VBench, improving the total score from 80.91 to 81.65. Project page: https://tdm-t2x.github.io/
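For intuition, here is a minimal, self-contained PyTorch sketch of a generic distribution-matching distillation update with a per-sampling-step target. It is not the authors' implementation: tiny MLPs stand in for the diffusion backbones, weighting terms are omitted, and the names (`teacher`, `fake_score`, `student`) and the step-conditioning scheme are illustrative assumptions; see the paper and repository for the actual TDM objective.

```python
# Illustrative sketch only (not the authors' code), assuming:
#   - `teacher`: frozen denoiser of the data distribution (the pre-trained diffusion model),
#   - `fake_score`: denoiser of the current student's output distribution (trained jointly,
#     as in DMD/VSD-style distribution matching),
#   - `student`: the few-step generator being distilled.
# No real images/videos are used anywhere (data-free distillation).
import torch
import torch.nn as nn

DIM, NUM_STUDENT_STEPS = 16, 4

def mlp():
    return nn.Sequential(nn.Linear(DIM + 1, 64), nn.SiLU(), nn.Linear(64, DIM))

teacher = mlp()      # frozen stand-in for the pre-trained teacher
fake_score = mlp()   # tracks the student's output distribution
student = mlp()      # few-step generator being distilled
for p in teacher.parameters():
    p.requires_grad_(False)

opt_g = torch.optim.Adam(student.parameters(), lr=1e-4)
opt_f = torch.optim.Adam(fake_score.parameters(), lr=1e-4)

def predict(net, x, t):
    # Condition on a (normalized) timestep by simple concatenation.
    return net(torch.cat([x, t[:, None]], dim=1))

for it in range(100):
    batch = 8

    # 1) Pick which of the K student sampling steps this update targets,
    #    so each step gets its own (decoupled) learning target.
    step = torch.randint(0, NUM_STUDENT_STEPS, (1,)).item()
    t_start = torch.full((batch,), 1.0 - step / NUM_STUDENT_STEPS)

    # 2) The student maps pure noise to a sample (no real data involved).
    z = torch.randn(batch, DIM)
    x_student = predict(student, z, t_start)

    # 3) Re-noise the student sample; both denoisers predict a clean sample.
    t = torch.rand(batch)
    noise = torch.randn_like(x_student)
    x_t = (1.0 - t[:, None]) * x_student + t[:, None] * noise

    with torch.no_grad():
        pred_real = predict(teacher, x_t, t)     # pulls toward the teacher (data) distribution
        pred_fake = predict(fake_score, x_t, t)  # pulls toward the current student distribution
        grad = pred_fake - pred_real             # distribution-matching direction (weighting omitted)

    # Surrogate loss whose gradient w.r.t. x_student equals `grad`.
    loss_g = 0.5 * ((x_student - (x_student - grad).detach()) ** 2).sum(dim=1).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

    # 4) Keep the fake model tracking the (changing) student distribution
    #    with an ordinary denoising loss on the student's own samples.
    loss_f = ((predict(fake_score, x_t.detach(), t) - x_student.detach()) ** 2).mean()
    opt_f.zero_grad(); loss_f.backward(); opt_f.step()
```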
Community
We introduce TDM to distill a few-step student that can surpass its teacher diffusion model without using any image or video data. TDM is highly efficient and effective. In particular, our TDM distills PixArt-α into a 4-step generator that outperforms its teacher on real user preference. This is accomplished with 500 iterations and 2 A800 hours -- a mere 0.01% of the teacher's training cost. In addition, our proposed TDM can be extended to accelerate text-to-video diffusion. Notably, TDM can outperform its teacher model (CogVideoX-2B) by using only 4 NFE on VBench, improving the total score from 80.91 to 81.65.
Check details at our project page: https://tdm-t2x.github.io/
Moreover, the pre-trained models have also been released at https://github.com/Luo-Yihong/TDM
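For convenience, below is a minimal sketch of 4-step text-to-image sampling with the base PixArt-α pipeline in diffusers. How the released TDM student weights are loaded (file names, LoRA vs. full weights, guidance settings) is an assumption here; follow the README at https://github.com/Luo-Yihong/TDM for the supported workflow.

```python
# Minimal sketch: 4-step sampling with the PixArt-α pipeline from diffusers.
# Loading the distilled TDM student weights is NOT shown -- see the GitHub README.
import torch
from diffusers import PixArtAlphaPipeline

pipe = PixArtAlphaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-XL-2-1024-MS", torch_dtype=torch.float16
).to("cuda")

# TODO: load the distilled TDM student weights per https://github.com/Luo-Yihong/TDM

image = pipe(
    "A small cactus with a happy face in the Sahara desert",
    num_inference_steps=4,   # few-step sampling with the distilled student
    guidance_scale=0.0,      # distilled students typically need no CFG (assumption)
).images[0]
image.save("tdm_4step.png")
```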
Librarian Bot (automated): The following papers, recommended by the Semantic Scholar API, are similar to this paper.
- Adding Additional Control to One-Step Diffusion with Joint Distribution Matching (2025)
- One-step Diffusion Models with $f$-Divergence Distribution Matching (2025)
- SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation (2025)
- Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening (2025)
- RayFlow: Instance-Aware Diffusion Acceleration via Adaptive Flow Trajectories (2025)
- ProReflow: Progressive Reflow with Decomposed Velocity (2025)
- ROCM: RLHF on consistency models (2025)