Learning Few-Step Diffusion Models by Trajectory Distribution Matching
Abstract
Accelerating diffusion model sampling is crucial for efficient AIGC deployment. While diffusion distillation methods -- based on distribution matching and trajectory matching -- reduce sampling to as few as one step, they fall short on complex tasks like text-to-image generation. Few-step generation offers a better balance between speed and quality, but existing approaches face a persistent trade-off: distribution matching lacks flexibility for multi-step sampling, while trajectory matching often yields suboptimal image quality. To bridge this gap, we propose learning few-step diffusion models by Trajectory Distribution Matching (TDM), a unified distillation paradigm that combines the strengths of distribution and trajectory matching. Our method introduces a data-free score distillation objective, aligning the student's trajectory with the teacher's at the distribution level. Further, we develop a sampling-steps-aware objective that decouples learning targets across different steps, enabling more adjustable sampling. This approach supports both deterministic sampling for superior image quality and flexible multi-step adaptation, achieving state-of-the-art performance with remarkable efficiency. Our model, TDM, outperforms existing methods on various backbones, such as SDXL and PixArt-alpha, delivering superior quality and significantly reduced training costs. In particular, our method distills PixArt-alpha into a 4-step generator that outperforms its teacher on real user preference at 1024 resolution. This is accomplished with 500 iterations and 2 A800 hours -- a mere 0.01% of the teacher's training cost. In addition, our proposed TDM can be extended to accelerate text-to-video diffusion. Notably, TDM can outperform its teacher model (CogVideoX-2B) by using only 4 NFE on VBench, improving the total score from 80.91 to 81.65. Project page: https://tdm-t2x.github.io/
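For intuition, here is a minimal, self-contained PyTorch sketch of a generic distribution-matching distillation update with a per-sampling-step target. It is not the authors' implementation: tiny MLPs stand in for the diffusion backbones, weighting terms are omitted, and the names (`teacher`, `fake_score`, `student`) and the step-conditioning scheme are illustrative assumptions; see the paper and repository for the actual TDM objective.

```python
# Illustrative sketch only (not the authors' code), assuming:
#   - `teacher`: frozen denoiser of the data distribution (the pre-trained diffusion model),
#   - `fake_score`: denoiser of the current student's output distribution (trained jointly,
#     as in DMD/VSD-style distribution matching),
#   - `student`: the few-step generator being distilled.
# No real images/videos are used anywhere (data-free distillation).
import torch
import torch.nn as nn

DIM, NUM_STUDENT_STEPS = 16, 4

def mlp():
    return nn.Sequential(nn.Linear(DIM + 1, 64), nn.SiLU(), nn.Linear(64, DIM))

teacher = mlp()      # frozen stand-in for the pre-trained teacher
fake_score = mlp()   # tracks the student's output distribution
student = mlp()      # few-step generator being distilled
for p in teacher.parameters():
    p.requires_grad_(False)

opt_g = torch.optim.Adam(student.parameters(), lr=1e-4)
opt_f = torch.optim.Adam(fake_score.parameters(), lr=1e-4)

def predict(net, x, t):
    # Condition on a (normalized) timestep by simple concatenation.
    return net(torch.cat([x, t[:, None]], dim=1))

for it in range(100):
    batch = 8

    # 1) Pick which of the K student sampling steps this update targets,
    #    so each step gets its own (decoupled) learning target.
    step = torch.randint(0, NUM_STUDENT_STEPS, (1,)).item()
    t_start = torch.full((batch,), 1.0 - step / NUM_STUDENT_STEPS)

    # 2) The student maps pure noise to a sample (no real data involved).
    z = torch.randn(batch, DIM)
    x_student = predict(student, z, t_start)

    # 3) Re-noise the student sample; both denoisers predict a clean sample.
    t = torch.rand(batch)
    noise = torch.randn_like(x_student)
    x_t = (1.0 - t[:, None]) * x_student + t[:, None] * noise

    with torch.no_grad():
        pred_real = predict(teacher, x_t, t)     # pulls toward the teacher (data) distribution
        pred_fake = predict(fake_score, x_t, t)  # pulls toward the current student distribution
        grad = pred_fake - pred_real             # distribution-matching direction (weighting omitted)

    # Surrogate loss whose gradient w.r.t. x_student equals `grad`.
    loss_g = 0.5 * ((x_student - (x_student - grad).detach()) ** 2).sum(dim=1).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

    # 4) Keep the fake model tracking the (changing) student distribution
    #    with an ordinary denoising loss on the student's own samples.
    loss_f = ((predict(fake_score, x_t.detach(), t) - x_student.detach()) ** 2).mean()
    opt_f.zero_grad(); loss_f.backward(); opt_f.step()
```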
Community
We introduce TDM to distill a few-step student that can surpass its teacher diffusion model without using any image or video data. TDM is highly efficient and effective. In particular, our TDM distills PixArt-α into a 4-step generator that outperforms its teacher on real user preference. This is accomplished with 500 iterations and 2 A800 hours -- a mere 0.01% of the teacher's training cost. In addition, our proposed TDM can be extended to accelerate text-to-video diffusion. Notably, TDM can outperform its teacher model (CogVideoX-2B) by using only 4 NFE on VBench, improving the total score from 80.91 to 81.65.
Check details at our project page: https://tdm-t2x.github.io/
Moreover, the pre-trained models have also been released at https://github.com/Luo-Yihong/TDM
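For convenience, below is a minimal sketch of 4-step text-to-image sampling with the base PixArt-α pipeline in diffusers. How the released TDM student weights are loaded (file names, LoRA vs. full weights, guidance settings) is an assumption here; follow the README at https://github.com/Luo-Yihong/TDM for the supported workflow.

```python
# Minimal sketch: 4-step sampling with the PixArt-α pipeline from diffusers.
# Loading the distilled TDM student weights is NOT shown -- see the GitHub README.
import torch
from diffusers import PixArtAlphaPipeline

pipe = PixArtAlphaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-XL-2-1024-MS", torch_dtype=torch.float16
).to("cuda")

# TODO: load the distilled TDM student weights per https://github.com/Luo-Yihong/TDM

image = pipe(
    "A small cactus with a happy face in the Sahara desert",
    num_inference_steps=4,   # few-step sampling with the distilled student
    guidance_scale=0.0,      # distilled students typically need no CFG (assumption)
).images[0]
image.save("tdm_4step.png")
```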
Librarian Bot (automated): The following papers, recommended by the Semantic Scholar API, are similar to this paper.
- Adding Additional Control to One-Step Diffusion with Joint Distribution Matching (2025)
- One-step Diffusion Models with $f$-Divergence Distribution Matching (2025)
- SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation (2025)
- Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening (2025)
- RayFlow: Instance-Aware Diffusion Acceleration via Adaptive Flow Trajectories (2025)
- ProReflow: Progressive Reflow with Decomposed Velocity (2025)
- ROCM: RLHF on consistency models (2025)