lyraDiff: An Out-of-box Acceleration Engine for Diffusion and DiT Models
Sa Xiao*, Yibo Lu*, Kangjian Wu*, Bin Wu†, Haoxiong Su, Mian Peng, Qiwen Mao, Wenjiang Zhou
(*co-first author), (†Corresponding Author, [email protected])
Lyra Lab, Tencent Music Entertainment
[github] [huggingface]
Introduction
🌈lyraDiff is currently the fastest diffusion acceleration engine that does not require recompilation for dynamic input shapes.
The core features include:
- 🚀 State-of-the-art Inference Speed: lyraDiff combines several techniques, including quantization, fused GEMM kernels, Flash Attention, and an NHWC layout with fused GroupNorm, to achieve up to a 2x speedup in model inference.
- 🔥 Memory Efficiency: lyraDiff uses a buffer-based DRAM reuse strategy and multiple quantization types (FP8/INT8/INT4) to reduce DRAM usage by 10-40%.
- 🔥 Extensive Model Support: lyraDiff supports a wide range of generative and super-resolution (SR) models, such as SD1.5, SDXL, FLUX, S3Diff, and SUPIR, as well as the most commonly used plugins, such as LoRA, ControlNet, and IP-Adapter.
- 🔥 Zero Compilation Deployment: Unlike TensorRT or AITemplate, which take minutes to compile, lyraDiff incurs no runtime recompilation overhead, even for model inputs with dynamic shapes.
- 🔥 Image Gen Consistency: The outputs of lyraDiff are aligned with those of HF diffusers at the pixel level, even when switching LoRAs in quantization mode.
- 🚀 Fast Plugin Hot-swap: lyraDiff provides very fast hot-swapping of ControlNet and LoRA models, which can greatly benefit real-time image generation services.
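Several of the features above rely on weight quantization. As a generic illustration only (this card does not describe lyraDiff's actual quantization kernels or schemes), the sketch below shows symmetric per-tensor INT8 quantization in plain Python: every weight is divided by one shared scale, rounded, and clamped to [-127, 127]. The helper names `quantize_int8` and `dequantize_int8` are hypothetical.

```python
# Generic sketch of symmetric per-tensor INT8 weight quantization.
# NOT lyraDiff's implementation; hypothetical helpers for illustration only.

def quantize_int8(weights):
    """Map float weights to int8 codes that share a single scale."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, which would otherwise give a zero scale.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the symmetric int8 range.
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate float weights from the int8 codes."""
    return [c * scale for c in codes]


weights = [0.5, -1.27, 0.0, 0.9, -0.03]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print("codes:", codes, "scale:", scale, "max error:", max_error)
```

Each weight is stored in 1 byte instead of 2 (FP16), and rounding limits the per-weight error to at most half the scale; real engines typically refine this with finer-grained (per-channel or per-group) scales.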
lyraDiff-Flux.1-dev is converted from the standard FLUX.1-dev model weights using this script so that it is compatible with lyraDiff. It contains both FP8 and FP16 versions of the converted Flux.1-dev weights.
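To give a rough sense of why an FP8 version is provided alongside FP16, the sketch below estimates weight storage from bytes per parameter. It assumes roughly 12 billion parameters for the FLUX.1-dev transformer (a figure not stated in this card) and ignores the text encoders and VAE; `weight_gib` is a hypothetical helper.

```python
# Rough weight-storage estimate for FP16 vs FP8 checkpoints.
# ASSUMPTION: ~12 billion parameters for the FLUX.1-dev transformer;
# text encoders, VAE, and activations are not counted.

BYTES_PER_PARAM = {"fp16": 2, "fp8": 1}


def weight_gib(num_params, dtype):
    """Approximate size of the weights alone, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


num_params = 12e9
for dtype in ("fp16", "fp8"):
    print(f"{dtype}: ~{weight_gib(num_params, dtype):.1f} GiB")
```

These figures cover the transformer weights only; actual runtime memory also includes activations, the other pipeline components, and framework overhead.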
Usage
We provide a reference implementation of the lyraDiff version of Flux.1-dev, as well as sampling code, in a dedicated GitHub repository.
Citation
@Misc{lyraDiff_2025,
author = {Kangjian Wu and Zhengtao Wang and Yibo Lu and Haoxiong Su and Sa Xiao and Qiwen Mao and Mian Peng and Bin Wu and Wenjiang Zhou},
title = {lyraDiff: Accelerating Diffusion Models with best flexibility},
howpublished = {\url{https://github.com/TMElyralab/lyraDiff}},
year = {2025}
}