lyraDiff: An Out-of-box Acceleration Engine for Diffusion and DiT Models
Sa Xiao*, Yibo Lu*, Kangjian Wu*, Bin Wu†, Haoxiong Su, Mian Peng, Qiwen Mao, Wenjiang Zhou
(*co-first author), (†Corresponding Author, [email protected])
Lyra Lab, Tencent Music Entertainment
Introduction
🌈 lyraDiff is currently the fastest diffusion acceleration engine that requires no recompilation for dynamic input shapes.
The core features include:
- 🚀 State-of-the-art Inference Speed: lyraDiff uses multiple techniques to achieve up to 2x inference speedup, including quantization, fused GEMM kernels, Flash Attention, and NHWC & fused GroupNorm.
- 🔥 Memory Efficiency: lyraDiff uses a buffer-based DRAM reuse strategy and multiple types of quantization (FP8/INT8/INT4) to save 10-40% of DRAM usage.
- 🔥 Extensive Model Support: lyraDiff supports a wide range of generative/SR models such as SD1.5, SDXL, FLUX, S3Diff, and SUPIR, as well as the most commonly used plugins such as LoRA, ControlNet, and IP-Adapter.
- 🔥 Zero Compilation Deployment: Unlike TensorRT or AITemplate, which take minutes to compile, lyraDiff eliminates compilation overhead at runtime, even with model inputs of dynamic shapes.
- 🔥 Image Gen Consistency: The outputs of lyraDiff are aligned with those of HF diffusers at the pixel level, even when switching LoRAs in quantization mode.
- 🚀 Fast Plugin Hot-swap: lyraDiff provides super-fast model hot-swap for ControlNet and LoRA, which can hugely benefit a real-time image generation service (see the sketch after this list).
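The repository documents the actual hot-swap API; purely as an illustration of the service pattern it enables, a minimal sketch might look like the following (the `load_lora`/`unload_lora` method names, the `alpha` parameter, and the weight paths are hypothetical placeholders, not the confirmed lyraDiff interface):

```python
# Hedged sketch of LoRA hot-swap inside a long-running image-gen service.
# NOTE: load_lora/unload_lora, alpha, and the paths below are HYPOTHETICAL
# placeholders; see the lyraDiff repository for the real hot-swap API.
LORA_REGISTRY = {
    "anime": "/path/to/loras/anime.safetensors",
    "photo": "/path/to/loras/photo.safetensors",
}

def render(pipe, unet, prompt: str, lora_name: str):
    # Swap the requested LoRA into the already-resident UNet weights
    # instead of rebuilding or recompiling the whole pipeline.
    unet.load_lora(LORA_REGISTRY[lora_name], alpha=0.8)  # hypothetical API
    try:
        return pipe(prompt=prompt, num_inference_steps=20)[0]
    finally:
        unet.unload_lora()  # hypothetical API: restore base weights
```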
lyraDiff-IP-Adapters contains IP-Adapter weights converted from the standard IP-Adapter weights using this script so that they are compatible with lyraDiff; both SD1.5 and SDXL versions of the converted IP-Adapter are included.
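For context, a standard IP-Adapter `.bin` checkpoint is a plain torch state dict with two top-level groups, `image_proj` (the image-embedding projection network) and `ip_adapter` (the added cross-attention weights). A minimal inspection sketch follows; the checkpoint path is a placeholder, and the actual key remapping performed by the conversion script is not reproduced here:

```python
# Hedged sketch: inspect a standard IP-Adapter checkpoint before conversion.
# The path is a placeholder; the real conversion script's key remapping
# for lyraDiff is not shown here.
import torch

ckpt = torch.load("/path/to/ip-adapter-plus_sdxl_vit-h.bin", map_location="cpu")
print(list(ckpt.keys()))  # expected: ['image_proj', 'ip_adapter']
for group, state_dict in ckpt.items():
    n_params = sum(t.numel() for t in state_dict.values())
    print(f"{group}: {len(state_dict)} tensors, {n_params / 1e6:.1f}M params")
```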
Usage
We provide a reference implementation of the lyraDiff version of SD1.5/SDXL, as well as sampling code, in a dedicated GitHub repository.
Example
We provide a minimal script for running SDXL models with an IP-Adapter via lyraDiff:
```python
import os
import time

import torch
from diffusers import StableDiffusionXLPipeline, EulerAncestralDiscreteScheduler
from diffusers.utils import load_image
from transformers import CLIPTextModel, CLIPTokenizer, CLIPTextModelWithProjection

from lyradiff.lyradiff_model.module.lyradiff_ip_adapter import LyraIPAdapter
from lyradiff.lyradiff_model.lyradiff_unet_model import LyraDiffUNet2DConditionModel
from lyradiff.lyradiff_model.lyradiff_vae_model import LyraDiffVaeModel
# Placeholder paths to the base SDXL model and the fp16-fix VAE
model_path = "/path/to/sdxl/model/"
vae_model_path = "/path/to/sdxl/sdxl-vae-fp16-fix"

# Text encoders and tokenizers are the stock transformers modules
text_encoder = CLIPTextModel.from_pretrained(model_path, subfolder="text_encoder").to(torch.float16).to(torch.device("cuda"))
text_encoder_2 = CLIPTextModelWithProjection.from_pretrained(model_path, subfolder="text_encoder_2").to(torch.float16).to(torch.device("cuda"))
tokenizer = CLIPTokenizer.from_pretrained(model_path, subfolder="tokenizer")
tokenizer_2 = CLIPTokenizer.from_pretrained(model_path, subfolder="tokenizer_2")
# lyraDiff's accelerated UNet and VAE load weights directly from the standard diffusers layout
unet = LyraDiffUNet2DConditionModel(is_sdxl=True)
vae = LyraDiffVaeModel(scaling_factor=0.13025, is_upcast=False)
unet.load_from_diffusers_model(os.path.join(model_path, "unet"))
vae.load_from_diffusers_model(vae_model_path)

scheduler = EulerAncestralDiscreteScheduler.from_pretrained(model_path, subfolder="scheduler", timestep_spacing="linspace")
# The lyraDiff modules are drop-in replacements in the stock diffusers pipeline
pipe = StableDiffusionXLPipeline(
vae=vae,
unet=unet,
text_encoder=text_encoder,
text_encoder_2=text_encoder_2,
tokenizer=tokenizer,
tokenizer_2=tokenizer_2,
scheduler=scheduler
)
ip_ckpt = "/path/to/sdxl/ip_ckpt/ip-adapter-plus_sdxl_vit-h.bin"
image_encoder_path = "/path/to/sdxl/ip_ckpt/image_encoder"
# Create the LyraIPAdapter (IP-Adapter Plus variant with a ViT-H image encoder)
ip_adapter = LyraIPAdapter(
    unet_model=unet.model,
    sdxl=True,
    device=torch.device("cuda"),
    ip_ckpt=ip_ckpt,
    ip_plus=True,
    image_encoder_path=image_encoder_path,
    num_ip_tokens=16,
    ip_projection_dim=1024,
)
# Load the IP-Adapter reference image
ip_image = load_image("https://cdn-uploads.huggingface.co/production/uploads/6461b412846a6c8c8305319d/8U6yNHTPLaOC3gIWJZWGL.png")
ip_scale = 0.5
# Compute the IP image embedding once and pass it to the pipeline on each call
ip_image_embedding = [ip_adapter.get_image_embeds_lyradiff(ip_image)['ip_hidden_states']]
# Set the IP-Adapter scale directly on the UNet model object, since it cannot be set through the diffusers pipeline
unet.set_ip_adapter_scale(ip_scale)
# Run three times with the same seed; the first iteration includes warm-up
for i in range(3):
    generator = torch.Generator("cuda").manual_seed(123)
    start = time.perf_counter()
    images = pipe(
        prompt="a beautiful girl, cartoon style",
        height=1024,
        width=1024,
        num_inference_steps=20,
        num_images_per_prompt=1,
        guidance_scale=7.5,
        negative_prompt="NSFW",
        generator=generator,
        ip_adapter_image_embeds=ip_image_embedding,
    )[0]
    print(f"iter {i}: {time.perf_counter() - start:.2f}s")
    images[0].save(f"sdxl_ip_{i}.png")
```
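As a quick sanity check of the pixel-level consistency claim, you can diff a lyraDiff output against a baseline image rendered by an equivalent pure-diffusers pipeline with the same seed, scheduler, and step count. A minimal sketch (the baseline filename `sdxl_ip_baseline.png` is an assumption, not produced by the script above):

```python
# Hedged sketch: pixel-level comparison between a lyraDiff output and a
# pure-diffusers baseline. "sdxl_ip_baseline.png" is assumed to come from
# an equivalent diffusers run with the same seed/scheduler/steps.
import numpy as np
from PIL import Image

a = np.asarray(Image.open("sdxl_ip_0.png"), dtype=np.int16)
b = np.asarray(Image.open("sdxl_ip_baseline.png"), dtype=np.int16)
print("max abs pixel diff :", np.abs(a - b).max())
print("mean abs pixel diff:", np.abs(a - b).mean())
```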
Citation
```bibtex
@Misc{lyraDiff_2025,
  author       = {Kangjian Wu and Zhengtao Wang and Yibo Lu and Haoxiong Su and Sa Xiao and Qiwen Mao and Mian Peng and Bin Wu and Wenjiang Zhou},
  title        = {lyraDiff: Accelerating Diffusion Models with Best Flexibility},
  howpublished = {\url{https://github.com/TMElyralab/lyraDiff}},
  year         = {2025}
}
```