TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs Paper • 2512.14698 • Published 18 days ago • 19
LRQ-DiT: Log-Rotation Post-Training Quantization of Diffusion Transformers for Image and Video Generation Paper • 2508.03485 • Published Aug 5, 2025 • 2
From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model Paper • 2510.19871 • Published Oct 22, 2025 • 29
Video-As-Prompt Collection The model zoo for "Video-As-Prompt: Unified Semantic Control for Video Generation" • 3 items • Updated Oct 27, 2025 • 13
Video-As-Prompt: Unified Semantic Control for Video Generation Paper • 2510.20888 • Published Oct 23, 2025 • 45
HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning Paper • 2509.08519 • Published Sep 10, 2025 • 128
AudioStory: Generating Long-Form Narrative Audio with Large Language Models Paper • 2508.20088 • Published Aug 27, 2025 • 21
Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs Paper • 2508.14896 • Published Aug 20, 2025 • 22
ToonComposer: Streamlining Cartoon Production with Generative Post-Keyframing Paper • 2508.10881 • Published Aug 14, 2025 • 52