WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation • Paper • 2503.07265 • Published 3 days ago • 4
WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model • Paper • 2411.17459 • Published Nov 26, 2024 • 11
LLaVA-o1: Let Vision Language Models Reason Step-by-Step • Paper • 2411.10440 • Published Nov 15, 2024 • 114
MoH: Multi-Head Attention as Mixture-of-Head Attention • Paper • 2410.11842 • Published Oct 15, 2024 • 22
ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis • Paper • 2409.02048 • Published Sep 3, 2024 • 3