OmnimatteZero: Training-free Real-time Omnimatte with Pre-trained Video Diffusion Models Paper • 2503.18033 • Published 11 days ago • 23
Single Image Iterative Subject-driven Generation and Editing Paper • 2503.16025 • Published 14 days ago • 13
"Principal Components" Enable A New Language of Images Paper • 2503.08685 • Published 23 days ago • 11
RewardSDS: Aligning Score Distillation via Reward-Weighted Sampling Paper • 2503.09601 • Published 22 days ago • 14
Slamming: Training a Speech Language Model on One GPU in a Day Paper • 2502.15814 • Published Feb 19 • 68
Can this Model Also Recognize Dogs? Zero-Shot Model Search from Weights Paper • 2502.09619 • Published Feb 13 • 33
Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps Paper • 2501.09732 • Published Jan 16 • 71
Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation Paper • 2501.03059 • Published Jan 6 • 22
Edge Weight Prediction For Category-Agnostic Pose Estimation Paper • 2411.16665 • Published Nov 25, 2024 • 6
Improving Visual Commonsense in Language Models via Multiple Image Generation Paper • 2406.13621 • Published Jun 19, 2024 • 13
Video as the New Language for Real-World Decision Making Paper • 2402.17139 • Published Feb 27, 2024 • 20
Leveraging Unpaired Data for Vision-Language Generative Models via Cycle Consistency Paper • 2310.03734 • Published Oct 5, 2023 • 15
Masked Audio Generation using a Single Non-Autoregressive Transformer Paper • 2401.04577 • Published Jan 9, 2024 • 43
Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation Paper • 2309.16429 • Published Sep 28, 2023 • 11