Composing Concepts from Images and Videos via Concept-prompt Binding Paper • 2512.09824 • Published 18 days ago • 27
MIND-V: Hierarchical Video Generation for Long-Horizon Robotic Manipulation with RL-based Physical Alignment Paper • 2512.06628 • Published 21 days ago • 12
AnyTalker: Scaling Multi-Person Talking Video Generation with Interactivity Refinement Paper • 2511.23475 • Published 30 days ago • 41
Hyper-Bagel: A Unified Acceleration Framework for Multimodal Understanding and Generation Paper • 2509.18824 • Published Sep 23 • 22