Composing Concepts from Images and Videos via Concept-prompt Binding Paper • 2512.09824 • Published 19 days ago • 27
MIND-V: Hierarchical Video Generation for Long-Horizon Robotic Manipulation with RL-based Physical Alignment Paper • 2512.06628 • Published 23 days ago • 12
AnyTalker: Scaling Multi-Person Talking Video Generation with Interactivity Refinement Paper • 2511.23475 • Published Nov 28 • 41
Hyper-Bagel: A Unified Acceleration Framework for Multimodal Understanding and Generation Paper • 2509.18824 • Published Sep 23 • 22
pyannote/speaker-diarization-3.1 Automatic Speech Recognition • Updated May 10, 2024 • 15.1M • 1.39k