Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation Paper • 2501.15907 • Published 30 days ago • 15
UI-TARS: Pioneering Automated GUI Interaction with Native Agents Paper • 2501.12326 • Published Jan 21 • 51
Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise Paper • 2501.08331 • Published Jan 14 • 20
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions Paper • 2412.09596 • Published Dec 12, 2024 • 94
ProcessBench: Identifying Process Errors in Mathematical Reasoning Paper • 2412.06559 • Published Dec 9, 2024 • 80
SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion Paper • 2412.04301 • Published Dec 5, 2024 • 36
One Shot, One Talk: Whole-body Talking Avatar from a Single Image Paper • 2412.01106 • Published Dec 2, 2024 • 20