Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia Paper • 2503.07920 • Published 3 days ago • 87
MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice Paper • 2503.05978 • Published 6 days ago • 29
Tuning-Free Multi-Event Long Video Generation via Synchronized Coupled Sampling Paper • 2503.08605 • Published 2 days ago • 20
Automated Movie Generation via Multi-Agent CoT Planning Paper • 2503.07314 • Published 3 days ago • 34
FedRand: Enhancing Privacy in Federated Learning with Randomized LoRA Subparameter Updates Paper • 2503.07216 • Published 3 days ago • 26
MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning Paper • 2503.07365 • Published 3 days ago • 51
Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities Paper • 2503.03983 • Published 8 days ago • 22
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching Paper • 2503.05179 • Published 7 days ago • 42
TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models Paper • 2503.05638 • Published 6 days ago • 17
GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control Paper • 2503.03751 • Published 8 days ago • 19
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs Paper • 2503.01743 • Published 10 days ago • 72
Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models Paper • 2503.01774 • Published 10 days ago • 39
Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids Paper • 2502.20396 • Published 14 days ago • 12
DexGraspVLA: A Vision-Language-Action Framework Towards General Dexterous Grasping Paper • 2502.20900 • Published 13 days ago • 7