Submitted by tellarin 68 Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia · 92 authors 1
Submitted by ColeYzzzz 47 LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL · 10 authors 2
Submitted by a43992899 42 YuE: Scaling Open Foundation Models for Long-Form Music Generation · 57 authors 1
Submitted by Owen777 26 MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice · 13 authors 1
Submitted by Xuerui123 25 UniF^2ace: Fine-grained Face Understanding and Generation with Unified Multimodal Models · 8 authors 2
Submitted by Z-MU-Z 22 SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories · 8 authors 1
Submitted by wujie10 22 Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model · 28 authors 1
Submitted by hsaest 16 Implicit Reasoning in Transformers is Reasoning through Shortcuts · 4 authors 1
Submitted by CohenQu 16 Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning · 7 authors 1
Submitted by Harold328 15 LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization · 11 authors 1
Submitted by subin-kim 15 Tuning-Free Multi-Event Long Video Generation via Synchronized Coupled Sampling · 5 authors 1
Submitted by LegendBC 13 OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models · 5 authors 1
Submitted by Jianxiong 11 CineBrain: A Large-Scale Multi-Modal Brain Dataset During Naturalistic Audiovisual Narrative Processing · 5 authors 1
Submitted by ArturoDeza 8 Robusto-1 Dataset: Comparing Humans and VLMs on real out-of-distribution Autonomous Driving VQA from Peru · 4 authors 1
Submitted by MaverickAlex 7 ^RFLAV: Rolling Flow matching for infinite Audio Video generation · 7 authors 1
Submitted by XinXuNLPer 6 BiasEdit: Debiasing Stereotyped Language Models via Model Editing · 4 authors 1
Submitted by KID-22 5 Perplexity Trap: PLM-Based Retrievers Overrate Low Perplexity Documents · 9 authors 1
Submitted by XiaXin-Aloys 5 RayFlow: Instance-Aware Diffusion Acceleration via Adaptive Flow Trajectories · 6 authors 1
Submitted by RohamKoohestani 5 Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol · 3 authors 1
Submitted by Jinfa 4 QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension · 11 authors 1
Submitted by kwanY 4 AnyMoLe: Any Character Motion In-betweening Leveraging Video Diffusion Models · 4 authors 1
Submitted by WYLing 4 VisualSimpleQA: A Benchmark for Decoupled Evaluation of Large Vision-Language Models in Fact-Seeking Question Answering · 10 authors 1
Submitted by parishadbehnam 3 Exploiting Instruction-Following Retrievers for Malicious Information Retrieval · 3 authors 1
Submitted by luoyingfeng 2 Beyond Decoder-only: Large Language Models Can be Good Encoders for Machine Translation · 11 authors 1
Submitted by amodaresi - Collapse of Dense Retrievers: Short, Early, and Literal Biases Outranking Factual Evidence · 4 authors 1