Submitted by czczup 129 Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling · 40 authors 5
Submitted by taesiri 50 EXAONE 3.5: Series of Large Language Models for Real-world Use Cases · 33 authors 4
Submitted by yuexiang96 47 MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale · 10 authors 2
Submitted by CodeGoat24 45 LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment · 6 authors 3
Submitted by thuanz123 35 SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion · 5 authors 6
Submitted by ChenYi99 21 Moto: Latent Motion Token as the Bridging Language for Robot Manipulation · 7 authors 2
Submitted by NinaKarine 19 GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration · 6 authors 2
Submitted by xchen16 18 CompCap: Improving Multimodal Large Language Models with Composite Captions · 11 authors 4
Submitted by EthanTaylor 16 Momentum-GS: Momentum Gaussian Self-Distillation for High-Quality Large Scene Reconstruction · 4 authors 3
Submitted by joanrodai 13 BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks · 43 authors 2
Submitted by BestWishYsh 10 Mind the Time: Temporally-Controlled Multi-Event Video Generation · 8 authors 2
Submitted by Valentina-Zhang 10 2DGS-Room: Seed-Guided 2D Gaussian Splatting with Geometric Constrains for High-Fidelity Indoor Scene Reconstruction · 6 authors 2
Submitted by iiiiwis 7 DEMO: Reframing Dialogue Interaction with Fine-grained Element Modeling · 8 authors 2
Submitted by hsikchi 4 RL Zero: Zero-Shot Language to Behaviors without any Supervision · 9 authors 2