Submitted by runninglsy 79 Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities · 10 authors 634 5
Submitted by SpaceProduct 66 ZeroSearch: Incentivize the Search Capability of LLMs without Searching · 9 authors 1.11k 8
Submitted by BestWishYsh 36 HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation · 7 authors 1.17k 3
Submitted by hyz317 27 PrimitiveAnything: Human-Crafted 3D Primitive Assembly Generation with Auto-Regressive Transformer · 8 authors 350 1
Submitted by PahaII 27 OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning · 5 authors 291 1
Submitted by albertge 26 R&B: Domain Regrouping and Data Mixture Balancing for Efficient Foundation Model Training · 10 authors 1
Submitted by Gracjan 25 Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models · 6 authors 1
Submitted by renqiux0302 12 Beyond Theorem Proving: Formulation, Framework and Benchmark for Formal Problem-Solving · 6 authors 25 1
Submitted by itaowe 9 OmniGIRL: A Multilingual and Multimodal Benchmark for GitHub Issue Resolution · 10 authors 12 1
Submitted by Ningyu 9 Knowledge Augmented Complex Problem Solving with Large Language Models: A Survey · 9 authors 1
Submitted by huangsiteng 8 OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation · 13 authors 254 1
Submitted by mariya-davydova 8 OSUniverse: Benchmark for Multimodal GUI-navigation AI Agents · 5 authors 18 1
Submitted by Eavn 3 Uncertainty-Weighted Image-Event Multimodal Fusion for Video Anomaly Detection · 3 authors 10 1
Submitted by Tournesol-Saturday 2 RAIL: Region-Aware Instructive Learning for Semi-Supervised Tooth Segmentation in CBCT · 7 authors 23 1
Submitted by linxule 1 Cognitio Emergens: Agency, Dimensions, and Dynamics in Human-AI Knowledge Co-Creation · 1 author 1