Submitted by Weiyun1025 281 InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models · 47 authors 8.86k 8
Submitted by LIKirin 134 PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters · 5 authors 7
Submitted by cuijiaxing 49 Have we unified image generation and understanding yet? An empirical study of GPT-4o's image generation ability · 3 authors 2
Submitted by wenhu 43 VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning · 6 authors 142 2
Submitted by starriver030515 38 FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding · 7 authors 180 3
Submitted by mponty 34 Iterative Self-Training for Code Generation via Reinforced Re-Ranking · 3 authors 2
Submitted by DogNeverSleep 30 Mavors: Multi-granularity Video Representation for Multimodal Large Language Model · 15 authors 13 2
Submitted by xhluca 27 AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories · 10 authors 36 2
Submitted by AIRobotZ 21 S1-Bench: A Simple Benchmark for Evaluating System 1 Thinking Capability of Large Reasoning Models · 5 authors 6 3
Submitted by ztwang 19 DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training · 4 authors 2
Submitted by brucelyu 17 SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users · 21 authors 3
Submitted by leoozy 17 Breaking the Data Barrier -- Building GUI Agents Through Task Generalization · 7 authors 21 2
Submitted by akhaliq 14 M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models · 6 authors 38 2
Submitted by yyamada 14 The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search · 8 authors 3
Submitted by codezakh 13 Executable Functional Abstractions: Inferring Generative Programs for Advanced Math Problems · 5 authors 2
Submitted by LibraTree 12 VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search · 8 authors 29 4
Submitted by parshinsh 9 LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models · 6 authors 70 2
Submitted by ChrisJuan 7 EmoAgent: Assessing and Safeguarding Human-AI Interaction for Mental Health Safety · 10 authors 15 3
Submitted by mqliu 5 LLM Can be a Dangerous Persuader: Empirical Study of Persuasion Safety in Large Language Models · 11 authors 2
Submitted by Rexhaif 4 DeepSeek vs. o3-mini: How Well can Reasoning LLMs Evaluate MT and Summarization? · 8 authors 2
Submitted by kpzhang996 4 MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models · 20 authors 2
Submitted by johnhalloran 4 MCP Safety Audit: LLMs with the Model Context Protocol Allow Major Security Exploits · 2 authors 119 2
Submitted by SteveZeyuZhang 1 DiffuMural: Restoring Dunhuang Murals with Multi-scale Diffusion · 9 authors 2