Submitted by xianbao 170 GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models · 171 authors 2.37k 7
Submitted by RyanL22 57 Voost: A Unified and Scalable Diffusion Transformer for Bidirectional Virtual Try-On and Try-Off · 2 authors 304 4
Submitted by SiriusL 25 InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization · 13 authors 107 2
Submitted by YerbaPage 19 Pruning the Unsurprising: Efficient Code Reasoning via First-Token Surprisal · 7 authors 7 3
Submitted by JorgeeGF 18 Hidden Dynamics of Massive Activations in Transformer Training · 5 authors 4
Submitted by hdong51 11 Adapting Vision-Language Models Without Labels: A Comprehensive Survey · 6 authors 36 2
Submitted by MikolajZ 11 GENIE: Gaussian Encoding for Neural Radiance Fields Interactive Editing · 4 authors 11 2
Submitted by huxueyu 9 OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use · 29 authors 345 2
Submitted by fsk515 9 MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh · 9 authors 3
Submitted by KejiaRobust 6 MELLA: Bridging Linguistic Capability and Cultural Groundedness for Low-Resource Language MLLMs · 7 authors 2
Submitted by shijiezhou 6 VLM4D: Towards Spatiotemporal Awareness in Vision Language Models · 10 authors 2
Submitted by LianShuQuan 4 UI-AGILE: Advancing GUI Agents with Effective Reinforcement Learning and Precise Inference-Time Grounding · 7 authors 2
Submitted by thebluser 3 LightSwitch: Multi-view Relighting with Material-guided Diffusion · 3 authors 3