SpatialTree: How Spatial Abilities Branch Out in MLLMs Paper • 2512.20617 • Published 9 days ago • 42
Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition Paper • 2512.15603 • Published 16 days ago • 57
EgoX: Egocentric Video Generation from a Single Exocentric Video Paper • 2512.08269 • Published 24 days ago • 115
EditThinker: Unlocking Iterative Reasoning for Any Image Editor Paper • 2512.05965 • Published 27 days ago • 38
PaperDebugger: A Plugin-Based Multi-Agent System for In-Editor Academic Writing, Review, and Editing Paper • 2512.02589 • Published about 1 month ago • 64
Thinking with Programming Vision: Towards a Unified View for Thinking with Images Paper • 2512.03746 • Published 30 days ago • 16
OneThinker: All-in-one Reasoning Model for Image and Video Paper • 2512.03043 • Published about 1 month ago • 32
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models Paper • 2512.02556 • Published about 1 month ago • 242
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe Paper • 2511.16334 • Published Nov 20, 2025 • 92
Depth Anything 3: Recovering the Visual Space from Any Views Paper • 2511.10647 • Published Nov 13, 2025 • 95
Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds Paper • 2511.08892 • Published Nov 12, 2025 • 201
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published Nov 6, 2025 • 210
ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning Paper • 2510.27492 • Published Oct 30, 2025 • 82