Papers
arxiv:2508.21113

R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning

Published on Aug 28
ยท Submitted by YannQi on Sep 1
#1 Paper of the day
Authors:
,
,
,
,
,

Abstract

R-4B, an auto-thinking multimodal large language model, uses bi-mode annealing and Bi-mode Policy Optimization to adaptively decide on problem-solving strategies, achieving state-of-the-art performance with lower computational cost.

AI-generated summary

Multimodal Large Language Models (MLLMs) equipped with step-by-step thinking capabilities have demonstrated remarkable performance on complex reasoning problems. However, this thinking process is redundant for simple problems solvable without complex reasoning. To address this inefficiency, we propose R-4B, an auto-thinking MLLM, which can adaptively decide when to think based on problem complexity. The central idea of R-4B is to empower the model with both thinking and non-thinking capabilities using bi-mode annealing, and apply Bi-mode Policy Optimization~(BPO) to improve the model's accuracy in determining whether to activate the thinking process. Specifically, we first train the model on a carefully curated dataset spanning various topics, which contains samples from both thinking and non-thinking modes. Then it undergoes a second phase of training under an improved GRPO framework, where the policy model is forced to generate responses from both modes for each input query. Experimental results show that R-4B achieves state-of-the-art performance across 25 challenging benchmarks. It outperforms Qwen2.5-VL-7B in most tasks and achieves performance comparable to larger models such as Kimi-VL-A3B-Thinking-2506 (16B) on reasoning-intensive benchmarks with lower computational cost.

Community

Paper submitter

R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning

[๐Ÿ“š Arxiv Paper] [๐Ÿค— Hugging Face] [๐Ÿ’ป Code]

We present R-4B, a multimodal large language model designed for general-purpose auto-thinking, autonomously switching between step-by-step thinking and direct response generation based on task complexity. This capability enables R-4B to deliver high-quality responses while significantly improving inference efficiency and reducing computational costs.

The development of R-4B follows a two-stage training paradigm:
(1) Bi-mode Annealing, which establishes both thinking and non-thinking capabilities for VQA; and
(2) Bi-mode Policy Optimization (BPO), which enables the model to adaptively switch between thinking and non-thinking modes based on input demands.

๐Ÿš€ Key Features

  • ๐Ÿง  Think Smart, Act Fast: Adaptive & Controllable Thinking!
    Our model provides three-mode control over the response process.

    • Auto-thinking Mode: Unleash auto-thinking that works across general topics, from simple Q&A to complex scientific analysis. It saves time and computation by thinking only when it matters.
    • Support Manual Control: Explicitly command the model to use its thinking or non-thinking capabilities, enabling you to make your choices for every job.
  • ๐Ÿ† Strong Performance, Open for Everyone!
    Our model is now fully open-source. It achieves state-of-the-art performance among models of comparable size.

๐Ÿ“ข News

  • [2025.08.20] ๐Ÿš€ vLLM Support is Here! Our R-4B model is now fully compatible with vLLM for high-performance inference.
  • [2025.08.18] ๐Ÿ† Top Rank Achieved! We are thrilled to announce that R-4B is now ranked #1 among all open-source models on the OpenCompass Multi-modal Reasoning Leaderboard!
  • [2025.08.11] ๐Ÿฅ‡ Rank #1! R-4B ranks first under 20B parameters on the OpenCompass Multi-modal Academic Leaderboard!
  • [2025.08.05] ๐ŸŽ‰ R-4B is Released! Our model is now publicly available. You can download it from Hugging Face.

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any Paper on Hugging Face checkout this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

Sign up or log in to comment

Models citing this paper 1

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2508.21113 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2508.21113 in a Space README.md to link it from this page.

Collections including this paper 11