R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning Paper • 2508.21113 • Published 5 days ago • 86