Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models Paper • 2503.06749 • Published 4 days ago
microsoft/Phi-4-multimodal-instruct Automatic Speech Recognition • Updated 1 day ago