@tianchez on Hugging Face: "Introducing VLM-R1! GRPO has helped DeepSeek R1 to learn reasoning. Can it…"


tianchez

posted an update Feb 15, 2025


Introducing VLM-R1!

GRPO helped DeepSeek R1 learn to reason. Can it also help VLMs perform better on general computer vision tasks?

The answer is YES, and it generalizes better than SFT. We trained Qwen 2.5 VL 3B on RefCOCO (a visual grounding task) and evaluated it on RefCOCO Val and RefGTA (an OOD task).

https://github.com/om-ai-lab/VLM-R1
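
For readers new to GRPO, here is a minimal sketch of its core step: several completions are sampled for the same prompt, and each reward is normalized against the mean and standard deviation of its own group. This is not taken from the VLM-R1 code. The function name, the `eps` term, and the choice of population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of one group of completions for the same prompt.

    `eps` is an illustrative guard against division by zero when every
    completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled completions for one prompt, scored 1.0 when correct.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

When every completion in a group receives the same reward, all advantages are zero and that group contributes no learning signal.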

alandao

Feb 24, 2025

edited Feb 24, 2025

Great job, guys! Reasoning brings so much potential!

We also had a similar idea, but only applied it to mazes:

https://huggingface.co/homebrewltd/AlphaMaze-v0.2-1.5B

tianchez

Feb 24, 2025

looks very cool!

mbiswas

Feb 27, 2025

Hi, thanks a lot for sharing. I tried a similar approach to make the VLM point to objects in an image with x, y coordinates, using the PixMo points dataset. But in spite of training on a subset of around 20k examples, the model just produces random x, y values, and the reward stops improving beyond a certain point. I am using a format reward similar to yours, plus the distance between the predicted point and the ground truth as a reward, i.e. exp(-distance). It just doesn't work! Do you have any insight into why it doesn't work for pointing? I used Qwen2-VL 2B.
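
The reward setup described in this comment could look roughly like the sketch below: a format reward plus `exp(-distance)` between the predicted and ground-truth points. The `<answer>(x, y)</answer>` output format, the regular expression, the function names, and the `scale` parameter are all hypothetical; the comment does not give these details. With `scale=1.0` the function is exactly `exp(-distance)`.

```python
import math
import re

# Hypothetical output format: "<answer>(x, y)</answer>".
POINT_PATTERN = re.compile(
    r"<answer>\s*\(?\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)?\s*</answer>"
)


def format_reward(completion):
    """Return 1.0 if the completion contains a well-formed point, else 0.0."""
    return 1.0 if POINT_PATTERN.search(completion) else 0.0


def distance_reward(completion, target, scale=1.0):
    """Return exp(-distance / scale) between the predicted and target points.

    `scale` is an illustrative addition, not part of the original comment.
    """
    match = POINT_PATTERN.search(completion)
    if match is None:
        return 0.0
    x, y = float(match.group(1)), float(match.group(2))
    distance = math.hypot(x - target[0], y - target[1])
    return math.exp(-distance / scale)
```

One property of this function worth checking is the unit of the distance. If coordinates are in pixels, a prediction 50 px away scores `exp(-50)`, about 1.9e-22, so nearly every sample in a group gets a reward that is effectively zero. Whether that explains the behavior reported here is unknown.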
