Update README.md
README.md
CHANGED
@@ -12,6 +12,8 @@ pipeline_tag: visual-question-answering
 
 # VL-Rethinker-72B
 
+**🚀 News:** <u>We release our meticulously curated collection of RL training queries for multimodal reasoning: [ViRL39K](https://huggingface.co/datasets/TIGER-Lab/ViRL39K).</u>
+
 **VL-Rethinker-72B** achieves SoTA results on various multimodal reasoning benchmarks.
 
 It is trained using the **Forced Rethinking** technique, on top of [**VL-Reasoner**](https://huggingface.co/TIGER-Lab/VL-Reasoner-72B/) with **GRPO-SSR** training.
@@ -27,7 +29,7 @@ Explore further via the following links:
 ## Citation
 
 If you feel this model useful, please give us a free cite:
-```
+```bibtex
 @article{vl-rethinker,
 title={VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning},
 author = {Wang, Haozhe and Qu, Chao and Huang, Zuming and Chu, Wei and Lin, Fangzhen and Chen, Wenhu},