---
license: mit
datasets:
- Yuting6/geoqa-r1v-augmentation
- Yuting6/math-8k-augmentation
- Yuting6/m3cot-augmentation
- Yuting6/TQA-augmentation
- Yuting6/Geo3k-augmentation
- Yuting6/geoqa-r1v-noise
- Yuting6/geoqa-r1v-crop
- Yuting6/geoqa-r1v-blur
- Yuting6/geoqa-r1v-8k-rotated
- Yuting6/geoqa-r1v-8k-mixup
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
---

# Vision Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning

## Paper Title and Link

The model was presented in the paper [Vision Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning](https://arxiv.org/abs/2506.09736) (arXiv:2506.09736).

## Paper Abstract

Vision-Matters is a simple visual perturbation framework that can be easily integrated into existing post-training pipelines including SFT, DPO, and GRPO. Our findings highlight the critical role of visual perturbation: better reasoning begins with better seeing.
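
To make the idea of a visual perturbation concrete, the following is a minimal, dependency-free sketch of the perturbation types suggested by the dataset names above (noise, crop, blur, rotated, mixup). It is not the authors' implementation: the function names, parameter defaults, and the representation of an image as a 2D list of grayscale floats are all assumptions made for illustration, and the paper's actual perturbation settings may differ.

```python
import random

# Illustrative only: images are 2D lists of grayscale floats in [0, 1].
# A real pipeline would more likely operate on PIL images or tensors.

def add_gaussian_noise(img, sigma=0.05, seed=0):
    """Add zero-mean Gaussian noise and clip the result to [0, 1]."""
    rng = random.Random(seed)
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in img]

def rotate90(img):
    """Rotate the image clockwise by 90 degrees."""
    return [list(col) for col in zip(*img[::-1])]

def center_crop(img, frac=0.5):
    """Keep the central `frac` portion of the image along each axis."""
    h, w = len(img), len(img[0])
    ch, cw = max(1, int(h * frac)), max(1, int(w * frac))
    top, left = (h - ch) // 2, (w - cw) // 2
    return [row[left:left + cw] for row in img[top:top + ch]]

def box_blur(img):
    """Replace each pixel with the mean of its in-bounds 3x3 neighbourhood."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out

def mixup(img_a, img_b, alpha=0.8):
    """Blend two same-sized images: alpha * img_a + (1 - alpha) * img_b."""
    return [[alpha * a + (1.0 - alpha) * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(img_a, img_b)]
```

In a post-training pipeline such as SFT, DPO, or GRPO, a function like these would be applied to the image of a training sample while the question and answer stay unchanged; how the perturbed samples are then used in each pipeline is described in the paper and repository, not here.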

* **GitHub Repo:** [YutingLi0606/Vision-Matters](https://github.com/YutingLi0606/Vision-Matters)
* **Dataset:** [Yuting6/vision-matters on Hugging Face](https://huggingface.co/collections/Yuting6/vision-matters-684801dd1879d3e639a930d1)