FastVLM
FastVLM was introduced in "FastVLM: Efficient Vision Encoding for Vision Language Models" (CVPR 2025).
| Benchmark | FastVLM-0.5B | FastVLM-1.5B | FastVLM-7B |
|---|---|---|---|
| Ai2D | 68.0 | 77.4 | 83.6 |
| ScienceQA | 85.2 | 94.4 | 96.7 |
| MMMU | 33.9 | 37.8 | 45.4 |
| VQAv2 | 76.3 | 79.1 | 80.8 |
| ChartQA | 76.0 | 80.1 | 85.0 |
| TextVQA | 64.5 | 70.4 | 74.9 |
| InfoVQA | 46.4 | 59.7 | 75.8 |
| DocVQA | 82.5 | 88.3 | 93.2 |
| OCRBench | 63.9 | 70.2 | 73.1 |
| RealWorldQA | 56.1 | 61.2 | 67.2 |
| SeedBench-Img | 71.0 | 74.2 | 75.4 |
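
To compare the three model sizes at a glance, the scores in the table can be loaded into a small Python structure and summarized. The sketch below is illustrative only: the `SCORES` dictionary is copied from the table, while the helper names `mean_scores` and `largest_gain` are hypothetical, and the unweighted mean across benchmarks is a convenience figure assumed here, not a metric reported by the authors.

```python
# Scores copied from the benchmark table: (FastVLM-0.5B, FastVLM-1.5B, FastVLM-7B).
MODELS = ("FastVLM-0.5B", "FastVLM-1.5B", "FastVLM-7B")

SCORES = {
    "Ai2D": (68.0, 77.4, 83.6),
    "ScienceQA": (85.2, 94.4, 96.7),
    "MMMU": (33.9, 37.8, 45.4),
    "VQAv2": (76.3, 79.1, 80.8),
    "ChartQA": (76.0, 80.1, 85.0),
    "TextVQA": (64.5, 70.4, 74.9),
    "InfoVQA": (46.4, 59.7, 75.8),
    "DocVQA": (82.5, 88.3, 93.2),
    "OCRBench": (63.9, 70.2, 73.1),
    "RealWorldQA": (56.1, 61.2, 67.2),
    "SeedBench-Img": (71.0, 74.2, 75.4),
}


def mean_scores(scores):
    """Return the unweighted mean score per model (hypothetical summary, not from the paper)."""
    columns = zip(*scores.values())  # one tuple of scores per model
    return {model: sum(col) / len(col) for model, col in zip(MODELS, columns)}


def largest_gain(scores):
    """Return the benchmark with the largest absolute gain from the 0.5B to the 7B model."""
    return max(scores, key=lambda name: scores[name][2] - scores[name][0])


if __name__ == "__main__":
    for model, mean in mean_scores(SCORES).items():
        print(f"{model}: {mean:.1f}")
    print("Largest 0.5B -> 7B gain:", largest_gain(SCORES))
```

On these numbers, InfoVQA shows the largest gain from the 0.5B to the 7B model (46.4 to 75.8), while VQAv2 and SeedBench-Img change the least.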
The model has been exported to run with MLX. Follow the instructions in the official repository to use it in an iOS or macOS app.
If you found this model useful, please cite the following paper:
@InProceedings{fastvlm2025,
author = {Pavan Kumar Anasosalu Vasu and Fartash Faghri and Chun-Liang Li and Cem Koc and Nate True and Albert Antony and Gokul Santhanam and James Gabriel and Peter Grasch and Oncel Tuzel and Hadi Pouransari},
title = {FastVLM: Efficient Vision Encoding for Vision Language Models},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2025},
}