---
license: other
license_name: apple
license_link: https://github.com/apple/ml-fastvlm/blob/main/LICENSE
language:
- en
pipeline_tag: image-text-to-text
tags:
- multimodal
library_name: transformers
---

# FastVLM-7B-Stage3

## Introduction

FastVLM-7B-Stage3 is a multimodal language model that can understand visual inputs, act as an agent, follow long videos and capture the events in them, and generate structured outputs.

This model is exported from the GitHub repository [apple/ml-fastvlm](https://github.com/apple/ml-fastvlm).

Model weights: [llava-fastvithd_7b_stage3.zip](https://ml-site.cdn-apple.com/datasets/fastvlm/llava-fastvithd_7b_stage3.zip).

### Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Path to the FastVLM-7B-Stage3 checkpoint (local directory or Hub repo id).
model_id = 'FastVLM-7B-Stage3'
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype='auto', trust_remote_code=True)
```

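The snippet above only loads the tokenizer and model; an actual image+text query also needs a prompt in the format the checkpoint was trained on. The sketch below is a hedged illustration: the literal `<image>` placeholder and the conversation template follow the LLaVA conventions that apple/ml-fastvlm builds on, and `build_llava_prompt` is a hypothetical helper, not part of the repository's API.

```python
# Minimal sketch of composing a single-image prompt, ASSUMING the LLaVA-style
# convention (which apple/ml-fastvlm builds on) that the literal string
# "<image>" marks where vision features are spliced into the token stream.
# The surrounding chat template is likewise an assumption, not a documented API.
IMAGE_TOKEN = "<image>"

def build_llava_prompt(question: str) -> str:
    """Wrap one user question in a single-turn, single-image prompt."""
    return (
        "A chat between a curious user and an artificial intelligence assistant.\n"
        f"USER: {IMAGE_TOKEN}\n{question}\nASSISTANT:"
    )

prompt = build_llava_prompt("Describe this image in detail.")
```

The resulting string would then be tokenized (with the `<image>` token expanded by the model's multimodal preprocessing) and passed to `model.generate` together with the preprocessed image tensor.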
### Export to MNN

```shell
git clone https://github.com/alibaba/MNN
cd MNN/transformers/llm/export
python llmexport.py --path /path/to/FastVLM-7B-Stage3 --export mnn
```

## Citation

If you find our work helpful, feel free to cite it:

```bibtex
@InProceedings{fastvlm2025,
  author = {Pavan Kumar Anasosalu Vasu and Fartash Faghri and Chun-Liang Li and Cem Koc and Nate True and Albert Antony and Gokul Santhanam and James Gabriel and Peter Grasch and Oncel Tuzel and Hadi Pouransari},
  title = {FastVLM: Efficient Vision Encoding for Vision Language Models},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2025},
}
```