Update README.md
README.md
CHANGED
@@ -10,7 +10,7 @@ library_name: transformers
 
 # Introduction
 
-The Aquila-VL-2B
+The Aquila-VL-2B model is a vision-language model (VLM) trained with the [LLava-one-vision](https://llava-vl.github.io/blog/2024-08-05-llava-onevision/) framework. [Qwen2.5-1.5B-instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) is chosen as the LLM, while [siglip-so400m-patch14-384](https://huggingface.co/google/siglip-so400m-patch14-384) is used as the vision tower.
 
 The model was trained on our self-built Infinity-MM dataset, which contains approximately 40 million image-text pairs. This dataset is a combination of open-source data collected from the internet and synthetic instruction data generated using open-source VLM models.
 
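The updated introduction describes the architecture only. As a rough illustration of how such a checkpoint might be loaded and queried with the `transformers` library (declared in the card's `library_name`), here is a minimal sketch. It assumes the weights are published in a LLaVA-OneVision-compatible format usable with `LlavaOnevisionForConditionalGeneration`, and it uses `BAAI/Aquila-VL-2B-llava-qwen` as a placeholder repo id; both are assumptions not stated in this diff, so check the model card for the actual usage.

```python
# Minimal inference sketch (assumptions: repo id and LLaVA-OneVision-compatible weights).
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration

model_id = "BAAI/Aquila-VL-2B-llava-qwen"  # assumed repo id; verify on the model card

# Load processor (tokenizer + image processor) and the model itself.
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaOnevisionForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Build a chat-style prompt containing one image placeholder and a question.
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

# Prepare inputs from a local image and generate a short answer.
image = Image.open("example.jpg")
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device, torch.float16)
output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```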