bczhou committed · Commit 665af77 · verified · 1 Parent(s): f924756

Update README.md

Files changed (1): README.md (+168, -0)
README.md CHANGED
---
license: apache-2.0
datasets:
- Lin-Chen/ShareGPT4V
- liuhaotian/LLaVA-Pretrain
- liuhaotian/LLaVA-Instruct-150K
language:
- en
- zh
tags:
- llava
- vision-language
- llm
- lmm
---
<h2 align="center"> <a href="https://arxiv.org/abs/2402.14289">TinyLLaVA: A Framework of Small-scale Large Multimodal Models</a> </h2>

<h5 align="center">

[![hf_space](https://img.shields.io/badge/🤗-%20Open%20In%20HF-blue.svg)](https://huggingface.co/bczhou/TinyLLaVA-3.1B) [![arXiv](https://img.shields.io/badge/Arxiv-2402.14289-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2402.14289) [![License](https://img.shields.io/badge/License-Apache%202.0-yellow)](https://github.com/PKU-YuanGroup/MoE-LLaVA/blob/main/LICENSE)

</h5>

## &#x1F389; News
* **[2024.02.25]** Updated evaluation scripts and docs!
* **[2024.02.25]** Data descriptions are out. Released TinyLLaVA-1.5B and TinyLLaVA-2.0B!
* **[2024.02.24]** Added example code for inference and model loading!
* **[2024.02.23]** Evaluation code and scripts released!
* **[2024.02.21]** Created the [TinyLLaVABench](https://github.com/DLCV-BUAA/TinyLLavaBench) repository on GitHub!
* **[2024.02.21]** Our paper [TinyLLaVA: A Framework of Small-scale Large Multimodal Models](https://arxiv.org/abs/2402.14289) is out!
* **[2024.01.11]** Our first model, [TinyLLaVA-1.4B](https://huggingface.co/bczhou/tiny-llava-v1-hf), is out!

## &#x231B; TODO
- [ ] Add support for Ollama and llama.cpp.
- [ ] Developers' guide / how to build the demo locally.
- [x] Model Zoo descriptions.
- [x] Examples and inference.
- [x] Release code for training.
- [x] Add descriptions for evaluation.
- [x] Add descriptions for data preparation.
- [x] Release TinyLLaVA-1.5B and TinyLLaVA-2.0B.
- [x] Release TinyLLaVA-3.1B.
- [x] Release the evaluation code and weights (2024.2.23).

### &#x1F525; High performance, but with fewer parameters

- Our best model, TinyLLaVA-3.1B, achieves better overall performance than existing 7B models such as LLaVA-1.5 and Qwen-VL.

## &#x1F433; Model Zoo
### Legacy Model
- [tiny-llava-hf](https://huggingface.co/bczhou/tiny-llava-v1-hf)

### Pretrained Models
- [TinyLLaVA-3.1B](https://huggingface.co/bczhou/TinyLLaVA-3.1B)
- [TinyLLaVA-2.0B](https://huggingface.co/bczhou/TinyLLaVA-2.0B)
- [TinyLLaVA-1.5B](https://huggingface.co/bczhou/TinyLLaVA-1.5B)

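To download a checkpoint ahead of time, rather than letting `load_pretrained_model` (see Quick Start below) fetch it from the Hub on first use, a minimal sketch with the standard `huggingface_hub` client looks like this; the local directory is only an example path:

```Python
# Optional: pre-download one of the checkpoints listed above from the Hugging Face Hub.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="bczhou/TinyLLaVA-3.1B",            # or TinyLLaVA-2.0B / TinyLLaVA-1.5B
    local_dir="./checkpoints/TinyLLaVA-3.1B",   # example path, adjust as needed
)
print("Checkpoint downloaded to:", local_path)
```
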
### Model Details

| Name | LLM | Checkpoint | LLaVA-Bench-Wild | MME | MMBench | MM-Vet | SQA-image | VQA-v2 | GQA | TextVQA |
|----------------|-----------------|----------------------------------------------------------------|------------------|--------|---------|--------|-----------|--------|------|---------|
| TinyLLaVA-3.1B | Phi-2           | [TinyLLaVA-3.1B](https://huggingface.co/bczhou/TinyLLaVA-3.1B) | 75.8 | 1464.9 | 66.9 | 32.0 | 69.1 | 79.9 | 62.0 | 59.1 |
| TinyLLaVA-2.0B | StableLM-2-1.6B | [TinyLLaVA-2.0B](https://huggingface.co/bczhou/TinyLLaVA-2.0B) | 66.4 | 1433.8 | 63.3 | 32.6 | 64.7 | 78.9 | 61.9 | 56.4 |
| TinyLLaVA-1.5B | TinyLlama       | [TinyLLaVA-1.5B](https://huggingface.co/bczhou/TinyLLaVA-1.5B) | 60.8 | 1276.5 | 55.2 | 25.8 | 60.3 | 76.9 | 60.3 | 51.7 |

## &#x1F527; Requirements and Installation

We recommend the following setup.

1. Clone this repository and navigate to the TinyLLaVABench folder
```bash
git clone https://github.com/DLCV-BUAA/TinyLLaVABench.git
cd TinyLLaVABench
```

2. Install the package
```Shell
conda create -n tinyllava python=3.10 -y
conda activate tinyllava
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
```

3. Install additional packages for training
```Shell
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
```
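
As a quick sanity check (not part of the original instructions), you can verify the editable install by importing the modules used in the Quick Start below:

```Python
# Sanity check: these are the modules used in the Quick Start and Run Inference examples.
import tinyllava
from tinyllava.model.builder import load_pretrained_model
from tinyllava.mm_utils import get_model_name_from_path

print("tinyllava imported from:", tinyllava.__file__)
print("model name:", get_model_name_from_path("bczhou/TinyLLaVA-3.1B"))
```
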
### Upgrade to the latest code base

```Shell
git pull
pip install -e .

# if you see import errors after upgrading, try reinstalling flash-attn:
# pip install flash-attn --no-build-isolation --no-cache-dir
```

## &#x1F527; Quick Start

<details>
<summary>Load model</summary>

```Python
from tinyllava.model.builder import load_pretrained_model
from tinyllava.mm_utils import get_model_name_from_path

model_path = "bczhou/TinyLLaVA-3.1B"
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path=model_path,
    model_base=None,
    model_name=get_model_name_from_path(model_path)
)
```
</details>

## &#x1F527; Run Inference
Here is an example of running inference with [TinyLLaVA-3.1B](https://huggingface.co/bczhou/TinyLLaVA-3.1B).
<details>
<summary>Run Inference</summary>

```Python
from tinyllava.mm_utils import get_model_name_from_path
from tinyllava.eval.run_tiny_llava import eval_model

model_path = "bczhou/TinyLLaVA-3.1B"
prompt = "What are the things I should be cautious about when I visit here?"
image_file = "https://llava-vl.github.io/static/images/view.jpg"

# Pack the inference settings into a simple namespace object expected by eval_model.
args = type('Args', (), {
    "model_path": model_path,
    "model_base": None,
    "model_name": get_model_name_from_path(model_path),
    "query": prompt,
    "conv_mode": "phi",    # "phi" for TinyLLaVA-3.1B/2.0B, "v1" for TinyLLaVA-1.5B (see below)
    "image_file": image_file,
    "sep": ",",
    "temperature": 0,      # greedy decoding
    "top_p": None,
    "num_beams": 1,
    "max_new_tokens": 512
})()
eval_model(args)
```
</details>

### Important
We use a different `conv_mode` for different models. Set the `conv_mode` in `args` according to this table:

| model          | conv_mode |
|----------------|-----------|
| TinyLLaVA-3.1B | phi       |
| TinyLLaVA-2.0B | phi       |
| TinyLLaVA-1.5B | v1        |

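For example, a minimal sketch of the same `eval_model` call adapted to TinyLLaVA-1.5B only swaps the model path and the `conv_mode`; all other settings are kept from the example above:

```Python
from tinyllava.mm_utils import get_model_name_from_path
from tinyllava.eval.run_tiny_llava import eval_model

# Same inference call as above, but for TinyLLaVA-1.5B, which uses the "v1" conv_mode.
model_path = "bczhou/TinyLLaVA-1.5B"
args = type('Args', (), {
    "model_path": model_path,
    "model_base": None,
    "model_name": get_model_name_from_path(model_path),
    "query": "What are the things I should be cautious about when I visit here?",
    "conv_mode": "v1",  # per the table above
    "image_file": "https://llava-vl.github.io/static/images/view.jpg",
    "sep": ",",
    "temperature": 0,
    "top_p": None,
    "num_beams": 1,
    "max_new_tokens": 512
})()
eval_model(args)
```
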
## Evaluation
To ensure reproducibility, we evaluate the models with greedy decoding.

See [Evaluation.md](https://github.com/DLCV-BUAA/TinyLLaVABench/blob/main/docs/Evaluation.md) for details.

## &#x270F; Citation

If you find our paper and code useful in your research, please consider giving us a star :star: and a citation :pencil:.

```BibTeX
@misc{zhou2024tinyllava,
      title={TinyLLaVA: A Framework of Small-scale Large Multimodal Models},
      author={Baichuan Zhou and Ying Hu and Xi Weng and Junlong Jia and Jie Luo and Xien Liu and Ji Wu and Lei Huang},
      year={2024},
      eprint={2402.14289},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```