Update: README
README.md (CHANGED)

@@ -16,7 +16,7 @@ tags:
<h1>A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone</h1>

-[GitHub](https://github.com/OpenBMB/MiniCPM-o) | [Demo](http://101.126.42.235:30910/)
+[GitHub](https://github.com/OpenBMB/MiniCPM-o) | [CookBook](https://github.com/OpenSQZ/MiniCPM-V-CookBook) | [Demo](http://101.126.42.235:30910/)

@@ -135,7 +135,7 @@ MiniCPM-V 4.5 can be easily used in various ways: (1) [llama.cpp](https://github
</table>
</div>

-Both Video-MME and OpenCompass were evaluated using 8×A100 GPUs for inference. The reported inference time of Video-MME excludes the cost of video frame extraction.
+Both Video-MME and OpenCompass were evaluated using 8×A100 GPUs for inference. The reported inference time of Video-MME covers all model-side computation and excludes the external cost of video frame extraction (which depends on the specific frame extraction tool) for fair comparison.
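
To make the measurement split described above concrete, here is a minimal sketch of timing frame extraction separately from model-side inference. It is not the benchmark harness: decord as the decoder, the video file name, the frame count, and the prompt are all illustrative assumptions; `model.chat` usage follows the Usage section below.

```python
# Hedged sketch: time video frame extraction separately from model-side inference.
# Assumes decord for decoding; file name, frame count, and prompt are placeholders.
import time
import torch
from decord import VideoReader, cpu
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained('openbmb/MiniCPM-V-4_5', trust_remote_code=True,
    attn_implementation='sdpa', torch_dtype=torch.bfloat16).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-4_5', trust_remote_code=True)

def extract_frames(video_path, num_frames=64):
    # Uniformly sample num_frames frames and return them as PIL images.
    vr = VideoReader(video_path, ctx=cpu(0))
    idx = [round(i * (len(vr) - 1) / (num_frames - 1)) for i in range(num_frames)]
    return [Image.fromarray(vr[i].asnumpy()) for i in idx]

t0 = time.time()
frames = extract_frames('video_test.mp4')            # excluded from the reported time
t1 = time.time()
msgs = [{'role': 'user', 'content': frames + ['Describe this video.']}]
answer = model.chat(msgs=msgs, tokenizer=tokenizer)   # model-side computation only
t2 = time.time()
print(f"frame extraction: {t1 - t0:.1f}s, model inference: {t2 - t1:.1f}s")
```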
### Examples

@@ -161,6 +161,91 @@ We deploy MiniCPM-V 4.5 on iPad M4 with [iOS demo](https://github.com/tc-mb/Mini
<img src="https://raw.githubusercontent.com/openbmb/MiniCPM-o/main/assets/minicpmv4_5/v45_cn_travel.gif" width="45%" style="display: inline-block; margin: 0 10px;"/>
</div>

## Framework Support Matrix
<table>
  <thead>
    <tr>
      <th>Category</th>
      <th>Framework</th>
      <th>Cookbook Link</th>
      <th>Upstream PR</th>
      <th>Supported since (branch)</th>
      <th>Supported since (release)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td rowspan="2">Edge (On-device)</td>
      <td>Llama.cpp</td>
      <td><a href="https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/deployment/llama.cpp/minicpm-v4_5_llamacpp.md">Llama.cpp Doc</a></td>
      <td><a href="https://github.com/ggml-org/llama.cpp/pull/15575">#15575</a> (2025-08-26)</td>
      <td>master (2025-08-26)</td>
      <td><a href="https://github.com/ggml-org/llama.cpp/releases/tag/b6282">b6282</a></td>
    </tr>
    <tr>
      <td>Ollama</td>
      <td><a href="https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/deployment/ollama/minicpm-v4_5_ollama.md">Ollama Doc</a></td>
      <td><a href="https://github.com/ollama/ollama/pull/12078">#12078</a> (2025-08-26)</td>
      <td>Merging</td>
      <td>Waiting for official release</td>
    </tr>
    <tr>
      <td rowspan="2">Serving (Cloud)</td>
      <td>vLLM</td>
      <td><a href="https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/deployment/vllm/minicpm-v4_5_vllm.md">vLLM Doc</a></td>
      <td><a href="https://github.com/vllm-project/vllm/pull/23586">#23586</a> (2025-08-26)</td>
      <td>main (2025-08-27)</td>
      <td>Waiting for official release</td>
    </tr>
    <tr>
      <td>SGLang</td>
      <td><a href="https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/deployment/sglang/MiniCPM-v4_5_sglang.md">SGLang Doc</a></td>
      <td><a href="https://github.com/sgl-project/sglang/pull/9610">#9610</a> (2025-08-26)</td>
      <td>Merging</td>
      <td>Waiting for official release</td>
    </tr>
    <tr>
      <td>Finetuning</td>
      <td>LLaMA-Factory</td>
      <td><a href="https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/finetune/finetune_llamafactory.md">LLaMA-Factory Doc</a></td>
      <td><a href="https://github.com/hiyouga/LLaMA-Factory/pull/9022">#9022</a> (2025-08-26)</td>
      <td>main (2025-08-26)</td>
      <td>Waiting for official release</td>
    </tr>
    <tr>
      <td rowspan="3">Quantization</td>
      <td>GGUF</td>
      <td><a href="https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/quantization/gguf/minicpm-v4_5_gguf_quantize.md">GGUF Doc</a></td>
      <td>—</td>
      <td>—</td>
      <td>—</td>
    </tr>
    <tr>
      <td>BNB</td>
      <td><a href="https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/quantization/bnb/minicpm-v4_5_bnb_quantize.md">BNB Doc</a></td>
      <td>—</td>
      <td>—</td>
      <td>—</td>
    </tr>
    <tr>
      <td>AWQ</td>
      <td><a href="https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/quantization/awq/minicpm-v4_5_awq_quantize.md">AWQ Doc</a></td>
      <td>—</td>
      <td>—</td>
      <td>—</td>
    </tr>
    <tr>
      <td>Demos</td>
      <td>Gradio Demo</td>
      <td><a href="https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/demo/web_demo/gradio/README.md">Gradio Demo Doc</a></td>
      <td>—</td>
      <td>—</td>
      <td>—</td>
    </tr>
  </tbody>
</table>

> Note: If you'd like us to prioritize support for another open-source framework, please let us know via this [short form](https://docs.google.com/forms/d/e/1FAIpQLSdyTUrOPBgWqPexs3ORrg47ZcZ1r4vFQaA4ve2iA7L9sMfMWw/viewform).
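
As a quick illustration of the serving row above, here is a minimal sketch of querying MiniCPM-V 4.5 through vLLM's OpenAI-compatible server. The server command in the comment, the port, the image URL, and the sampling settings are illustrative assumptions; the linked vLLM Doc is the authoritative reference.

```python
# Hedged sketch: query MiniCPM-V 4.5 served by vLLM's OpenAI-compatible API.
# Assumes the server was started roughly as:
#   vllm serve openbmb/MiniCPM-V-4_5 --trust-remote-code
# Port, image URL, and sampling settings below are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="openbmb/MiniCPM-V-4_5",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/example.jpg"}},
            {"type": "text", "text": "What is in this image?"},
        ],
    }],
    temperature=0.7,
    max_tokens=256,
)
print(response.choices[0].message.content)
```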
## Usage

@@ -358,7 +443,42 @@ question = 'Compare image 1 and image 2, tell me about the differences between i
msgs = [{'role': 'user', 'content': [image1, image2, question]}]

answer = model.chat(
    msgs=msgs,
    tokenizer=tokenizer
)
print(answer)
```
</details>

#### In-context few-shot learning
<details>
<summary> Click to view Python code running MiniCPM-V 4.5 with few-shot input. </summary>

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained('openbmb/MiniCPM-V-4_5', trust_remote_code=True,
    attn_implementation='sdpa', torch_dtype=torch.bfloat16)
model = model.eval().cuda()
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-4_5', trust_remote_code=True)

question = "production date"
image1 = Image.open('example1.jpg').convert('RGB')
answer1 = "2023.08.04"
image2 = Image.open('example2.jpg').convert('RGB')
answer2 = "2007.04.24"
image_test = Image.open('test.jpg').convert('RGB')

msgs = [
    {'role': 'user', 'content': [image1, question]}, {'role': 'assistant', 'content': [answer1]},
    {'role': 'user', 'content': [image2, question]}, {'role': 'assistant', 'content': [answer2]},
    {'role': 'user', 'content': [image_test, question]}
]

answer = model.chat(
    msgs=msgs,
    tokenizer=tokenizer
)
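# Note: the two (image, question, answer) rounds above act as in-context examples;
# `answer` is expected to be the production date read from test.jpg, in the same
# short date format as the demonstrations (e.g. '2023.08.04').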