tc-mb committed · Commit 96b1b5e · verified · 1 parent: 09dd28e

Update: README

Files changed (1): README.md (+123 -3)
README.md CHANGED
@@ -16,7 +16,7 @@ tags:
 
 <h1>A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone</h1>
 
- [GitHub](https://github.com/OpenBMB/MiniCPM-o) | [Demo](http://101.126.42.235:30910/)</a>
+ [GitHub](https://github.com/OpenBMB/MiniCPM-o) | [CookBook](https://github.com/OpenSQZ/MiniCPM-V-CookBook) | [Demo](http://101.126.42.235:30910/)
 
 
 
@@ -135,7 +135,7 @@ MiniCPM-V 4.5 can be easily used in various ways: (1) [llama.cpp](https://github
 </table>
 </div>
 
- Both Video-MME and OpenCompass were evaluated using 8×A100 GPUs for inference. The reported inference time of Video-MME excludes the cost of video frame extraction.
+ Both Video-MME and OpenCompass were evaluated using 8×A100 GPUs for inference. For a fair comparison, the reported Video-MME inference time covers the full model-side computation and excludes the external cost of video frame extraction, which depends on the specific frame-extraction tool used.
 
 ### Examples
 
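To make the timing protocol in the hunk above concrete, here is a minimal sketch (illustrative only, not part of this commit) of timing frame extraction separately from model-side computation, so that only the latter is reported. It assumes the `decord` decoder; the file name, the 64-frame uniform sampling, and the commented `model.chat` call (the API shown in the Usage section below) are placeholders.

```python
import time
from decord import VideoReader, cpu  # assumed decoder; any extractor works

# --- frame extraction: excluded from the reported inference time ---
t0 = time.perf_counter()
vr = VideoReader('video.mp4', ctx=cpu(0))
idx = list(range(0, len(vr), max(1, len(vr) // 64)))  # ~64 uniformly sampled frames
frames = vr.get_batch(idx).asnumpy()  # would be converted to PIL images for msgs
t_extract = time.perf_counter() - t0

# --- model-side computation: the part that is reported ---
t0 = time.perf_counter()
# answer = model.chat(msgs=msgs, tokenizer=tokenizer)  # frames go into msgs
t_model = time.perf_counter() - t0

print(f'extraction: {t_extract:.2f}s (excluded), model: {t_model:.2f}s (reported)')
```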
@@ -161,6 +161,91 @@ We deploy MiniCPM-V 4.5 on iPad M4 with [iOS demo](https://github.com/tc-mb/Mini
 <img src="https://raw.githubusercontent.com/openbmb/MiniCPM-o/main/assets/minicpmv4_5/v45_cn_travel.gif" width="45%" style="display: inline-block; margin: 0 10px;"/>
 </div>
 
+ ## Framework Support Matrix
+ <table>
+ <thead>
+ <tr>
+ <th>Category</th>
+ <th>Framework</th>
+ <th>Cookbook Link</th>
+ <th>Upstream PR</th>
+ <th>Supported since (branch)</th>
+ <th>Supported since (release)</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td rowspan="2">Edge (On-device)</td>
+ <td>Llama.cpp</td>
+ <td><a href="https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/deployment/llama.cpp/minicpm-v4_5_llamacpp.md">Llama.cpp Doc</a></td>
+ <td><a href="https://github.com/ggml-org/llama.cpp/pull/15575">#15575</a> (2025-08-26)</td>
+ <td>master (2025-08-26)</td>
+ <td><a href="https://github.com/ggml-org/llama.cpp/releases/tag/b6282">b6282</a></td>
+ </tr>
+ <tr>
+ <td>Ollama</td>
+ <td><a href="https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/deployment/ollama/minicpm-v4_5_ollama.md">Ollama Doc</a></td>
+ <td><a href="https://github.com/ollama/ollama/pull/12078">#12078</a> (2025-08-26)</td>
+ <td>Merging</td>
+ <td>Waiting for official release</td>
+ </tr>
+ <tr>
+ <td rowspan="2">Serving (Cloud)</td>
+ <td>vLLM</td>
+ <td><a href="https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/deployment/vllm/minicpm-v4_5_vllm.md">vLLM Doc</a></td>
+ <td><a href="https://github.com/vllm-project/vllm/pull/23586">#23586</a> (2025-08-26)</td>
+ <td>main (2025-08-27)</td>
+ <td>Waiting for official release</td>
+ </tr>
+ <tr>
+ <td>SGLang</td>
+ <td><a href="https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/deployment/sglang/MiniCPM-v4_5_sglang.md">SGLang Doc</a></td>
+ <td><a href="https://github.com/sgl-project/sglang/pull/9610">#9610</a> (2025-08-26)</td>
+ <td>Merging</td>
+ <td>Waiting for official release</td>
+ </tr>
+ <tr>
+ <td>Finetuning</td>
+ <td>LLaMA-Factory</td>
+ <td><a href="https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/finetune/finetune_llamafactory.md">LLaMA-Factory Doc</a></td>
+ <td><a href="https://github.com/hiyouga/LLaMA-Factory/pull/9022">#9022</a> (2025-08-26)</td>
+ <td>main (2025-08-26)</td>
+ <td>Waiting for official release</td>
+ </tr>
+ <tr>
+ <td rowspan="3">Quantization</td>
+ <td>GGUF</td>
+ <td><a href="https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/quantization/gguf/minicpm-v4_5_gguf_quantize.md">GGUF Doc</a></td>
+ <td>—</td>
+ <td>—</td>
+ <td>—</td>
+ </tr>
+ <tr>
+ <td>BNB</td>
+ <td><a href="https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/quantization/bnb/minicpm-v4_5_bnb_quantize.md">BNB Doc</a></td>
+ <td>—</td>
+ <td>—</td>
+ <td>—</td>
+ </tr>
+ <tr>
+ <td>AWQ</td>
+ <td><a href="https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/quantization/awq/minicpm-v4_5_awq_quantize.md">AWQ Doc</a></td>
+ <td>—</td>
+ <td>—</td>
+ <td>—</td>
+ </tr>
+ <tr>
+ <td>Demos</td>
+ <td>Gradio Demo</td>
+ <td><a href="https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/demo/web_demo/gradio/README.md">Gradio Demo Doc</a></td>
+ <td>—</td>
+ <td>—</td>
+ <td>—</td>
+ </tr>
+ </tbody>
+ </table>
+ 
+ > Note: If you'd like us to prioritize support for another open-source framework, please let us know via this [short form](https://docs.google.com/forms/d/e/1FAIpQLSdyTUrOPBgWqPexs3ORrg47ZcZ1r4vFQaA4ve2iA7L9sMfMWw/viewform).
 
 ## Usage
 
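For quick orientation on the Serving row of the Framework Support Matrix above, the sketch below (illustrative, not part of this commit) shows roughly how a vLLM build containing PR #23586 could be queried after being started with `vllm serve openbmb/MiniCPM-V-4_5 --trust-remote-code`. The port, image URL, and message layout are assumptions; defer to the linked vLLM Doc for the authoritative steps.

```python
# Query a locally running vLLM OpenAI-compatible server (illustrative values).
from openai import OpenAI

client = OpenAI(base_url='http://localhost:8000/v1', api_key='EMPTY')
resp = client.chat.completions.create(
    model='openbmb/MiniCPM-V-4_5',
    messages=[{
        'role': 'user',
        'content': [
            {'type': 'image_url', 'image_url': {'url': 'https://example.com/demo.jpg'}},
            {'type': 'text', 'text': 'What is in this image?'},
        ],
    }],
)
print(resp.choices[0].message.content)
```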
@@ -358,7 +443,42 @@ question = 'Compare image 1 and image 2, tell me about the differences between i
 msgs = [{'role': 'user', 'content': [image1, image2, question]}]
 
 answer = model.chat(
-     image=None,
+     msgs=msgs,
+     tokenizer=tokenizer
+ )
+ print(answer)
+ ```
+ </details>
+ 
+ 
+ #### In-context few-shot learning
+ <details>
+ <summary> Click to view Python code running MiniCPM-V 4.5 with few-shot input. </summary>
+ 
+ ```python
+ import torch
+ from PIL import Image
+ from transformers import AutoModel, AutoTokenizer
+ 
+ model = AutoModel.from_pretrained('openbmb/MiniCPM-V-4_5', trust_remote_code=True,
+     attn_implementation='sdpa', torch_dtype=torch.bfloat16)
+ model = model.eval().cuda()
+ tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-4_5', trust_remote_code=True)
+ 
+ question = "production date"
+ image1 = Image.open('example1.jpg').convert('RGB')
+ answer1 = "2023.08.04"
+ image2 = Image.open('example2.jpg').convert('RGB')
+ answer2 = "2007.04.24"
+ image_test = Image.open('test.jpg').convert('RGB')
+ 
+ msgs = [
+     {'role': 'user', 'content': [image1, question]}, {'role': 'assistant', 'content': [answer1]},
+     {'role': 'user', 'content': [image2, question]}, {'role': 'assistant', 'content': [answer2]},
+     {'role': 'user', 'content': [image_test, question]}
+ ]
+ 
+ answer = model.chat(
     msgs=msgs,
     tokenizer=tokenizer
 )
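The few-shot pattern added in the hunk above generalizes directly, since `model.chat` simply consumes an alternating user/assistant message list. A hypothetical helper (a sketch, not part of this commit; `ask_with_examples` is not part of the model's API, and `model`/`tokenizer` are assumed to be loaded as in the snippet):

```python
# Hypothetical wrapper around the few-shot pattern shown in the diff above.
def ask_with_examples(model, tokenizer, shots, test_image, question):
    msgs = []
    for img, ans in shots:  # solved in-context examples as (image, answer) pairs
        msgs.append({'role': 'user', 'content': [img, question]})
        msgs.append({'role': 'assistant', 'content': [ans]})
    msgs.append({'role': 'user', 'content': [test_image, question]})
    return model.chat(msgs=msgs, tokenizer=tokenizer)

# e.g. ask_with_examples(model, tokenizer,
#                        [(image1, answer1), (image2, answer2)],
#                        image_test, 'production date')
```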