metadata

title: MiniCPM-V-4.5 Multimodal Chat
emoji: 🚀
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.44.0
app_file: app.py
pinned: false
license: apache-2.0

MiniCPM-V-4.5 Multimodal Chat 🚀

A powerful Gradio interface for the MiniCPM-V-4.5 multimodal model - a GPT-4V level MLLM with only 8B parameters!

Features

📸 Image Understanding: Analyze single or multiple images with high-resolution support (up to 1.8M pixels)
🎥 Video Understanding: Process videos with high refresh rate (up to 10 FPS) and efficient compression
📄 Document Parsing: Strong OCR capabilities and PDF document parsing
🧠 Thinking Modes: Choose between fast thinking for efficiency or deep thinking for complex problems
🌍 Multilingual: Support for 30+ languages
⚙️ Customizable: Adjust FPS, context size, temperature, and system prompts

Model Capabilities

MiniCPM-V-4.5 achieves state-of-the-art performance across multiple benchmarks:

Surpasses GPT-4o-latest and Gemini-2.0 Pro on vision-language tasks
Leading OCR performance on OCRBench
Efficient video token compression (96x rate)
Trustworthy behaviors with multilingual support

Usage

Upload: Choose an image or video file
Configure: Adjust settings like FPS (for videos), context size, and temperature
Prompt: Enter your question or use the system prompt for specific instructions
Generate: Click the generate button to get the model's response

Examples

"What objects do you see in this image?"
"Describe the main action happening in this video"
"Read and transcribe any text visible in the image"
"Analyze this image from an artistic perspective"

Technical Details

Architecture: Built on Qwen3-8B and SigLIP2-400M
Parameters: 8B total parameters
Video Processing: 3D-Resampler with temporal understanding
Resolution: Supports images up to 1344x1344 pixels
Efficiency: 4x fewer visual tokens than most MLLMs

License

This model is released under the MiniCPM Model License. Free for academic research and commercial use after registration.

Citation

@article{yao2024minicpm,
  title={MiniCPM-V: A GPT-4V Level MLLM on Your Phone},
  author={Yao, Yuan and Yu, Tianyu and Zhang, Ao and Wang, Chongyi and Cui, Junbo and Zhu, Hongji and Cai, Tianchi and Li, Haoyu and Zhao, Weilin and He, Zhihui and others},
  journal={Nat Commun 16, 5509 (2025)},
  year={2025}
}