---
title: MiniCPM-V-4.5 Multimodal Chat
emoji: 🚀
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.44.0
app_file: app.py
pinned: false
license: apache-2.0
---

# MiniCPM-V-4.5 Multimodal Chat 🚀

A powerful Gradio interface for the MiniCPM-V-4.5 multimodal model, a GPT-4V-level MLLM with only 8B parameters!

## Features

- 📸 **Image Understanding**: Analyze single or multiple high-resolution images (up to 1.8M pixels)
- 🎥 **Video Understanding**: Process high-frame-rate video (up to 10 FPS) with efficient token compression
- 📄 **Document Parsing**: Strong OCR and PDF document parsing
- 🧠 **Thinking Modes**: Choose between fast thinking for efficiency and deep thinking for complex problems
- 🌍 **Multilingual**: Supports 30+ languages
- ⚙️ **Customizable**: Adjust FPS, context size, temperature, and system prompts

## Model Capabilities

MiniCPM-V-4.5 achieves state-of-the-art performance across multiple benchmarks:

- Surpasses GPT-4o-latest and Gemini-2.0 Pro on vision-language tasks
- Leading OCR performance on OCRBench
- Efficient video token compression (96x compression rate)
- Trustworthy behavior and multilingual support

## Usage

1. **Upload**: Choose an image or video file
2. **Configure**: Adjust settings such as FPS (for videos), context size, and temperature
3. **Prompt**: Enter your question, optionally adding a system prompt for specific instructions
4. **Generate**: Click the generate button to get the model's response

## Examples

- "What objects do you see in this image?"
- "Describe the main action happening in this video"
- "Read and transcribe any text visible in the image"
- "Analyze this image from an artistic perspective"

## Technical Details

- **Architecture**: Built on Qwen3-8B and SigLIP2-400M
- **Parameters**: 8B in total
- **Video Processing**: 3D-Resampler with temporal understanding
- **Resolution**: Supports images of up to 1.8M pixels (e.g., 1344x1344) at any aspect ratio
- **Efficiency**: 4x fewer visual tokens than most MLLMs

## License

This model is released under the MiniCPM Model License. It is free for academic research and, after registration, for commercial use.

## Citation

```bibtex
@article{yao2024minicpm,
  title={MiniCPM-V: A GPT-4V Level MLLM on Your Phone},
  author={Yao, Yuan and Yu, Tianyu and Zhang, Ao and Wang, Chongyi and Cui, Junbo and Zhu, Hongji and Cai, Tianchi and Li, Haoyu and Zhao, Weilin and He, Zhihui and others},
  journal={Nat Commun 16, 5509 (2025)},
  year={2025}
}
```
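## Input Preprocessing Sketch

The FPS setting under Usage and the 1.8M-pixel resolution limit under Technical Details both constrain what the model receives. Below is a minimal sketch of how a front end might enforce them before calling the model. It is an illustration under stated assumptions, not code from `app.py` or from the MiniCPM-V processor: `sample_frame_indices` and `fit_to_pixel_budget` are hypothetical helpers, the 64-frame cap is arbitrary, and the pixel budget assumes the 1.8M figure means 1344x1344 = 1,806,336 pixels.

```python
import math


def sample_frame_indices(total_frames: int, native_fps: float,
                         target_fps: float, max_frames: int = 64) -> list[int]:
    """Hypothetical helper: choose frame indices approximating `target_fps`.

    `max_frames` is an arbitrary illustrative cap, not a MiniCPM-V limit.
    """
    if total_frames <= 0 or native_fps <= 0 or target_fps <= 0 or max_frames <= 0:
        raise ValueError("frame counts, rates, and caps must be positive")
    # Source frames between consecutive samples; never step below one frame.
    step = max(native_fps / target_fps, 1.0)
    count = math.ceil(total_frames / step)
    if count > max_frames:
        # Too many samples: spread `max_frames` evenly over the whole clip instead.
        return [i * total_frames // max_frames for i in range(max_frames)]
    return [int(i * step) for i in range(count)]


def fit_to_pixel_budget(width: int, height: int,
                        max_pixels: int = 1344 * 1344) -> tuple[int, int]:
    """Hypothetical helper: downscale to at most `max_pixels`, keeping aspect ratio."""
    if width <= 0 or height <= 0 or max_pixels <= 0:
        raise ValueError("dimensions and budget must be positive")
    if width * height <= max_pixels:
        return width, height  # already within budget; never upscale
    # Uniform scale factor that maps the area onto the budget.
    scale = math.sqrt(max_pixels / (width * height))
    new_w = max(1, math.floor(width * scale))
    new_h = max(1, math.floor(height * scale))
    # Clamping a side up to 1 (extreme aspect ratios) can overshoot; trim the longer side.
    if new_w >= new_h:
        new_w = min(new_w, max_pixels // new_h)
    else:
        new_h = min(new_h, max_pixels // new_w)
    return new_w, new_h


if __name__ == "__main__":
    # A 10 s clip at 30 FPS sampled at 10 FPS keeps every third frame.
    print(sample_frame_indices(300, 30.0, 10.0, max_frames=200)[:4])  # [0, 3, 6, 9]
    # A 2688x2688 image has four times the budget, so each side is halved.
    print(fit_to_pixel_budget(2688, 2688))  # (1344, 1344)
```

Sampling at evenly spaced timestamps is the simplest way to honor a target FPS. When the cap is hit, spreading the frames over the whole clip keeps coverage of the full video at a lower effective frame rate, rather than truncating the end.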