FlagRelease
/

MiniCPM_o_2.6-FlagOS-NVIDIA

Safetensors

minicpmo

custom_code

Model card Files Files and versions

xet

Community

YummyYum commited on Jul 21

Commit

89dbebb

verified ·

1 Parent(s): 13fd13e

Upload README.md

Browse files

Files changed (1) hide show

README.md +38 -75

README.md CHANGED Viewed

@@ -1,6 +1,6 @@
 # Introduction
-MiniCPM_o_2.6-FlagOS-NVIDIA  provides an all-in-one deployment solution, enabling execution of MiniCPM_o_2.6 on NVIDIA GPUs. As the first-generation release for the NVIDIA-H100, this package delivers two key features:
 1. Comprehensive Integration:
    - Integrated with FlagScale (https://github.com/FlagOpen/FlagScale).
@@ -29,13 +29,7 @@ We use a variety of Triton-implemented operation kernels—approximately 70%—t
 - Most Triton kernels are provided by FlagGems (https://github.com/FlagOpen/FlagGems). You can enable FlagGems kernels by setting the environment variable USE_FLAGGEMS. For more details, please refer to the "How to Run Locally" section.
-- Also included are Triton kernels from vLLM.
-# Bundle Download
-|             | Usage                                                  | Nvidia                                                       |
-| ----------- | ------------------------------------------------------ | ------------------------------------------------------------ |
-| Basic Image | basic software environment that supports model running | 'docker pull flagrelease-registry.cn-beijing.cr.aliyuncs.com/flagrelease/flagrelease:deepseek-flagos-nvidia |
 # Evaluation Results
@@ -58,96 +52,65 @@ We use a variety of Triton-implemented operation kernels—approximately 70%—t
 ## 📌 Getting Started
-### Download open-source weights
-```
-pip install modelscope
-modelscope download --model <Model Name> --local_dir <Cache Path>
-```
-### Download the FlagOS image
-```
-docker pull <IMAGE>
-```
-### Start the inference service
-```
-docker run -itd --name flagrelease_nv  --privileged --gpus all --net=host --ipc=host --device=/dev/infiniband --shm-size 512g --ulimit memlock=-1 -v <CKPT_PATH>:<CKPT_PATH> flagrelease-registry.cn-beijing.cr.aliyuncs.com/flagrelease/flagrelease:deepseek-flagos-nvidia /bin/bash
 docker exec -it flagrelease_nv /bin/bash
 conda activate flagscale-inference
 ```
 ### Download and install FlagGems
-```
 git clone https://github.com/FlagOpen/FlagGems.git
 cd FlagGems
-pip install .
 cd ../
 ```
-### Modify the configuration
-```
-cd FlagScale/examples/minicpm_o_2.6/conf
-# Modify the configuration in config_minicpm_o_2.6.yaml
-defaults:
-  - _self_
-  - serve: minicpm_o_2.6
-experiment:
-  exp_name: minicpm_o_2.6
-  exp_dir: outputs/${experiment.exp_name}
-  task:
-    type: serve
-  deploy:
-    use_fs_serve: false
-  runner:
-    ssh_port: 22
-  envs:
-    CUDA_DEVICE_MAX_CONNECTIONS: 1
-  cmds:
-    before_start: source /root/miniconda3/bin/activate flagscale-inference && export USE_FLAGGEMS=1
-action: run
-hydra:
-  run:
-    dir: ${experiment.exp_dir}/hydra
 ```
-```
-cd FlagScale/examples/minicpm_o_2.6/conf/serve
-# Modify the configuration in minicpm_o_2.6.yaml
-- serve_id: vllm_model
-  engine: vllm
-  engine_args:
-    model: /models/MiniCPM_o_2 # path of weight of DeepSeek-R1-Distill-Qwen-32B
-    served_model_name: minicpmo26-flagos
-    tensor_parallel_size: 1
-    pipeline_parallel_size: 1
-    gpu_memory_utilization: 0.9
-    max_num_seqs: 256
-    limit_mm_per_prompt: image=18
-    port: 9010
-    trust_remote_code: true
-    enable_chunked_prefill: true
-```
-```
 # install flagscale
-cd FlagScale/
 pip install .
-#【Verifiable on a single machine】
-```
-### Serve
-```
-flagscale serve <Model>
 ```
 # Contributing
@@ -169,4 +132,4 @@ send "FlagRelease"
 # License
-This project and related model weights are licensed under the MIT License.

 # Introduction
+MiniCPM_o_2.6-FlagOS-NVIDIA  provides an all-in-one deployment solution, enabling execution of MiniCPM_o_2.6 on NVIDIA GPUs. As the first-generation release for the NVIDIA-H100, this package delivers three key features:
 1. Comprehensive Integration:
    - Integrated with FlagScale (https://github.com/FlagOpen/FlagScale).
 - Most Triton kernels are provided by FlagGems (https://github.com/FlagOpen/FlagGems). You can enable FlagGems kernels by setting the environment variable USE_FLAGGEMS. For more details, please refer to the "How to Run Locally" section.
+- Also included are Triton kernels from vLLM, including fused MoE.
 # Evaluation Results
 ## 📌 Getting Started
+### Environment Setup
+```bash
+# install FlagScale
+git clone https://github.com/FlagOpen/FlagScale.git
+cd FlagScale
+pip install .
+# download image and ckpt
+flagscale pull --image docker pull flagrelease-registry.cn-beijing.cr.aliyuncs.com/flagrelease/flagrelease:deepseek-flagos-nvidia --ckpt https://www.modelscope.cn/models/FlagRelease/MiniCPM_o_2.6-FlagOS-Nvidia.git --ckpt-path /nfs/MiniCPM_o_2.6
+# Note: For security reasons, this image does not have passwordless configuration. In multi-machine scenarios, you need to configure passwordless access for the image yourself.
+# build and enter the container
+docker run -itd --name flagrelease_nv --privileged --gpus all --net=host --ipc=host --device=/dev/infiniband  --shm-size 512g --ulimit memlock=-1 -v /nfs:/nfs flagrelease-registry.cn-beijing.cr.aliyuncs.com/flagrelease/flagrelease:deepseek-flagos-nvidia /bin/bash
 docker exec -it flagrelease_nv /bin/bash
 conda activate flagscale-inference
 ```
 ### Download and install FlagGems
+```bash
 git clone https://github.com/FlagOpen/FlagGems.git
 cd FlagGems
+pip install ./ --no-deps
 cd ../
 ```
+### Download FlagScale and build vllm
+```bash
+git clone https://github.com/FlagOpen/FlagScale.git
+cd FlagScale/
+git checkout ae85925798358d95050773dfa66680efdb0c2b28
+cd vllm
+pip install .
+cd ../
 ```
+### Serve
+```bash
+# config the minicpm_o_2.6 yaml
+FlagScale/
+├── examples/
+│   └── minicpm_o_2.6/
+│       └── conf/
+│           └── config_minicpm_o_2.6.yaml # set hostfile and ssh_port(optional), if it is passwordless access between containers, the docker field needs to be removed
+│           └── serve/
+│               └── minicpm_o_2.6.yaml # set model parameters and server port
 # install flagscale
 pip install .
+# serve
+flagscale serve minicpm_o_2.6
 ```
 # Contributing
 # License
+The weights of this model are based on OpenBMB/MiniCPM-o-2_6 and are open-sourced under the Apache 2.0 License: https://www.apache.org/licenses/LICENSE-2.0.txt.