parinitarahi and natke committed
Commit dcbafef · verified · 1 Parent(s): ead5875

Update README.md (#1)


- Update README.md (30b599d3c91c8108275353b92734873ef2fba5f6)


Co-authored-by: Nat Kershaw <[email protected]>

Files changed (1)
  1. README.md +23 -4
README.md CHANGED
@@ -19,13 +19,32 @@ tags:

### Introduction

- This repository hosts the optimized versions of the Phi4 multimodal model to accelerate inference with ONNX Runtime. Optimized models are published here in ONNX format to run with ONNX Runtime on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these targets.

- Here are some of the optimized configurations we have added:

- 1. ONNX model for int4 CUDA and DML GPU devices using int4 quantization via RTN.

- You can see how to run examples with ORT GenAI <to add link>

+ ONNX version of the Phi4 multimodal model to accelerate inference with ONNX Runtime.

+ This model is quantized to int4 precision and runs on CUDA devices.

+ To run this model with ONNX Runtime:
+
+ Download the model:
+
+ ```bash
+ git clone https://huggingface.co/microsoft/Phi-4-multimodal-instruct-onnx
+ ```
+
+ Download the script to run the model:
+
+ ```bash
+ curl https://raw.githubusercontent.com/microsoft/onnxruntime-genai/refs/heads/main/examples/python/phi4-mm.py -o phi4-mm.py
+ ```
+
+ Run the script:
+
+ ```bash
+ python phi4-mm.py -m Phi-4-multimodal-instruct-onnx/cuda/cuda-int4-rtn-blocksize-32 -e cuda
+ ```
+
+ You will be prompted for images, audio files, and a prompt.

The performance of the text component is similar to that of the [phi4 mini ONNX models](https://huggingface.co/microsoft/Phi-4-mini-instruct-onnx/blob/main/README.md).
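The phi4-mm.py script referenced in the new README drives the model through the onnxruntime-genai Python API. As a rough orientation, a minimal sketch of that multimodal generation loop, based on the published onnxruntime-genai examples, could look like the following; the image/audio file names and prompt below are placeholders, and the exact API surface can differ between onnxruntime-genai releases:

```python
# Minimal sketch of running the ONNX model with onnxruntime-genai.
# API names follow the published multimodal examples; details may vary by release.
import onnxruntime_genai as og

# Path from the git clone step above.
model_dir = "Phi-4-multimodal-instruct-onnx/cuda/cuda-int4-rtn-blocksize-32"
model = og.Model(model_dir)
processor = model.create_multimodal_processor()
tokenizer_stream = processor.create_stream()

# Placeholder inputs; the script prompts the user for these interactively.
images = og.Images.open("example.png")
audios = og.Audios.open("example.wav")
prompt = "<|user|><|image_1|><|audio_1|>Describe the image and transcribe the audio.<|end|><|assistant|>"

# Preprocess text, image, and audio into model inputs.
inputs = processor(prompt, images=images, audios=audios)

params = og.GeneratorParams(model)
params.set_inputs(inputs)
params.set_search_options(max_length=4096)

# Stream generated tokens to stdout.
generator = og.Generator(model, params)
while not generator.is_done():
    generator.generate_next_token()
    token = generator.get_next_tokens()[0]
    print(tokenizer_stream.decode(token), end="", flush=True)
print()
```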