parinitarahi and natke committed
Commit dcbafef · verified · 1 Parent(s): ead5875

Update README.md (#1)


- Update README.md (30b599d3c91c8108275353b92734873ef2fba5f6)


Co-authored-by: Nat Kershaw <[email protected]>

Files changed (1)
  1. README.md +23 -4
README.md CHANGED
@@ -19,13 +19,32 @@ tags:

### Introduction

- This repository hosts the optimized versions of the Phi4 multimodal model to accelerate inference with ONNX Runtime. Optimized models are published here in ONNX format to run with ONNX Runtime on CPU and GPU across devices, including server platforms, Windows, Linux and Mac desktops, and mobile CPUs, with the precision best suited to each of these targets.

- Here are some of the optimized configurations we have added:

- 1. ONNX model for int4 CUDA and DML GPU devices using int4 quantization via RTN.

- You can see how to run examples with ORT GenAI <to add link>

+ ONNX version of the Phi4 multimodal model to accelerate inference with ONNX Runtime.

+ This model is quantized to int4 precision and runs on CUDA devices.

+ To run this model with ONNX Runtime:
+
+ Download the model:
+
+ ```bash
+ git clone https://huggingface.co/microsoft/Phi-4-multimodal-instruct-onnx
+ ```
+
+ Download the script to run the model:
+
+ ```bash
+ curl https://raw.githubusercontent.com/microsoft/onnxruntime-genai/refs/heads/main/examples/python/phi4-mm.py -o phi4-mm.py
+ ```
+
+ Run the script:
+
+ ```bash
+ python phi4-mm.py -m Phi-4-multimodal-instruct-onnx/cuda/cuda-int4-rtn-blocksize-32 -e cuda
+ ```
+
+ You will be prompted for images, audio files, and a prompt.

The performance of the text component is similar to that of the [phi4 mini ONNX models](https://huggingface.co/microsoft/Phi-4-mini-instruct-onnx/blob/main/README.md).
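The phi4-mm.py script referenced in the new README drives the model through the onnxruntime-genai Python API. As a rough orientation, a minimal sketch of that multimodal generation loop, based on the published onnxruntime-genai examples, could look like the following; the image/audio file names and prompt below are placeholders, and the exact API surface can differ between onnxruntime-genai releases:

```python
# Minimal sketch of running the ONNX model with onnxruntime-genai.
# API names follow the published multimodal examples; details may vary by release.
import onnxruntime_genai as og

# Path from the git clone step above.
model_dir = "Phi-4-multimodal-instruct-onnx/cuda/cuda-int4-rtn-blocksize-32"
model = og.Model(model_dir)
processor = model.create_multimodal_processor()
tokenizer_stream = processor.create_stream()

# Placeholder inputs; the script prompts the user for these interactively.
images = og.Images.open("example.png")
audios = og.Audios.open("example.wav")
prompt = "<|user|><|image_1|><|audio_1|>Describe the image and transcribe the audio.<|end|><|assistant|>"

# Preprocess text, image, and audio into model inputs.
inputs = processor(prompt, images=images, audios=audios)

params = og.GeneratorParams(model)
params.set_inputs(inputs)
params.set_search_options(max_length=4096)

# Stream generated tokens to stdout.
generator = og.Generator(model, params)
while not generator.is_done():
    generator.generate_next_token()
    token = generator.get_next_tokens()[0]
    print(tokenizer_stream.decode(token), end="", flush=True)
print()
```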