Core ML

Conversion script (PyTorch to CoreML)

#1
by mrdbourke - opened

Hey there,

Thank you so much for these models as well as the demo iOS app.

I was wondering if you had the code for converting the model from its PyTorch form to CoreML?

I would love to see how it's done so I could potentially do the same for other models.

Thank you,

Daniel

Well, I did something like this ~8 months ago. You can clone the repo from https://huggingface.co/Norod78/CoreML-MobileCLIP-S0/tree/main and use the notebook there.
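
For reference, the conversion boils down to something like the rough sketch below. The tiny stand-in model, the 256x256 input size, and the file names are placeholders for illustration; the notebook in the repo above is the actual reference for MobileCLIP.

```python
import torch
import coremltools as ct

# Stand-in for the real image encoder (placeholder, not MobileCLIP itself)
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3, stride=2),
    torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(),
).eval()

example_input = torch.rand(1, 3, 256, 256)      # dummy input for tracing
traced = torch.jit.trace(model, example_input)  # TorchScript trace

mlmodel = ct.convert(
    traced,
    convert_to="mlprogram",  # produce an .mlpackage (ML Program)
    inputs=[ct.TensorType(name="image", shape=example_input.shape)],
    minimum_deployment_target=ct.target.iOS16,
)
mlmodel.save("ImageEncoder.mlpackage")
```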

Apple org

That's awesome @Norod78 ! Did you check the compute units assignment to see if it's compatible with the Neural Engine?

@pcuenq Well, it loads with .all, and when I print it afterwards it still reports computeUnits = MLComputeUnits(rawValue: 2). The question is, how do I know whether it uses the ANE rather than the GPU / CPU when doing predictions?

Edit 1: It does print numANECores: Unknown aneSubType to the debug console when the text encoder loads, but I could not find any info about it.
Edit 2: I used this fork https://github.com/Norod/Queryable-mobileclip, which is a port of the original repo from OpenAI CLIP to MobileCLIP. Also note that I do the color normalization in the model itself (I had to remove the image normalization code which was done in Swift in the original repo). I didn't look at Apple's sample code.
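
In case it helps, folding the normalization into the model is roughly the pattern below. The mean/std values are the standard CLIP constants and are only an example; the exact values in the repo may differ.

```python
import torch

class NormalizedEncoder(torch.nn.Module):
    """Wraps an encoder so pixel normalization happens inside the model
    instead of in Swift. Mean/std here are the standard CLIP values,
    used purely as an example."""
    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder
        self.register_buffer(
            "mean", torch.tensor([0.48145466, 0.4578275, 0.40821073]).view(1, 3, 1, 1))
        self.register_buffer(
            "std", torch.tensor([0.26862954, 0.26130258, 0.27577711]).view(1, 3, 1, 1))

    def forward(self, x):
        # x arrives as raw 0-255 pixel values when the Core ML input is an ImageType
        x = x / 255.0
        return self.encoder((x - self.mean) / self.std)

# Then trace NormalizedEncoder(encoder) and convert with
# inputs=[ct.ImageType(name="image", shape=(1, 3, 256, 256))]
# so the Swift side can feed pixel buffers directly with no preprocessing.
```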

Apple org

The easiest way to determine compatibility is with Xcode performance report, see example for a different model:

[Screenshot: Xcode performance report for a different model]

It gives you the default device mapping the system would choose, as well as the compatibility.

@FL33TW00D-HF wrote a nice Python CLI as well: https://github.com/FL33TW00D/coremlprofiler

Very cool! I wasn't familiar with this. So basically the image encoder gets fully mapped to the Neural Engine:

[Screenshot: image encoder performance report]

But the text encoder isn't:

[Screenshot: text encoder performance report]

When performing on-device inference, can a model be split across the different compute units, or does this mean the text encoder will actually only be mapped to the CPU?

@pcuenq

@Norod78 The text encoder will be split and run across the CPU, GPU, and Neural Engine. The split is detailed in the operations list (i.e., 5 ops will run on the CPU, 1 op on the GPU, and 102 on the NE).

You can specify which compute units you'd like to use: https://apple.github.io/coremltools/source/coremltools.models.html#coremltools.models.model.MLModel.__init__
For example: coremltools.ComputeUnit.CPU_AND_NE
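
For instance, something along these lines (the .mlpackage path and the input name are placeholders):

```python
import numpy as np
import coremltools as ct

# Load the converted model restricted to CPU + Neural Engine
mlmodel = ct.models.MLModel(
    "ImageEncoder.mlpackage",                 # placeholder path
    compute_units=ct.ComputeUnit.CPU_AND_NE,
)

# Input name and shape must match what was used at conversion time
out = mlmodel.predict({"image": np.random.rand(1, 3, 256, 256).astype(np.float32)})
print(out.keys())
```

On device, the Swift-side equivalent is setting computeUnits on an MLModelConfiguration before loading the model.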

@FL33TW00D-HF Cool, good to know. I saw the split but didn't realize it can actually run inference with different ops executing on different compute units. Thank you for the clarification.

Apple org

Yes, the system tries to determine an optimal split based on the compatibility of the ops and the hardware the model runs on, but sometimes it's worth experimenting with manual placement as @FL33TW00D-HF said.
