Core ML

Conversion script (PyTorch to CoreML)

#1
by mrdbourke - opened

Hey there,

Thank you so much for these models as well as the demo iOS app.

I was wondering if you had the code for converting the model from its PyTorch form to CoreML?

I would love to see how it's done so I could potentially do the same for other models.

Thank you,

Daniel

Well, I did something like this ~8 months ago. You can clone the repo from https://huggingface.co/Norod78/CoreML-MobileCLIP-S0/tree/main and use the notebook there.
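
For reference, the conversion boils down to something like the rough sketch below. The tiny stand-in model, the 256x256 input size, and the file names are placeholders for illustration; the notebook in the repo above is the actual reference for MobileCLIP.

```python
import torch
import coremltools as ct

# Stand-in for the real image encoder (placeholder, not MobileCLIP itself)
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3, stride=2),
    torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(),
).eval()

example_input = torch.rand(1, 3, 256, 256)      # dummy input for tracing
traced = torch.jit.trace(model, example_input)  # TorchScript trace

mlmodel = ct.convert(
    traced,
    convert_to="mlprogram",  # produce an .mlpackage (ML Program)
    inputs=[ct.TensorType(name="image", shape=example_input.shape)],
    minimum_deployment_target=ct.target.iOS16,
)
mlmodel.save("ImageEncoder.mlpackage")
```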

Apple org

That's awesome @Norod78 ! Did you check the compute units assignment to see if it's compatible with the Neural Engine?

@pcuenq Well, it loads with .all, and when I print it afterwards it still reports computeUnits = MLComputeUnits(rawValue: 2). The question is, how do I know whether it uses the ANE rather than the GPU / CPU when doing predictions?

Edit 1: It does print numANECores: Unknown aneSubType to the debug console when the text encoder loads, but I could not find any info about it.
Edit 2: I used this fork https://github.com/Norod/Queryable-mobileclip, which is a port of the original repo from OpenAI CLIP to MobileCLIP. Also note that I do the color normalization in the model itself (I had to remove the image normalization code which was done in Swift in the original repo). I didn't look at Apple's sample code.
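
In case it helps, folding the normalization into the model is roughly the pattern below. The mean/std values are the standard CLIP constants and are only an example; the exact values in the repo may differ.

```python
import torch

class NormalizedEncoder(torch.nn.Module):
    """Wraps an encoder so pixel normalization happens inside the model
    instead of in Swift. Mean/std here are the standard CLIP values,
    used purely as an example."""
    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder
        self.register_buffer(
            "mean", torch.tensor([0.48145466, 0.4578275, 0.40821073]).view(1, 3, 1, 1))
        self.register_buffer(
            "std", torch.tensor([0.26862954, 0.26130258, 0.27577711]).view(1, 3, 1, 1))

    def forward(self, x):
        # x arrives as raw 0-255 pixel values when the Core ML input is an ImageType
        x = x / 255.0
        return self.encoder((x - self.mean) / self.std)

# Then trace NormalizedEncoder(encoder) and convert with
# inputs=[ct.ImageType(name="image", shape=(1, 3, 256, 256))]
# so the Swift side can feed pixel buffers directly with no preprocessing.
```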

Apple org

The easiest way to determine compatibility is with Xcode performance report, see example for a different model:

[Screenshot: Xcode performance report for a different model]

It gives you the default device mapping the system would choose, as well as the compatibility.

@FL33TW00D-HF wrote a nice Python CLI as well: https://github.com/FL33TW00D/coremlprofiler

Very cool! I wasn't familiar with this. So basically the image encoder gets fully mapped to the Neural Engine:

[Screenshot: image encoder performance report]

But the text encoder isn't:

[Screenshot: text encoder performance report]

When performing on-device inference, can a model be split across the different compute units, or does this mean the text encoder will actually only be mapped to the CPU?

@pcuenq

@Norod78 The text encoder will be split and run across the CPU, GPU, and Neural Engine. The split is detailed in the operations list (i.e., 5 ops will run on the CPU, 1 op on the GPU, and 102 on the NE).

You can specify which compute units you'd like to use: https://apple.github.io/coremltools/source/coremltools.models.html#coremltools.models.model.MLModel.__init__
For example: coremltools.ComputeUnit.CPU_AND_NE
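
For instance, something along these lines (the .mlpackage path and the input name are placeholders):

```python
import numpy as np
import coremltools as ct

# Load the converted model restricted to CPU + Neural Engine
mlmodel = ct.models.MLModel(
    "ImageEncoder.mlpackage",                 # placeholder path
    compute_units=ct.ComputeUnit.CPU_AND_NE,
)

# Input name and shape must match what was used at conversion time
out = mlmodel.predict({"image": np.random.rand(1, 3, 256, 256).astype(np.float32)})
print(out.keys())
```

On device, the Swift-side equivalent is setting computeUnits on an MLModelConfiguration before loading the model.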

@FL33TW00D-HF Cool, good to know. I saw the split but didn't realize it can actually run inference with different ops executing on different compute units. Thank you for the clarification.

Apple org

Yes, the system tries to determine an optimal split based on the compatibility of the ops and the hardware the model runs on, but sometimes it's worth experimenting with manual placement as @FL33TW00D-HF said.
