PaliGemma 2 ONNX doesn't support object detection?

#1
by NSTiwari - opened

Hi, thanks for sharing the ONNX weights for PaliGemma 2. While the model works well for image captioning, I tried several prompts for object detection using the detect keyword.
E.g., detect person was one of the prompts, but the response was null.

Are the converted model weights compatible only with captioning tasks?

ONNX Community org

Hmm, it should work. Could you share the code you are using?

ONNX Community org

Also, can you confirm the original (PyTorch) version works correctly for your image/prompt?

@Xenova : Okay, after experimenting with various prompts, I was able to get the bounding box coordinates. Unlike the original PaliGemma 2 weights, where a simple <image>detect person would work, here I had to use the more specific prompt <image>detect bounding box of person to make it work.
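
For anyone who wants to run the sanity check suggested above against the original PyTorch weights, here is a minimal sketch; the checkpoint id and image URL are placeholders, and the prompt follows the format that worked in this thread:

import requests
import torch
from PIL import Image
from transformers import PaliGemmaForConditionalGeneration, PaliGemmaProcessor

# Placeholder checkpoint and image; swap in the ones you are testing with.
model_id = "google/paligemma2-3b-pt-224"
url = "https://example.com/people.jpg"

model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
).eval()
processor = PaliGemmaProcessor.from_pretrained(model_id)

image = Image.open(requests.get(url, stream=True).raw)
prompt = "<image>detect person"  # the detection prompt discussed above
inputs = processor(text=prompt, images=image, return_tensors="pt").to(torch.bfloat16).to(model.device)
input_len = inputs["input_ids"].shape[-1]

with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=100, do_sample=False)

# Drop the prompt tokens and print only the generated location tokens and labels.
print(processor.decode(output[0][input_len:], skip_special_tokens=True))

If this prints the <locXXXX> tokens for the PyTorch checkpoint but the ONNX weights return nothing for the same image and prompt, that narrows the issue down to the conversion rather than the prompt.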

Hi @Xenova , is it possible to run this using vanilla JS by loading Transformers.js via a CDN? Here's how I'm loading it:

import { AutoProcessor, PaliGemmaForConditionalGeneration } from 'https://cdn.jsdelivr.net/npm/@huggingface/transformers';

This gives me an error (screenshot attached).

ONNX Community org

How are you converting the model to ONNX? Optimum doesn't support the image-text-to-text task. Please help.

$ optimum-cli export onnx --model google/paligemma-3b-pt-224 paligemma-3b-pt-224_onnx/
KeyError: "Unknown task: image-text-to-text"

I tried specifying one of the existing tasks, image-to-text, but that also throws another error:
$ optimum-cli export onnx --model google/paligemma-3b-pt-224 --task image-to-text paligemma-3b-pt-224_onnx/

ValueError: Trying to export a paligemma model, that is a custom or unsupported architecture, but no custom onnx configuration was passed as custom_onnx_configs. Please refer to the guide "Export a model to ONNX with optimum.exporters.onnx" for an example on how to export custom models. Please open an issue on GitHub if you would like the model type paligemma to be supported natively in the ONNX export.

ONNX Community org

paligemma2 uses a custom conversion script, which I have added here: https://github.com/huggingface/transformers.js/issues/1126#issuecomment-2575525385

Hope that helps!

@Xenova : I've commented on the GitHub issue about an error I'm running into. Could you please check?

RuntimeError: The serialized model is larger than the 2GiB limit imposed by the protobuf library. Therefore the output file must be a file path, so that the ONNX external data can be written to the same directory. Please specify the output file name.
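
For context, that RuntimeError appears when a model larger than protobuf's 2 GiB limit is serialized into an in-memory buffer; exporting to a real file path lets the ONNX external data (the weights) be written alongside the .onnx file. Below is a tiny, self-contained illustration of the difference, with a toy module standing in for the real component (the actual fix for the conversion script is the one discussed in the GitHub issue):

import io

import torch
import torch.nn as nn

# Toy module standing in for the exported PaliGemma 2 component.
class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(16, 16)

    def forward(self, x):
        return self.linear(x)

model = TinyModel().eval()
example_input = torch.randn(1, 16)

# Exporting to an in-memory buffer only works while the serialized model fits
# inside protobuf's 2 GiB limit; for a model over 2 GiB this is where the
# RuntimeError above is raised, because the external weight data has nowhere to go.
buffer = io.BytesIO()
torch.onnx.export(model, (example_input,), buffer)

# Exporting to a file path instead lets the exporter write external data files
# next to the .onnx file, which is how models over 2 GiB get around the limit.
torch.onnx.export(model, (example_input,), "tiny_model.onnx")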

@Xenova Thanks, that helps.

@biswajitdevsarma : Did it work for you?

@NSTiwari Conversion to ONNX worked. I haven't checked inference with ONNX yet.

@biswajitdevsarma : Do you mind sharing the notebook? When I tried doing the same, I got the above error.

@NSTiwari I used the above code and just commented out the onnxslim part:

# Attempt to optimize the model with onnxslim
"""
try:
    onnx_model = onnxslim.slim(temp_model_path)
except Exception as e:
    print(f"Failed to slim {temp_model_path}: {e}")
    onnx_model = onnx.load(temp_model_path)
"""
# Load the exported model directly instead of slimming it.
onnx_model = onnx.load(temp_model_path)

Everything else is the same.

@biswajitdevsarma I used the same code too. Maybe I'm missing some dependencies, or there are version compatibility issues. Here's my notebook. Could you please take a look? I'd really appreciate your help.
