PaliGemma 2 ONNX doesn't support object detection?

#1
by NSTiwari - opened

Hi, thanks for sharing the ONNX weights for PaliGemma 2. While the model works well for image captioning, I tried several prompts for object detection using the detect keyword.
E.g., detect person was one of the prompts, but the response was null.

Are the converted model weights compatible only with captioning tasks?

ONNX Community org

Hmm, it should work. Could you share the code you are using?

ONNX Community org

Also, can you confirm the original (PyTorch) version works correctly for your image/prompt?

@Xenova : Okay, after experimenting with various prompts, I was able to get the bounding box coordinates. Unlike the original PaliGemma 2 weights, where a simple <image>detect person would work, here I had to use the more specific prompt <image>detect bounding box of person to make it work.
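
For anyone who wants to run the sanity check suggested above against the original PyTorch weights, here is a minimal sketch; the checkpoint id and image URL are placeholders, and the prompt follows the format that worked in this thread:

import requests
import torch
from PIL import Image
from transformers import PaliGemmaForConditionalGeneration, PaliGemmaProcessor

# Placeholder checkpoint and image; swap in the ones you are testing with.
model_id = "google/paligemma2-3b-pt-224"
url = "https://example.com/people.jpg"

model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
).eval()
processor = PaliGemmaProcessor.from_pretrained(model_id)

image = Image.open(requests.get(url, stream=True).raw)
prompt = "<image>detect person"  # the detection prompt discussed above
inputs = processor(text=prompt, images=image, return_tensors="pt").to(torch.bfloat16).to(model.device)
input_len = inputs["input_ids"].shape[-1]

with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=100, do_sample=False)

# Drop the prompt tokens and print only the generated location tokens and labels.
print(processor.decode(output[0][input_len:], skip_special_tokens=True))

If this prints the <locXXXX> tokens for the PyTorch checkpoint but the ONNX weights return nothing for the same image and prompt, that narrows the issue down to the conversion rather than the prompt.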

Hi @Xenova , is it possible to run this using vanilla JS by loading Transformers.js via a CDN? Here's how I'm loading it:

import { AutoProcessor, PaliGemmaForConditionalGeneration } from 'https://cdn.jsdelivr.net/npm/@huggingface/transformers';

This gives me an error (screenshot attached).

ONNX Community org

How are you converting the model to ONNX? Optimum doesn't support the image-text-to-text task. Please help.

$ optimum-cli export onnx --model google/paligemma-3b-pt-224 paligemma-3b-pt-224_onnx/
KeyError: "Unknown task: image-text-to-text"

I tried specifying one of the existing tasks, image-to-text, but that also throws another error:
$ optimum-cli export onnx --model google/paligemma-3b-pt-224 --task image-to-text paligemma-3b-pt-224_onnx/

ValueError: Trying to export a paligemma model, that is a custom or unsupported architecture, but no custom onnx configuration was passed as custom_onnx_configs. Please refer to the guide "Export a model to ONNX with optimum.exporters.onnx" for an example on how to export custom models. Please open an issue on GitHub if you would like the model type paligemma to be supported natively in the ONNX export.

ONNX Community org

paligemma2 uses a custom conversion script, which I have added here: https://github.com/huggingface/transformers.js/issues/1126#issuecomment-2575525385

Hope that helps!

@Xenova : I've commented on the GitHub issue about an error I'm running into. Could you please check?

RuntimeError: The serialized model is larger than the 2GiB limit imposed by the protobuf library. Therefore the output file must be a file path, so that the ONNX external data can be written to the same directory. Please specify the output file name.
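
For context, that RuntimeError appears when a model larger than protobuf's 2 GiB limit is serialized into an in-memory buffer; exporting to a real file path lets the ONNX external data (the weights) be written alongside the .onnx file. Below is a tiny, self-contained illustration of the difference, with a toy module standing in for the real component (the actual fix for the conversion script is the one discussed in the GitHub issue):

import io

import torch
import torch.nn as nn

# Toy module standing in for the exported PaliGemma 2 component.
class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(16, 16)

    def forward(self, x):
        return self.linear(x)

model = TinyModel().eval()
example_input = torch.randn(1, 16)

# Exporting to an in-memory buffer only works while the serialized model fits
# inside protobuf's 2 GiB limit; for a model over 2 GiB this is where the
# RuntimeError above is raised, because the external weight data has nowhere to go.
buffer = io.BytesIO()
torch.onnx.export(model, (example_input,), buffer)

# Exporting to a file path instead lets the exporter write external data files
# next to the .onnx file, which is how models over 2 GiB get around the limit.
torch.onnx.export(model, (example_input,), "tiny_model.onnx")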

@Xenova Thanks, that helps.

@biswajitdevsarma : Did it work for you?

@NSTiwari Conversion to ONNX worked. I haven't checked inference with ONNX yet.

@biswajitdevsarma : Do you mind sharing the notebook? When I tried doing the same, I got the above error.

@NSTiwari I used the above code and just commented out the onnxslim part:

# Attempt to optimize the model with onnxslim
"""
try:
    onnx_model = onnxslim.slim(temp_model_path)
except Exception as e:
    print(f"Failed to slim {temp_model_path}: {e}")
    onnx_model = onnx.load(temp_model_path)
"""
# Load the exported model directly instead of slimming it.
onnx_model = onnx.load(temp_model_path)

Everything else is the same.

@biswajitdevsarma I used the same code too. Maybe I'm missing some dependencies, or there are version compatibility issues. Here's my notebook. Could you please take a look? I'd really appreciate your help.
