PaliGemma 2 ONNX doesn't support object detection?
Hi, thanks for sharing the ONNX weights for PaliGemma 2. While the model works well for image captioning, I tried several object-detection prompts using the detect keyword.
E.g., "detect person" was one of the prompts, but the response was null.
Are the converted model weights compatible only with captioning tasks?
Hmm, it should work. Could you share the code you are using?
Also, can you confirm the original (PyTorch) version works correctly for your image/prompt?
@Xenova: Okay, after experimenting with various prompts, I was able to get the bounding box coordinates. Unlike the original PaliGemma 2 weights, where a simple "<image>detect person" prompt works, I had to specifically provide "<image>detect bounding box of person" to make it work.
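In case it helps others, here is a rough sketch of how this looks with Transformers.js. The model id and image URL below are placeholders, and the processor/generate calls follow the usual pattern from the converted checkpoints' model cards, so adapt them to the checkpoint you actually downloaded:

```js
import { AutoProcessor, PaliGemmaForConditionalGeneration, RawImage } from '@huggingface/transformers';

// Placeholder model id: substitute the converted ONNX checkpoint you are using.
// (You can also pass dtype/device options here, as shown on the model card.)
const model_id = 'onnx-community/paligemma2-3b-pt-224';
const processor = await AutoProcessor.from_pretrained(model_id);
const model = await PaliGemmaForConditionalGeneration.from_pretrained(model_id);

// Placeholder image URL.
const image = await RawImage.fromURL('https://example.com/people.jpg');

// The phrasing that produced bounding boxes for me.
const prompt = '<image>detect bounding box of person';
const inputs = await processor(image, prompt);

// Detection output is a run of <locYYYY> tokens
// (y_min, x_min, y_max, x_max on a 0-1023 grid) followed by the label.
const output = await model.generate({ ...inputs, max_new_tokens: 64 });

// Drop the prompt tokens and decode only the newly generated part.
const generated_ids = output.slice(null, [inputs.input_ids.dims[1], null]);
const decoded = processor.batch_decode(generated_ids, { skip_special_tokens: true });
console.log(decoded[0]);
```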
Hi @Xenova, is it possible to run this using vanilla JS by loading Transformers.js via a CDN? Here's how I'm loading it:

import { AutoProcessor, PaliGemmaForConditionalGeneration } from 'https://cdn.jsdelivr.net/npm/@huggingface/[email protected]';

I get the following error:
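In general, Transformers.js can be loaded from a CDN in plain JavaScript, provided the import runs inside a module script. A minimal sketch, assuming a placeholder model id and leaving the exact package version for you to pin:

```html
<!DOCTYPE html>
<html>
  <body>
    <script type="module">
      // import statements only work inside type="module" scripts;
      // a plain <script> tag throws a SyntaxError on `import`.
      // Pin an explicit version (.../@huggingface/transformers@<version>) to avoid surprises.
      import { AutoProcessor, PaliGemmaForConditionalGeneration } from 'https://cdn.jsdelivr.net/npm/@huggingface/transformers';

      // Placeholder model id: use the converted ONNX checkpoint you want to run.
      const model_id = 'onnx-community/paligemma2-3b-pt-224';
      const processor = await AutoProcessor.from_pretrained(model_id);
      const model = await PaliGemmaForConditionalGeneration.from_pretrained(model_id);

      console.log('Processor and model loaded.');
    </script>
  </body>
</html>
```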
How are you converting the model to ONNX? Optimum does not support the image-text-to-text task. Please help.
$ optimum-cli export onnx --model google/paligemma-3b-pt-224 paligemma-3b-pt-224_onnx/
KeyError: "Unknown task: image-text-to-text"
I tried specifying one of the existing tasks, image-to-text, but that also throws another error:
$ optimum-cli export onnx --model google/paligemma-3b-pt-224 --task image-to-text paligemma-3b-pt-224_onnx/
ValueError: Trying to export a paligemma model, that is a custom or unsupported architecture, but no custom onnx configuration was passed as custom_onnx_configs. Please refer to "Export a model to ONNX with optimum.exporters.onnx" for an example on how to export custom models. Please open an issue on GitHub if you would like the model type paligemma to be supported natively in the ONNX export.
PaliGemma 2 uses a custom conversion script, which I have added here: https://github.com/huggingface/transformers.js/issues/1126#issuecomment-2575525385
Hope that helps!
@Xenova: I've commented on the GitHub issue about an error. Could you please check?
RuntimeError: The serialized model is larger than the 2GiB limit imposed by the protobuf library. Therefore the output file must be a file path, so that the ONNX external data can be written to the same directory. Please specify the output file name.
@Xenova Thanks, that helps.
@biswajitdevsarma: Did it work for you?
@NSTiwari Conversion to ONNX worked. Haven't checked inference with the ONNX model yet.
@biswajitdevsarma: Do you mind sharing the notebook? When I tried doing the same, I got the above error.
@NSTiwari I used the above code; I just commented out the onnxslim part:

# Attempt to optimize the model with onnxslim
"""
try:
    onnx_model = onnxslim.slim(temp_model_path)
except Exception as e:
    print(f"Failed to slim {temp_model_path}: {e}")
    onnx_model = onnx.load(temp_model_path)
"""
onnx_model = onnx.load(temp_model_path)

Everything else is the same.
@biswajitdevsarma I used the same code too. Maybe I'm missing some dependencies, or there are version compatibility issues. Here's my notebook. Could you please check it once? Really appreciate your help.