Pavel Iakubovskii

qubvel-hf

AI & ML interests

Computer Vision models

Recent Activity

Organizations

Hugging Face, PyTorch Image Models, Peking University, Hugging Face Internal Testing Organization, Huggingface Projects, Hugging Face OSS Metrics, Hugging Face for Computer Vision, kotol, yorg, CVPR2024, Hugging Face Discord Community, nltpt, s0409, Segmentation Models Pytorch, smp-test, University of Sydney, s0225, ETH Zurich - Computer Vision and Geometry Lab

qubvel-hf's activity

upvoted an article 3 days ago

Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM

By ariG23498 and 3 others
upvoted an article 12 days ago

SmolVLM2: Bringing Video Understanding to Every Device

reacted to clem's post with 🔥 12 days ago
Super happy to welcome Nvidia as our latest Enterprise Hub customer. They have almost 2,000 team members using Hugging Face and close to 20,000 followers of their org. Can't wait to see what they'll open-source for all of us in the coming months!

Nvidia's org: https://huggingface.co/nvidia
Enterprise hub: https://huggingface.co/enterprise

New activity in facebook/sam-vit-large 15 days ago

Update code snippet

#6 opened 15 days ago by qubvel-hf
New activity in facebook/sam-vit-huge 15 days ago

Update code snippet

#11 opened 15 days ago by qubvel-hf
New activity in facebook/sam-vit-base 15 days ago

Update code snippet

#8 opened 15 days ago by qubvel-hf

btw, I also observed that the "." and capitalization in the template influence the confidence quite a bit.


Not sure what's up, as I'm not familiar with this codebase (and have no time to dig in), but for SigLIP what you're supposed to do is compute sigmoid(zimg @ ztxt * temperature + bias).

From what you describe, I would bet the bias and/or temperature are missing?
The ground-truth reference code is https://colab.research.google.com/github/google-research/big_vision/blob/main/big_vision/configs/proj/image_text/SigLIP2_demo.ipynb
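To make the formula concrete, here is a minimal pure-Python sketch of sigmoid(zimg @ ztxt.T * temperature + bias) for row-wise, L2-normalized embeddings. The helper names (`l2_normalize`, `siglip_probs`), the toy embeddings, and the temperature/bias values are illustrative assumptions, not values from any real checkpoint (trained models learn their own temperature and bias). The sketch also shows what happens when temperature and bias are left out.

```python
import math


def l2_normalize(v):
    # Scale a vector to unit length so that a dot product becomes cosine similarity.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def siglip_probs(zimg, ztxt, temperature, bias):
    """Return probs[i][j] = sigmoid(<zimg_i, ztxt_j> * temperature + bias),
    i.e. sigmoid(zimg @ ztxt.T * temperature + bias) with one embedding per row."""
    zimg = [l2_normalize(v) for v in zimg]
    ztxt = [l2_normalize(v) for v in ztxt]
    return [
        [sigmoid(sum(a * b for a, b in zip(img, txt)) * temperature + bias) for txt in ztxt]
        for img in zimg
    ]


# Hypothetical toy embeddings: one image and two candidate captions.
zimg = [[1.0, 0.0, 0.0]]
ztxt = [
    [0.9, 0.1, 0.0],  # nearly parallel to the image embedding
    [0.0, 1.0, 0.0],  # orthogonal to the image embedding
]

# Illustrative values only (assumption); real checkpoints learn these parameters.
with_scale = siglip_probs(zimg, ztxt, temperature=10.0, bias=-5.0)
# Same computation with temperature and bias effectively missing.
without_scale = siglip_probs(zimg, ztxt, temperature=1.0, bias=0.0)
print(with_scale)
print(without_scale)
```

With temperature and bias, the matching caption gets a probability close to 1 and the orthogonal one close to 0. Without them, every output is the sigmoid of a cosine in [-1, 1] and is therefore stuck between roughly 0.27 and 0.73, which is the kind of flat, hard-to-interpret confidence one would expect if the temperature and bias were missing.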

Hey @giffmana, temperature and bias are applied under the hood; see:

Siglip
https://github.com/huggingface/transformers/blob/17792556b21b4da0dbb9e4b59b39fb34aae4047c/src/transformers/models/siglip/modeling_siglip.py#L1411-L1417

Siglip2
https://github.com/huggingface/transformers/blob/17792556b21b4da0dbb9e4b59b39fb34aae4047c/src/transformers/models/siglip2/modeling_siglip2.py#L1459-L1465
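The point of the reply can be illustrated with a small pure-Python re-creation of the model-side computation. This is a sketch under the assumption that the linked lines compute logits_per_text = text_embeds @ image_embeds.T * exp(logit_scale) + logit_bias and return its transpose as logits_per_image. The names `logit_scale` and `logit_bias` mirror the transformers attribute names, but `model_logits` is a hypothetical helper and all numbers are made up. Because the returned logits already contain the scale and bias, the caller only applies a sigmoid.

```python
import math


def l2_normalize(v):
    # Unit-normalize so the dot product is a cosine similarity.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def model_logits(image_embeds, text_embeds, logit_scale, logit_bias):
    """Hypothetical sketch of the model side: returns logits_per_image with the
    exponentiated scale and the bias already folded in."""
    image_embeds = [l2_normalize(v) for v in image_embeds]
    text_embeds = [l2_normalize(v) for v in text_embeds]
    scale = math.exp(logit_scale)  # the scale parameter is stored in log space
    logits_per_text = [
        [sum(t * i for t, i in zip(txt, img)) * scale + logit_bias for img in image_embeds]
        for txt in text_embeds
    ]
    # Transpose so that rows index images and columns index texts.
    return [list(row) for row in zip(*logits_per_text)]


# Hypothetical embeddings and parameters, for illustration only.
logits_per_image = model_logits(
    image_embeds=[[1.0, 0.0], [0.0, 1.0]],
    text_embeds=[[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]],
    logit_scale=math.log(10.0),
    logit_bias=-5.0,
)

# Caller side: only the sigmoid remains; re-applying scale or bias would double-count them.
probs = [[sigmoid(x) for x in row] for row in logits_per_image]
print(probs)
```

On the user side this corresponds to applying `torch.sigmoid` to `outputs.logits_per_image`, which is the pattern used in the transformers SigLIP documentation example.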