
Andres Emanuel Jara

Fruei

AI & ML interests

None yet

Recent Activity

reacted to merve's post with 🤝 8 days ago
reacted to merve's post with ❤️ 8 days ago
reacted to merve's post with 👍 8 days ago

Organizations

None yet

Fruei's activity

reacted to merve's post with 🤝❤️👍 8 days ago
Google's SigLIP is an alternative to OpenAI's CLIP, and it just got merged into 🤗 Transformers, and it's super easy to use!
To celebrate this, I have created a repository with notebooks and a bunch of Spaces for various SigLIP-based projects 🥳
Search for art 👉 merve/draw_to_search_art
Compare SigLIP with CLIP 👉 merve/compare_clip_siglip

How does SigLIP work?
SigLIP is a vision-text pre-training technique based on contrastive learning. It jointly trains an image encoder and a text encoder so that the dot product of the embeddings is highest for matching image-text pairs.
The image below is taken from CLIP, where this contrastive pre-training is done with a softmax; SigLIP replaces the softmax with a sigmoid. 📎
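Roughly, the two losses can be sketched like this (a toy PyTorch sketch, not the authors' implementation; the temperature `t` and bias `b` values follow the paper's initialization, and the normalization is simplified):

```python
import torch
import torch.nn.functional as F

def clip_softmax_loss(img_emb, txt_emb, temperature=0.07):
    # CLIP-style loss: softmax cross-entropy over every pairing in the batch
    logits = img_emb @ txt_emb.T / temperature        # (n, n) similarity matrix
    labels = torch.arange(len(img_emb))               # matching pairs sit on the diagonal
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels)) / 2

def siglip_sigmoid_loss(img_emb, txt_emb, t=10.0, b=-10.0):
    # SigLIP-style loss: an independent binary (sigmoid) decision for every pair,
    # so no batch-wide softmax normalization is needed
    logits = img_emb @ txt_emb.T * t + b              # (n, n)
    labels = 2 * torch.eye(len(img_emb)) - 1          # +1 for matching pairs, -1 otherwise
    return -F.logsigmoid(labels * logits).sum() / len(img_emb)

# toy usage with L2-normalized random embeddings
img = F.normalize(torch.randn(8, 512), dim=-1)
txt = F.normalize(torch.randn(8, 512), dim=-1)
print(clip_softmax_loss(img, txt), siglip_sigmoid_loss(img, txt))
```

Because each pair contributes its own sigmoid term, the loss doesn't need the full batch-wide similarity matrix to be normalized at once, which is what makes the huge batch sizes below practical.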

Highlights from the paper on why you should use it ✨
🖼️📝 The authors used a medium-sized B/16 ViT for the image encoder and a B-sized transformer for the text encoder
😍 More performant than CLIP on zero-shot classification
🗣️ The authors trained a multilingual model too!
⚡️ Super efficient: the sigmoid loss enables batches of up to 1M items, but the authors chose 32k because performance saturates beyond that

It's super easy to use thanks to 🤗 Transformers 👇
```python
from transformers import pipeline
from PIL import Image
import requests

# load the zero-shot image classification pipeline with a SigLIP checkpoint
image_classifier = pipeline(task="zero-shot-image-classification", model="google/siglip-base-patch16-256-i18n")

# load an example image
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# inference: score the candidate labels and round the scores
outputs = image_classifier(image, candidate_labels=["2 cats", "a plane", "a remote"])
outputs = [{"score": round(output["score"], 4), "label": output["label"]} for output in outputs]
print(outputs)
```

For all the SigLIP notebooks on similarity search and indexing, check out this [repository](https://github.com/merveenoyan/siglip). 🤗
replied to merve's post 8 days ago

Is it possible to generate embeddings with SigLIP? I mean a single vector that could be used for a vector search, like in BigQuery.
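Something like this is what I have in mind (just a sketch, assuming the SigLIP model class in `transformers` exposes `get_image_features`/`get_text_features` the way the CLIP classes do; the exact API may need double-checking):

```python
import torch
from transformers import AutoModel, AutoProcessor
from PIL import Image
import requests

# assumption: AutoModel loads a SigLIP model that provides get_image_features,
# analogous to the CLIP classes in transformers
model = AutoModel.from_pretrained("google/siglip-base-patch16-256-i18n")
processor = AutoProcessor.from_pretrained("google/siglip-base-patch16-256-i18n")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

with torch.no_grad():
    inputs = processor(images=image, return_tensors="pt")
    image_embedding = model.get_image_features(**inputs)   # shape (1, hidden_size)

# a single vector that could be stored in a vector index (e.g. for BigQuery vector search)
vector = image_embedding[0].tolist()
print(len(vector))
```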