
Andres Emanuel Jara

Fruei

AI & ML interests

None yet

Recent Activity

reacted to merve's post with 🤝 8 days ago
reacted to merve's post with ❤️ 8 days ago
reacted to merve's post with 👍 8 days ago

Organizations

None yet

Fruei's activity

reacted to merve's post with 🤝❤️👍 8 days ago
Google's SigLIP is an alternative to OpenAI's CLIP, and it just got merged into 🤗 Transformers, and it's super easy to use!
To celebrate this, I have created a repository with notebooks and a bunch of Spaces for various SigLIP-based projects 🥳
Search for art 👉 merve/draw_to_search_art
Compare SigLIP with CLIP 👉 merve/compare_clip_siglip

How does SigLIP work?
SigLIP is a vision-text pre-training technique based on contrastive learning. It jointly trains an image encoder and a text encoder so that the dot product of the embeddings is highest for matching image-text pairs.
The image below is taken from CLIP, where this contrastive pre-training is done with a softmax; SigLIP replaces the softmax with a sigmoid. 📎
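Roughly, the two losses can be sketched like this (a toy PyTorch sketch, not the authors' implementation; the temperature `t` and bias `b` values follow the paper's initialization, and the normalization is simplified):

```python
import torch
import torch.nn.functional as F

def clip_softmax_loss(img_emb, txt_emb, temperature=0.07):
    # CLIP-style loss: softmax cross-entropy over every pairing in the batch
    logits = img_emb @ txt_emb.T / temperature        # (n, n) similarity matrix
    labels = torch.arange(len(img_emb))               # matching pairs sit on the diagonal
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels)) / 2

def siglip_sigmoid_loss(img_emb, txt_emb, t=10.0, b=-10.0):
    # SigLIP-style loss: an independent binary (sigmoid) decision for every pair,
    # so no batch-wide softmax normalization is needed
    logits = img_emb @ txt_emb.T * t + b              # (n, n)
    labels = 2 * torch.eye(len(img_emb)) - 1          # +1 for matching pairs, -1 otherwise
    return -F.logsigmoid(labels * logits).sum() / len(img_emb)

# toy usage with L2-normalized random embeddings
img = F.normalize(torch.randn(8, 512), dim=-1)
txt = F.normalize(torch.randn(8, 512), dim=-1)
print(clip_softmax_loss(img, txt), siglip_sigmoid_loss(img, txt))
```

Because each pair contributes its own sigmoid term, the loss doesn't need the full batch-wide similarity matrix to be normalized at once, which is what makes the huge batch sizes below practical.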

Highlights from the paper on why you should use it ✨
🖼️📝 The authors used a medium-sized B/16 ViT for the image encoder and a B-sized transformer for the text encoder
😍 More performant than CLIP on zero-shot classification
🗣️ The authors trained a multilingual model too!
⚡️ Super efficient: the sigmoid loss enables batches of up to 1M items, but the authors chose 32k because performance saturates beyond that

It's super easy to use thanks to 🤗 Transformers 👇
```python
from transformers import pipeline
from PIL import Image
import requests

# load the zero-shot image classification pipeline with a SigLIP checkpoint
image_classifier = pipeline(task="zero-shot-image-classification", model="google/siglip-base-patch16-256-i18n")

# load an example image
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# inference: score the candidate labels and round the scores
outputs = image_classifier(image, candidate_labels=["2 cats", "a plane", "a remote"])
outputs = [{"score": round(output["score"], 4), "label": output["label"]} for output in outputs]
print(outputs)
```

For all the SigLIP notebooks on similarity search and indexing, check out this [repository](https://github.com/merveenoyan/siglip). 🤗
replied to merve's post 8 days ago

Is it possible to generate embeddings with SigLIP? I mean a single vector that could be used for a vector search, like in BigQuery.
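Something like this is what I have in mind (just a sketch, assuming the SigLIP model class in `transformers` exposes `get_image_features`/`get_text_features` the way the CLIP classes do; the exact API may need double-checking):

```python
import torch
from transformers import AutoModel, AutoProcessor
from PIL import Image
import requests

# assumption: AutoModel loads a SigLIP model that provides get_image_features,
# analogous to the CLIP classes in transformers
model = AutoModel.from_pretrained("google/siglip-base-patch16-256-i18n")
processor = AutoProcessor.from_pretrained("google/siglip-base-patch16-256-i18n")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

with torch.no_grad():
    inputs = processor(images=image, return_tensors="pt")
    image_embedding = model.get_image_features(**inputs)   # shape (1, hidden_size)

# a single vector that could be stored in a vector index (e.g. for BigQuery vector search)
vector = image_embedding[0].tolist()
print(len(vector))
```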