ViT patch size?
Thanks for this model! I assume it is using a Vision Transformer? If so, what is the patch size in the ViT (16x16, 32x32, etc.)? In my own experiments I've found the ViT patch size to play a large role in how well small objects are retained in CLIP/SigLIP embeddings.
Patch size = 16. Our Git-RSCLIP is based on [google/siglip-large-patch16-256].
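If it helps, here is a minimal sketch of how you could confirm this from the base checkpoint's config, assuming the standard transformers SigLIP config layout (the attribute names below are from that layout, not from this repo specifically):

```python
from transformers import AutoConfig

# Read the config of the base checkpoint; Git-RSCLIP should share the same vision settings.
config = AutoConfig.from_pretrained("google/siglip-large-patch16-256")

print(config.vision_config.patch_size)  # expected: 16
print(config.vision_config.image_size)  # expected: 256
```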
Thanks!
I'm very interested in using your model to get embeddings for a given remote sensing image. I've previously been using RemoteCLIP which was trained on 100k image/text pairs, but your 10 million image/text pairs is really compelling.
What is the embedding dimension of your SigLIP encoder? And do you have any boilerplate code for getting an embedding from a given input image?
You can see our updated model card: use-git-rsclip-to-get-image-features
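For anyone else landing here, below is a minimal sketch of what image-feature extraction typically looks like with the standard transformers SigLIP API. The repo id and image path are placeholders (substitute the actual Git-RSCLIP checkpoint from the model card), and printing the output shape also shows the embedding dimensionality:

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

# Placeholder repo id -- replace with the actual Git-RSCLIP checkpoint from the model card.
model_id = "path/to/Git-RSCLIP"

model = AutoModel.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)

# Placeholder image path -- any remote sensing image converted to RGB.
image = Image.open("example_remote_sensing_image.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    image_features = model.get_image_features(**inputs)

# The trailing dimension is the embedding size of the vision tower.
print(image_features.shape)  # e.g. (1, embedding_dim)
```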
Thanks! Do you know the dimensionality of the resulting embeddings?