---
license: apache-2.0
tags:
- vision
---

# SigLIP 2 Base

[SigLIP 2](https://huggingface.co/collections/google/siglip2-67b5dcef38c175486e240107)
extends the pretraining objective of
[SigLIP](https://huggingface.co/collections/google/siglip-659d5e62f0ae1a57ae0e83ba)
with prior, independently developed techniques into a unified recipe, for improved semantic
understanding, localization, and dense features.

## Intended uses

You can use the raw model for tasks like zero-shot image classification and
image-text retrieval, or as a vision encoder for VLMs (and other vision tasks).
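Below is a minimal sketch of zero-shot image classification with the 🤗 Transformers pipeline. The checkpoint id (`google/siglip2-base-patch16-224`) and the example image URL are illustrative assumptions; substitute the id of this repository and your own image and labels.

```python
from transformers import pipeline

# Illustrative checkpoint id; replace with the id of this repository.
ckpt = "google/siglip2-base-patch16-224"
classifier = pipeline(task="zero-shot-image-classification", model=ckpt)

# Candidate labels are free-form text. SigLIP uses a sigmoid objective,
# so each label is scored independently of the others.
candidate_labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
outputs = classifier(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg",
    candidate_labels=candidate_labels,
)
print(outputs)  # list of {"score": ..., "label": ...} dicts
```

For image-text retrieval or use as a vision encoder, the same checkpoint can be loaded with `AutoModel`/`AutoProcessor` to extract image and text embeddings directly.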


## Training procedure

SigLIP 2 adds some clever training objectives on top of SigLIP:

1. Decoder loss
2. Global-local and masked prediction loss
3. Aspect ratio and resolution adaptability

### Training data

SigLIP 2 is pre-trained on the WebLI dataset [(Chen et al., 2023)](https://arxiv.org/abs/2209.06794).

### Compute

The model was trained on up to 2048 TPU-v5e chips.

## Evaluation results

Evaluation of SigLIP 2 is shown below (taken from the paper).

[Evaluation Table](TODO)

### BibTeX entry and citation info

```bibtex
TODO
```