---
license: apache-2.0
tags:
- vision
---

# SigLIP 2 Base

[SigLIP 2](https://huggingface.co/collections/google/siglip2-67b5dcef38c175486e240107)
extends the pretraining objective of
[SigLIP](https://huggingface.co/collections/google/siglip-659d5e62f0ae1a57ae0e83ba)
with prior, independently developed techniques, combining them into a unified recipe for
improved semantic understanding, localization, and dense features.

## Intended uses

You can use the raw model for tasks like zero-shot image classification and
image-text retrieval, or as a vision encoder for VLMs (and other vision tasks).
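
As a rough illustration of how zero-shot classification works with a SigLIP-style model, the sketch below scores one image embedding against several candidate text embeddings using a sigmoid over the scaled cosine similarity. The embeddings are hand-written toy vectors, not model outputs, and the names `zero_shot_scores`, `logit_scale`, and `logit_bias` (along with their default values) are assumptions made for this example. In practice the embeddings, scale, and bias would come from the trained model.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def zero_shot_scores(image_emb, text_embs, logit_scale=10.0, logit_bias=-5.0):
    """Return one independent match probability per candidate text.

    logit_scale and logit_bias are illustrative values (assumptions);
    a real model learns them during training.
    """
    img = l2_normalize(image_emb)
    scores = []
    for text_emb in text_embs:
        txt = l2_normalize(text_emb)
        cosine = sum(a * b for a, b in zip(img, txt))
        scores.append(sigmoid(logit_scale * cosine + logit_bias))
    return scores


# Toy embeddings standing in for encoder outputs (hypothetical values).
labels = ["a photo of a cat", "a photo of a dog"]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
image_emb = [0.9, 0.1, 0.0]

scores = zero_shot_scores(image_emb, text_embs)
best = labels[max(range(len(labels)), key=scores.__getitem__)]
print(best, scores)
```

Because each score comes from its own sigmoid rather than a softmax over all labels, the scores are independent probabilities and need not sum to 1.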


## Training procedure

SigLIP 2 adds the following training objectives and capabilities on top of SigLIP:

1. Decoder loss
2. Global-local and masked prediction loss
3. Aspect ratio and resolution adaptability
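
For context, the additions listed above build on SigLIP's pairwise sigmoid loss, which treats every image-text pair in a batch as an independent binary classification (matching pairs are positives, all other pairs are negatives). The following is a minimal pure-Python sketch of that base loss only, not of the SigLIP 2 additions. It assumes the embeddings are already L2-normalized, and the function name and the default `logit_scale` and `logit_bias` values are illustrative assumptions.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def pairwise_sigmoid_loss(image_embs, text_embs, logit_scale=10.0, logit_bias=-10.0):
    """Sigmoid loss over all image-text pairs in a batch.

    Pair (i, j) is labeled +1 when i == j (a matching pair) and -1 otherwise.
    Embeddings are assumed to be L2-normalized already.
    """
    batch_size = len(image_embs)
    total = 0.0
    for i, img in enumerate(image_embs):
        for j, txt in enumerate(text_embs):
            similarity = sum(a * b for a, b in zip(img, txt))
            logit = logit_scale * similarity + logit_bias
            label = 1.0 if i == j else -1.0
            total -= log_sigmoid(label * logit)
    return total / batch_size


# Toy batch: two unit-length image embeddings and their texts (hypothetical values).
images = [[1.0, 0.0], [0.0, 1.0]]
aligned_texts = [[1.0, 0.0], [0.0, 1.0]]   # text i matches image i
shuffled_texts = [[0.0, 1.0], [1.0, 0.0]]  # text i matches the other image

aligned_loss = pairwise_sigmoid_loss(images, aligned_texts)
shuffled_loss = pairwise_sigmoid_loss(images, shuffled_texts)
print(aligned_loss, shuffled_loss)
```

The loss is lower when each text embedding lines up with its own image than when the pairs are swapped, which is the signal that pulls matching pairs together during training.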

### Training data

SigLIP 2 is pre-trained on the WebLI dataset [(Chen et al., 2023)](https://arxiv.org/abs/2209.06794).

### Compute

The model was trained on up to 2048 TPU-v5e chips.

## Evaluation results

Evaluation of SigLIP 2 is shown below (taken from the paper).

[Evaluation Table](TODO)

## BibTeX entry and citation info

```bibtex
TODO
```