---
license: apache-2.0
tags:
- vision
---
# SigLIP 2 Base
[SigLIP 2](https://huggingface.co/collections/google/siglip2-67b5dcef38c175486e240107)
combines the pretraining objective of
[SigLIP](https://huggingface.co/collections/google/siglip-659d5e62f0ae1a57ae0e83ba)
with prior, independently developed techniques in a unified recipe, improving semantic
understanding, localization, and dense features.
## Intended uses
You can use the raw model for tasks like zero-shot image classification and
image-text retrieval, or as a vision encoder for VLMs (and other vision tasks).
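
As a rough illustration of zero-shot classification with a SigLIP-style model, the sketch below scores one image embedding against several text embeddings and applies a sigmoid to each pair independently. It assumes the embeddings have already been produced by the image and text encoders; here they are random placeholders, and the labels, `logit_scale`, and `logit_bias` values are hypothetical stand-ins for the learned parameters, not values taken from this checkpoint.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis`."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)


def zero_shot_scores(image_emb, text_embs, logit_scale=10.0, logit_bias=-5.0):
    """Return one independent sigmoid probability per candidate text.

    logit_scale and logit_bias are hypothetical placeholders for the
    learned temperature and bias of a SigLIP-style model.
    """
    image_emb = l2_normalize(np.asarray(image_emb, dtype=np.float64))
    text_embs = l2_normalize(np.asarray(text_embs, dtype=np.float64))
    # Cosine similarity between the image and every text, then scale and shift.
    logits = logit_scale * (text_embs @ image_emb) + logit_bias
    return 1.0 / (1.0 + np.exp(-logits))


# Placeholder embeddings: in practice these come from the model's encoders.
rng = np.random.default_rng(0)
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
text_embs = rng.normal(size=(3, 8))
# Simulate an image whose embedding lies close to the second text embedding.
image_emb = text_embs[1] + 0.05 * rng.normal(size=8)

scores = zero_shot_scores(image_emb, text_embs)
best_label = labels[int(np.argmax(scores))]
print(best_label)
```

Because each score comes from its own sigmoid rather than a softmax over all labels, the scores need not sum to 1; several labels, or none, can receive a high probability.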
## Training procedure
SigLIP 2 adds the following training objectives and capabilities on top of SigLIP:
1. Decoder loss
2. Global-local and masked prediction loss
3. Aspect ratio and resolution adaptability
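
For context, the base SigLIP objective that these additions build on is a pairwise sigmoid loss over all image-text pairs in a batch. The following is a minimal NumPy sketch of that base loss only, not of the SigLIP 2 additions listed above; the function name, the temperature `t`, the bias `b`, and the random embeddings are illustrative assumptions.

```python
import numpy as np


def sigmoid_pairwise_loss(img, txt, t=10.0, b=-10.0):
    """Pairwise sigmoid loss over a batch of image and text embeddings.

    img, txt: arrays of shape (n, d); row i of each forms a matching pair.
    t, b: placeholder temperature and bias (learned during real training).
    """
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    n = img.shape[0]
    logits = t * (img @ txt.T) + b
    # +1 for matching pairs (diagonal), -1 for all other pairs.
    z = 2.0 * np.eye(n) - 1.0
    # -log(sigmoid(x)) == log(1 + exp(-x)), computed stably with logaddexp.
    return float(np.sum(np.logaddexp(0.0, -z * logits)) / n)


rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 16))

# Perfectly aligned pairs versus pairs shifted by one position.
aligned_loss = sigmoid_pairwise_loss(emb, emb)
misaligned_loss = sigmoid_pairwise_loss(emb, np.roll(emb, 1, axis=0))
print(aligned_loss, misaligned_loss)
```

Each pair is treated as an independent binary classification, so the loss needs no normalization across the batch, unlike a softmax-based contrastive loss.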
### Training data
SigLIP 2 is pre-trained on the WebLI dataset [(Chen et al., 2023)](https://arxiv.org/abs/2209.06794).
### Compute
The model was trained on up to 2048 TPU-v5e chips.
## Evaluation results
Evaluation of SigLIP 2 is shown below (taken from the paper).
[Evaluation Table](TODO)
### BibTeX entry and citation info
```bibtex
TODO
```