---
license: apache-2.0
tags:
  - vision
---

SigLIP 2 Base

SigLIP 2 extends the pretraining objective of SigLIP by combining it with several prior, independently developed techniques into a unified recipe, improving semantic understanding, localization, and dense features.
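SigLIP's base objective scores every image-text pair in a batch independently with a binary sigmoid loss, rather than a batch-wide softmax as in CLIP. A minimal sketch in plain Python (illustrative only; the actual recipe also learns a temperature and bias term):

```python
import math

def sigmoid_pretraining_loss(logits, labels):
    """Pairwise sigmoid loss over an image-text similarity matrix.

    logits[i][j]: scaled dot product of image i and text j embeddings.
    labels[i][j]: +1 if (i, j) is a matching pair, -1 otherwise.
    Each pair is an independent binary classification, so no
    batch-wide normalization is needed (unlike a softmax loss).
    """
    total, count = 0.0, 0
    for logit_row, label_row in zip(logits, labels):
        for z, y in zip(logit_row, label_row):
            total += math.log(1.0 + math.exp(-y * z))  # -log sigmoid(y * z)
            count += 1
    return total / count
```

With well-aligned embeddings (large positive logits on matching pairs, large negative elsewhere) the loss approaches zero; flipping the labels makes it large.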

Intended uses

You can use the raw model for tasks like zero-shot image classification and image-text retrieval, or as a vision encoder for VLMs (and other vision tasks).
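For example, zero-shot image classification with the Transformers pipeline might look like the following sketch (the checkpoint name `google/siglip2-base-patch16-224` is an assumption; substitute the checkpoint this card describes):

```python
from transformers import pipeline

def classify_image(image, candidate_labels,
                   model_id="google/siglip2-base-patch16-224"):
    """Zero-shot image classification with a SigLIP 2 checkpoint.

    model_id is an assumed checkpoint name -- replace it with the
    checkpoint this model card describes.
    """
    classifier = pipeline("zero-shot-image-classification", model=model_id)
    # Returns a list of {"label": ..., "score": ...} dicts, one per candidate label.
    return classifier(image, candidate_labels=candidate_labels)

# Example usage (downloads the model on first run):
# classify_image("photo.jpg", ["a cat", "a dog", "a bird"])
```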

Training procedure

SigLIP 2 adds some clever training objectives on top of SigLIP:

  1. Decoder loss
  2. Global-local and masked prediction loss
  3. Aspect ratio and resolution adaptability
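To illustrate the global-local idea behind objective 2: features of a local crop are pulled toward features of the full (global) view, encouraging crop-invariant representations. A simplified sketch of that consistency term (the paper's actual recipe uses a teacher-student setup over feature distributions; this reduces it to a plain distance between normalized features):

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (no-op guard for the zero vector)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def global_local_consistency(global_feat, local_feat):
    """Squared distance between normalized features of a global view
    and a local crop; minimizing it encourages the encoder to produce
    the same representation regardless of the crop."""
    g = l2_normalize(global_feat)
    c = l2_normalize(local_feat)
    return sum((a - b) ** 2 for a, b in zip(g, c))
```

Features pointing in the same direction give zero loss; mismatched features are penalized.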

Training data

SigLIP 2 is pre-trained on the WebLI dataset (Chen et al., 2023).

Compute

The model was trained on up to 2048 TPU-v5e chips.

Evaluation results

Evaluation of SigLIP 2 is shown below (taken from the paper).

Evaluation Table

BibTeX entry and citation info

TODO