Upload README.md with huggingface_hub (#1)
README.md
ADDED

---
license: apache-2.0
tags:
- vision
---

# SigLIP 2 Large

[SigLIP 2](https://huggingface.co/collections/google/siglip2-67b5dcef38c175486e240107) extends the pretraining objective of [SigLIP](https://huggingface.co/collections/google/siglip-659d5e62f0ae1a57ae0e83ba) with prior, independently developed techniques into a unified recipe, for improved semantic understanding, localization, and dense features.

## Intended uses

You can use the raw model for tasks like zero-shot image classification and image-text retrieval, or as a vision encoder for VLMs (and other vision tasks).
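
For example, zero-shot classification can be run through the 🤗 Transformers pipeline. A minimal sketch, assuming this checkpoint works with the standard pipeline API (the checkpoint id below is a placeholder; substitute the id of this repository):

```python
from transformers import pipeline
from PIL import Image
import requests

# Placeholder checkpoint id -- replace with the id of this repository.
ckpt = "google/siglip2-large"

# Zero-shot image classification pipeline.
classifier = pipeline(task="zero-shot-image-classification", model=ckpt)

# Two cats on a couch, a common demo image.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# SigLIP-style models score each candidate label independently with a
# sigmoid, so the returned scores need not sum to 1.
outputs = classifier(image, candidate_labels=["2 cats", "a plane", "a remote"])
print(outputs)
```

For image-text retrieval, the image and text embeddings live in a shared space and can be compared directly with a dot product.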

## Training procedure

SigLIP 2 adds some clever training objectives on top of SigLIP (whose base sigmoid loss is sketched after this list):

1. Decoder loss
2. Global-local and masked prediction loss
3. Aspect ratio and resolution adaptability
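
For context, here is a minimal sketch of the pairwise sigmoid loss that SigLIP introduced and that these objectives build on, written in PyTorch from the published formula (an illustration, not the actual training code; `img_emb` and `txt_emb` are assumed to be L2-normalized):

```python
import torch
import torch.nn.functional as F

def siglip_loss(img_emb, txt_emb, t, b):
    """Pairwise sigmoid loss from SigLIP (Zhai et al., 2023).

    img_emb, txt_emb: L2-normalized embeddings of shape (n, d).
    t, b: learned temperature and bias scalars.
    """
    # (n, n) matrix of scaled, shifted image-text similarities.
    logits = t * img_emb @ txt_emb.T + b
    # +1 on the diagonal (matching pairs), -1 off-diagonal (non-matching).
    labels = 2 * torch.eye(logits.shape[0], device=logits.device) - 1
    # Each pair is scored with an independent sigmoid, so no softmax
    # normalization over the batch is required.
    return -F.logsigmoid(labels * logits).sum() / logits.shape[0]
```

Unlike a softmax contrastive loss, every image-text pair contributes an independent binary term, which is what lets the sigmoid recipe scale batch size without a global normalization across the batch.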

### Training data

SigLIP 2 is pre-trained on the WebLI dataset [(Chen et al., 2023)](https://arxiv.org/abs/2209.06794).

### Compute

The model was trained on up to 2048 TPU-v5e chips.

## Evaluation results

Evaluation of SigLIP 2 is shown below (taken from the paper).

[Evaluation Table](TODO)

### BibTeX entry and citation info

```bibtex
TODO
```