Add/improve model card for TULIP
This PR adds/improves the model card for TULIP, ensuring the correct pipeline tag (`image-to-text`) is set and the license and library name are correctly specified, which aids discoverability. The model card now includes essential information about the model, such as a brief overview, highlights, and links to the paper and code.
README.md
CHANGED
@@ -1,3 +1,25 @@
----
-license: apache-2.0
----
+---
+license: apache-2.0
+pipeline_tag: image-to-text
+library_name: transformers
+---
+
+# 🌷 TULIP: Token-length Upgraded CLIP
+
+[TULIP](https://arxiv.org/pdf/2410.10034) (Token-length Upgraded CLIP) addresses the challenge of representing long captions in vision-language models. It enhances CLIP-like models by incorporating relative position encodings, enabling effective processing of captions longer than the default 77 tokens.
+
+> *"TULIP: Token-length Upgraded CLIP" (accepted to ICLR 2025)*
+> *[Ivona Najdenkoska](https://ivonajdenkoska.github.io/)٭, [Mohammad M. Derakhshani](https://mmderakhshani.github.io/)٭, [Yuki M. Asano](https://yukimasano.github.io/), [Nanne van Noord](https://nanne.github.io/), [Marcel Worring](https://staff.fnwi.uva.nl/m.worring/), [Cees G. M. Snoek](https://www.ceessnoek.info/)*
+> *٭ Equal core contributions*
+
+Code: https://github.com/ivonajdenkoska/tulip
+
+## Highlights
+- Improves performance on long-caption understanding tasks.
+- Uses relative positional encodings to handle long image captions.
+- Works with CLIP-like models.
+
+
+## How to use
+
+Please refer to the original repository for detailed instructions on how to use and train the model.
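The new "How to use" section defers to the upstream repository. As a rough starting point, below is a minimal sketch of loading a TULIP checkpoint with 🤗 Transformers. The Hub ID is a placeholder, and the use of `trust_remote_code` and CLIP-style output fields are assumptions, not the authors' documented API; the linked repository is the authoritative reference.

```python
# Minimal loading/scoring sketch. Assumptions: the checkpoint ID below is a
# placeholder, the model exposes CLIP-style dual-encoder weights, and any
# custom text encoder is shipped as remote code on the Hub.
import requests
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

model_id = "ivonajdenkoska/tulip-base"  # hypothetical Hub ID; replace with the real checkpoint

model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Standard COCO sample image used throughout the Transformers docs.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# A long, detailed caption: TULIP's relative position encodings are meant to let
# the text encoder handle captions beyond CLIP's default 77-token context.
caption = (
    "Two cats are lying side by side on a pink couch, each next to a remote "
    "control, with soft afternoon light falling across the cushions and a "
    "striped blanket bunched up near the armrest."
)

inputs = processor(text=[caption], images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# For CLIP-style dual encoders the output exposes image-text similarity logits.
print(outputs.logits_per_image)
```

If the released weights instead require the authors' own loading utilities or a different processor class, follow the repository's README rather than this sketch.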