nielsr HF Staff committed on
Commit b24ab89 · verified · 1 Parent(s): fb95596

Add/improve model card for TULIP


This PR improves the model card for TULIP. It sets the pipeline tag (`image-to-text`) and specifies the license and library name in the metadata, which aids discoverability. The card now also includes a brief overview of the model, its highlights, and links to the paper and code.

Files changed (1)
  1. README.md +25 -3
README.md CHANGED
@@ -1,3 +1,25 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ pipeline_tag: image-to-text
+ library_name: transformers
+ ---
+
+ # 🌷 TULIP: Token-length Upgraded CLIP
+
+ [TULIP](https://arxiv.org/pdf/2410.10034) (Token-length Upgraded CLIP) addresses the challenge of representing long captions in vision-language models. It enhances CLIP-like models by incorporating relative position encodings, enabling effective processing of captions longer than the default 77 tokens.
+
+ > *"TULIP: Token-length Upgraded CLIP" (accepted to ICLR 2025)*
+ > *[Ivona Najdenkoska](https://ivonajdenkoska.github.io/)٭, [Mohammad M. Derakhshani](https://mmderakhshani.github.io/)٭, [Yuki M. Asano](https://yukimasano.github.io/), [Nanne van Noord](https://nanne.github.io/), [Marcel Worring](https://staff.fnwi.uva.nl/m.worring/), [Cees G. M. Snoek](https://www.ceessnoek.info/)*
+ > *٭ Equal core contributions*
+
+ Code: https://github.com/ivonajdenkoska/tulip
+
+ ## Highlights
+ - Improves performance on long caption understanding tasks.
+ - Uses relative positional encodings to handle long image captions.
+ - Works with CLIP-like models.
+
+
+ ## How to use
+
+ Please refer to the original repository for detailed instructions on how to use and train the model.
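
The model card says that relative position encodings let the text encoder go beyond CLIP's 77-token limit. As a rough illustration of why a relative scheme is not tied to a fixed context length, here is a minimal sketch of a rotary-style encoding in plain Python. It is an assumption chosen for illustration and is not taken from the TULIP repository; the function names `rotate` and `score` and the toy vectors are hypothetical.

```python
import math


def rotate(vec, position, base=10000.0):
    """Apply a rotary-style position encoding to an even-length vector."""
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        # Each consecutive feature pair is rotated by an angle that grows
        # with the token position and shrinks with the pair index.
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def score(query, key, q_pos, k_pos):
    """Dot-product attention logit between a query and a key at given positions."""
    q = rotate(query, q_pos)
    k = rotate(key, k_pos)
    return sum(a * b for a, b in zip(q, k))


# Toy vectors (illustrative values only).
query = [0.3, -1.2, 0.8, 0.5]
key = [1.0, 0.4, -0.7, 0.2]

# Same offset of 3 tokens, once inside and once far beyond a 77-token window.
near = score(query, key, q_pos=5, k_pos=2)
far = score(query, key, q_pos=203, k_pos=200)

# A different offset (1 token apart) for comparison.
other = score(query, key, q_pos=5, k_pos=4)

print(near, far, other)
```

In this sketch the logit depends only on the offset between the two positions, so `near` and `far` agree up to floating-point error while `other` differs. No learned table of absolute positions is involved, which is the property that makes longer captions representable in principle. How TULIP actually integrates and trains such encodings is described in the paper and the linked repository.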