mderakhshani/TULIP

nielsr (HF Staff) committed on Mar 31

Commit b24ab89 (verified) · 1 parent: fb95596

Add/improve model card for TULIP


This PR improves the model card for TULIP. It sets the pipeline tag (`image-to-text`), the license, and the library name in the card metadata, which aids discoverability. The model card now also includes a brief overview of the model, its highlights, and links to the paper and code.

Files changed (1)

README.md +25 -3


@@ -1,3 +1,25 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+pipeline_tag: image-to-text
+library_name: transformers
+---
+# 🌷 TULIP: Token-length Upgraded CLIP
+[TULIP](https://arxiv.org/pdf/2410.10034) (Token-length Upgraded CLIP) addresses the challenge of representing long captions in vision-language models. It enhances CLIP-like models by incorporating relative position encodings, enabling effective processing of captions longer than CLIP's default limit of 77 tokens.
+> *"TULIP: Token-length Upgraded CLIP" (accepted to ICLR 2025)*
+> *[Ivona Najdenkoska](https://ivonajdenkoska.github.io/)٭, [Mohammad M. Derakhshani](https://mmderakhshani.github.io/)٭, [Yuki M. Asano](https://yukimasano.github.io/), [Nanne van Noord](https://nanne.github.io/), [Marcel Worring](https://staff.fnwi.uva.nl/m.worring/), [Cees G. M. Snoek](https://www.ceessnoek.info/)*
+> *٭ Equal core contributions*
+Code: https://github.com/ivonajdenkoska/tulip
+## Highlights
+- Improves performance on long caption understanding tasks.
+- Uses relative positional encodings to handle long image captions.
+- Works with CLIP-like models.
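As a conceptual illustration of why relative position encodings lift a fixed token limit, the sketch below uses a rotary-style encoding in NumPy. This is an assumption made for illustration only: the card does not say which relative scheme TULIP uses, and the helper names `rotary` and `score` are hypothetical, not part of the TULIP codebase. The point it shows is that the attention score between a query and a key depends only on their offset, so no learned table with a fixed number of positions (such as 77) is needed.

```python
import numpy as np


def rotary(x, positions, base=10000.0):
    """Rotate feature pairs of x (seq, dim) by angles proportional to position.

    Hypothetical helper for illustration; dim must be even.
    """
    half = x.shape[-1] // 2
    # One rotation frequency per feature pair.
    freqs = base ** (-np.arange(half) / half)
    angles = positions[:, None] * freqs[None, :]  # shape (seq, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Standard 2D rotation applied to each (x1, x2) pair.
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)


rng = np.random.default_rng(0)
q = rng.normal(size=(1, 8))  # one query vector
k = rng.normal(size=(1, 8))  # one key vector


def score(q_pos, k_pos):
    """Dot-product attention score with q and k placed at the given positions."""
    q_rot = rotary(q, np.array([q_pos], dtype=float))
    k_rot = rotary(k, np.array([k_pos], dtype=float))
    return float(np.sum(q_rot * k_rot))


# Same offset (7) early in the sequence and well past position 77.
early = score(3, 10)
late = score(203, 210)
print(np.isclose(early, late))
```

Because the score is a function of the offset alone, the same computation applies at any absolute position, which is the property that lets a text encoder handle captions beyond a fixed context length. For the scheme TULIP actually uses, see the paper and the code repository linked above.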
+## How to use
+Please refer to the original repository for detailed instructions on how to use and train the model.