license: mit
tags:
  - vision
  - vision-language-model
  - contrastive learning

FLAIR Model

Authors: Rui Xiao, Sanghwan Kim, Mariana-Iuliana Georgescu, Zeynep Akata, Stephan Alaniz

FLAIR was introduced in the paper FLAIR: VLM with Fine-grained Language-informed Image Representations. Based on the ViT-B-16 model from OpenCLIP, FLAIR applies text-conditioned attention pooling at the end of its vision transformer. Pre-trained on the MLLM-recaptioned datasets from DreamLIP, FLAIR achieves strong performance on tasks such as zero-shot image-text retrieval and zero-shot segmentation.
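
To give a rough intuition for text-conditioned attention pooling, here is a minimal sketch in plain Python. It is not the FLAIR implementation: it assumes a single attention head with no learned projections, and the function name `text_conditioned_pool` and the toy vectors are hypothetical. The text embedding acts as the query, the image patch tokens act as keys and values, and the pooled image vector is the attention-weighted sum of the patches.

```python
import math


def text_conditioned_pool(text_query, patch_tokens):
    """Pool patch tokens into one vector, weighting each patch by its
    scaled dot-product similarity to the text query.

    Illustrative only: single head, no learned projection matrices.
    """
    dim = len(text_query)
    scale = 1.0 / math.sqrt(dim)

    # Scaled dot-product score between the text query and every patch token.
    scores = [
        scale * sum(q * k for q, k in zip(text_query, patch))
        for patch in patch_tokens
    ]

    # Numerically stable softmax over the patch scores.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of patch tokens gives the text-conditioned image vector.
    pooled = [
        sum(w * patch[i] for w, patch in zip(weights, patch_tokens))
        for i in range(dim)
    ]
    return pooled, weights


# Toy example: three 2-d "patch tokens" and one 2-d "text embedding".
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
pooled, weights = text_conditioned_pool(query, patches)
```

In this toy example the patches that align with the query receive larger weights, so the pooled vector changes with the text. That dependence on the caption is the idea behind a text-conditioned image representation, as opposed to a single global image embedding shared across all captions.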