---
license: mit
tags:
- vision
- vision-language-model
- contrastive-learning
---
# FLAIR Model
Authors: Rui Xiao, Sanghwan Kim, Mariana-Iuliana Georgescu, Zeynep Akata, Stephan Alaniz
FLAIR was introduced in the paper *FLAIR: VLM with Fine-grained Language-informed Image Representations*. Built on the ViT-B-16 model from OpenCLIP, FLAIR adds text-conditioned attention pooling at the end of its vision transformer. Pre-trained on the MLLM-recaptioned datasets from DreamLIP, FLAIR achieves strong performance on tasks such as zero-shot image-text retrieval and zero-shot segmentation.
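To give an intuition for the text-conditioned attention pooling mentioned above, here is a minimal numpy sketch of the core idea: the text embedding acts as a query that attends over the vision transformer's patch tokens, so the pooled image representation is weighted toward the regions the caption describes. This is an illustrative single-head, projection-free simplification, not the actual FLAIR implementation.

```python
import numpy as np

def text_conditioned_attention_pool(patch_tokens, text_embedding):
    """Pool patch tokens into one vector, weighted by relevance to the text.

    patch_tokens:   (num_patches, d) patch features from the vision tower
    text_embedding: (d,) embedding of the caption (the attention query)

    Simplified sketch: the real model uses learned query/key/value
    projections and multiple heads.
    """
    d = patch_tokens.shape[1]
    # Scaled dot-product scores between the text query and each patch
    scores = patch_tokens @ text_embedding / np.sqrt(d)      # (num_patches,)
    # Numerically stable softmax over patches
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                                 # sums to 1
    # Text-conditioned pooled image representation
    pooled = weights @ patch_tokens                          # (d,)
    return pooled, weights

# Toy usage: 196 patches (14x14 grid) with 64-dim features
rng = np.random.default_rng(0)
patches = rng.normal(size=(196, 64))
text = rng.normal(size=64)
pooled, weights = text_conditioned_attention_pool(patches, text)
```

Because the pooling depends on the caption, the same image yields a different pooled representation for each text query, which is what enables the fine-grained retrieval and segmentation behavior described above.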