---
license: mit
tags:
- vision
- vision-language-model
- contrastive-learning
---
# FLAIR Model
Authors: Rui Xiao, Sanghwan Kim, Mariana-Iuliana Georgescu, Zeynep Akata, Stephan Alaniz
FLAIR was introduced in the paper *FLAIR: VLM with Fine-grained Language-informed Image Representations*. Built on the ViT-B-16 model from OpenCLIP, FLAIR adds text-conditioned attention pooling at the end of its vision transformer. Pre-trained on the MLLM-recaptioned datasets from DreamLIP, FLAIR achieves strong performance on tasks such as zero-shot image-text retrieval and zero-shot segmentation.
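To give an intuition for the text-conditioned attention pooling mentioned above, here is a minimal numpy sketch of the core idea: the text embedding acts as a query that attends over the vision transformer's patch tokens, so the pooled image representation is weighted toward the regions the caption describes. This is an illustrative single-head, projection-free simplification, not the actual FLAIR implementation.

```python
import numpy as np

def text_conditioned_attention_pool(patch_tokens, text_embedding):
    """Pool patch tokens into one vector, weighted by relevance to the text.

    patch_tokens:   (num_patches, d) patch features from the vision tower
    text_embedding: (d,) embedding of the caption (the attention query)

    Simplified sketch: the real model uses learned query/key/value
    projections and multiple heads.
    """
    d = patch_tokens.shape[1]
    # Scaled dot-product scores between the text query and each patch
    scores = patch_tokens @ text_embedding / np.sqrt(d)      # (num_patches,)
    # Numerically stable softmax over patches
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                                 # sums to 1
    # Text-conditioned pooled image representation
    pooled = weights @ patch_tokens                          # (d,)
    return pooled, weights

# Toy usage: 196 patches (14x14 grid) with 64-dim features
rng = np.random.default_rng(0)
patches = rng.normal(size=(196, 64))
text = rng.normal(size=64)
pooled, weights = text_conditioned_attention_pool(patches, text)
```

Because the pooling depends on the caption, the same image yields a different pooled representation for each text query, which is what enables the fine-grained retrieval and segmentation behavior described above.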