A collection of models and datasets from ABC: Achieving Better Control of Multimodal Embeddings using VLMs.

TIGER-Lab/ABC-Pretraining-Data
Note: Pretraining data for ABC-Qwen2VL-Pretrain, derived from Conceptual Captions using negative mining. For details, see the paper.
Note: Instruction finetuning dataset derived from Visual Genome. It contains multiple instructions for each image, which can serve as negatives for one another during training (a minimal sketch of this idea follows below).
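As a hedged illustration of that training trick, the sketch below scores each image against every instruction in the batch and treats the non-matching pairs as negatives via an InfoNCE-style loss. The temperature, embedding dimension, and function names are assumptions for illustration, not the paper's exact setup.

```python
# A minimal sketch of in-batch negatives for contrastive training.
# Temperature and dimensions are illustrative assumptions.
import torch
import torch.nn.functional as F

def info_nce(image_emb: torch.Tensor, text_emb: torch.Tensor, tau: float = 0.07):
    """image_emb, text_emb: (batch, dim) L2-normalized embeddings.
    Row i of text_emb is the matching instruction for image i; every
    other row in the batch acts as a negative."""
    logits = image_emb @ text_emb.T / tau  # (batch, batch) similarity matrix
    targets = torch.arange(logits.size(0))  # diagonal entries are the positives
    return F.cross_entropy(logits, targets)

# Usage with random stand-in embeddings:
img = F.normalize(torch.randn(8, 512), dim=-1)
txt = F.normalize(torch.randn(8, 512), dim=-1)
loss = info_nce(img, txt)
```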
Note: The pretrained base adapter. Supports text and image inputs (similar to CLIP) for creating embeddings. If training your own adapter, use this as the base.
Note: The final instruction-finetuned model. Supports text, image, and interleaved image-text inputs when creating embeddings.
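Below is a minimal sketch of extracting a joint image-text embedding from the finetuned model with Hugging Face transformers. The checkpoint name (TIGER-Lab/ABC-Qwen2VL-Instruct) and the mean-pooling choice are assumptions for illustration; consult the ABC paper and repository for the actual interface.

```python
# A sketch, not the official API: the checkpoint name and pooling are assumptions.
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

MODEL_ID = "TIGER-Lab/ABC-Qwen2VL-Instruct"  # assumed checkpoint name

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = Qwen2VLForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

# Build an image + instruction prompt with the standard Qwen2-VL chat template.
image = Image.open("example.jpg")
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Focus on the object in the foreground."},
    ],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=False)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# Mean-pool the final hidden states into one vector (an assumed pooling choice).
embedding = out.hidden_states[-1].mean(dim=1).squeeze(0)
print(embedding.shape)
```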