CLAMP: Contrastive LAnguage Model Prompt-tuning
Abstract
Large language models (LLMs) have emerged as powerful general-purpose interfaces for many machine learning problems. Recent work has adapted LLMs to generative visual tasks like image captioning, visual question answering, and visual chat, using a relatively small amount of instruction-tuning data. In this paper, we explore whether modern LLMs can also be adapted to classify an image into a set of categories. First, we evaluate multimodal LLMs that are tuned for generative tasks on zero-shot image classification and find that their performance is far below that of specialized models like CLIP. We then propose an approach for light fine-tuning of LLMs using the same contrastive image-caption matching objective as CLIP. Our results show that LLMs can, indeed, achieve good image classification performance when adapted this way. Our approach beats state-of-the-art mLLMs by 13% and slightly outperforms contrastive learning with a custom text model, while also retaining the LLM's generative abilities. LLM initialization appears to particularly help classification in domains under-represented in the visual pre-training data.
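The contrastive image-caption matching objective referenced above is CLIP's symmetric InfoNCE loss: matched image-text pairs form the diagonal of a similarity matrix, and cross-entropy is applied in both the image-to-text and text-to-image directions. A minimal NumPy sketch follows; the function name, temperature value, and embedding shapes are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss used in CLIP-style contrastive training.

    image_embs, text_embs: (N, D) arrays where matched pairs share a row
    index. The temperature of 0.07 is an assumed starting value.
    """
    # L2-normalize embeddings so dot products are cosine similarities.
    img = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (N, N) similarity matrix

    def xent_diagonal(l):
        # Cross-entropy with the diagonal (matched pair) as the target.
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))

    # Average the image->text and text->image directions.
    return 0.5 * (xent_diagonal(logits) + xent_diagonal(logits.T))
```

Under this objective, the loss is minimized when each image embedding is most similar to its own caption's embedding and dissimilar to every other caption in the batch.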