Fine-Tuned 'Small' LLMs (Still) Significantly Outperform Zero-Shot Generative AI Models in Text Classification
Abstract
Generative AI offers a simple, prompt-based alternative to fine-tuning smaller BERT-style LLMs for text classification tasks. This promises to eliminate the need for manually labeled training data and task-specific model training. However, it remains an open question whether tools like ChatGPT can deliver on this promise. In this paper, we show that smaller, fine-tuned LLMs (still) consistently and significantly outperform larger, zero-shot prompted models in text classification. We compare three major generative AI models (ChatGPT with GPT-3.5/GPT-4 and Claude Opus) with several fine-tuned LLMs across a diverse set of classification tasks (sentiment, approval/disapproval, emotions, party positions) and text categories (news, tweets, speeches). We find that fine-tuning with application-specific training data achieves superior performance in all cases. To make this approach more accessible to a broader audience, we provide an easy-to-use toolkit alongside this paper. Our toolkit, accompanied by non-technical step-by-step guidance, enables users to select and fine-tune BERT-like LLMs for any classification task with minimal technical and computational effort.
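To make the fine-tuning approach the abstract describes concrete, below is a minimal sketch of fine-tuning a BERT-style encoder for text classification with the Hugging Face Transformers Trainer API. This is not the paper's toolkit: the model name, the IMDB sentiment dataset, the subset sizes, and the hyperparameters are illustrative assumptions chosen only to show the general workflow (labeled data in, task-specific classifier out).

```python
# Hedged sketch: fine-tuning a small BERT-style model on a labeled
# classification dataset. All concrete choices below (model, dataset,
# hyperparameters) are assumptions for illustration, not the paper's setup.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "distilbert-base-uncased"  # assumption: any BERT-style encoder works here

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# Encoder plus a randomly initialized classification head for 2 labels.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Illustrative binary sentiment task; swap in any labeled classification dataset.
dataset = load_dataset("imdb")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

tokenized = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="./finetuned-classifier",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    evaluation_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=args,
    # Small subsets keep the sketch cheap to run; use the full splits in practice.
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),
    eval_dataset=tokenized["test"].shuffle(seed=42).select(range(500)),
)

trainer.train()
```

The design contrast the paper draws is visible here: instead of prompting a large generative model zero-shot, a small encoder is specialized with application-specific labeled examples, which is what the authors find delivers superior classification performance.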