Papers
arxiv:2506.19753

Arabic Dialect Classification using RNNs, Transformers, and Large Language Models: A Comparative Analysis

Published on Jun 24
Authors:
,
,
,

Abstract

State-of-the-art models, including MARBERTv2, achieve high accuracy in classifying Arabic dialects from tweets, enhancing applications like personalized chatbots and social media monitoring.

AI-generated summary

The Arabic language is among the most popular languages in the world with a huge variety of dialects spoken in 22 countries. In this study, we address the problem of classifying 18 Arabic dialects of the QADI dataset of Arabic tweets. RNN models, Transformer models, and large language models (LLMs) via prompt engineering are created and tested. Among these, MARBERTv2 performed best with 65% accuracy and 64% F1-score. Through the use of state-of-the-art preprocessing techniques and the latest NLP models, this paper identifies the most significant linguistic issues in Arabic dialect identification. The results corroborate applications like personalized chatbots that respond in users' dialects, social media monitoring, and greater accessibility for Arabic communities.

Community

Sign up or log in to comment

Models citing this paper 2

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2506.19753 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2506.19753 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.