KONAN

Model Description

KONAN is an Arabic text classification model designed to distinguish between human-written and machine-generated Arabic news articles.
The model aims to support research and applications related to AI-generated content detection, misinformation analysis, and media authenticity in Arabic-speaking contexts.

It is built on aubmindlab/bert-base-arabertv02 and fine-tuned on a curated dataset of Arabic news texts, each labeled as either:

  • human
  • machine (AI-written)

The model learns stylistic, syntactic, and semantic patterns that differentiate human journalism from automatically generated text.


Fine-tuning Procedure

The model was fine-tuned using supervised learning for sequence classification, with PEFT (LoRA) adapters to efficiently adapt the base model while retaining its strong Arabic language understanding.

Key aspects of training include:

  • Coverage of multiple Arabic news domains (politics, economy, sports, technology, society)
  • Exposure to different AI generation styles and prompting strategies
  • Normalization of Arabic text (diacritics removal, punctuation consistency)
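
The normalization step listed above could be sketched roughly as follows. This is not the authors' preprocessing code: the exact character ranges, the tatweel removal, and the choice to map Arabic punctuation to ASCII are assumptions, and `normalize_arabic` is a hypothetical helper name.

```python
import re

# Arabic diacritics (tashkeel): fathatan U+064B through sukun U+0652,
# plus the superscript alef U+0670.
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")
_TATWEEL = "\u0640"  # elongation character, carries no meaning

# Assumed punctuation policy: map Arabic marks to ASCII equivalents.
_PUNCT_MAP = str.maketrans({
    "\u060C": ",",  # Arabic comma
    "\u061B": ";",  # Arabic semicolon
    "\u061F": "?",  # Arabic question mark
})


def normalize_arabic(text: str) -> str:
    """Strip diacritics and tatweel, unify punctuation, collapse whitespace."""
    text = _DIACRITICS.sub("", text)
    text = text.replace(_TATWEEL, "")
    text = text.translate(_PUNCT_MAP)
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(normalize_arabic("مَرْحَبًا،  بِكُمْ؟"))
```

If inputs at inference time are normalized differently from the training data, predictions may shift, so the same function would need to be applied on both sides.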

Intended Use

This model is intended for:

  • Detecting AI-generated Arabic news articles
  • Assisting journalists, fact-checkers, and researchers
  • Studying stylistic differences between human and machine-written Arabic text

How to Use

Example: Classifying Arabic News Text

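from transformers import pipeline

# Sample input. English: "The Ministry of Economy announced today the launch of a
# new plan aimed at supporting small and medium-sized enterprises and boosting
# job opportunities over the coming years."
text = """
أعلنت وزارة الاقتصاد اليوم عن إطلاق خطة جديدة تهدف إلى دعم الشركات
الصغيرة والمتوسطة وتعزيز فرص العمل خلال السنوات القادمة.
"""

# device=0 runs on the first GPU; pass device=-1 to run on CPU instead.
classifier = pipeline(
    "text-classification",
    model="salmane11/konan",
    tokenizer="salmane11/konan",
    truncation=True,
    device=0,
)

def detect_ai_generated_news(news: str) -> bool:
    """Return True if the model labels the text as machine-generated."""
    prediction = classifier(news)
    return prediction[0]["label"] == "machine"

print(detect_ai_generated_news(text))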
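
When screening several articles at once, it can help to keep the confidence score and flag only high-confidence predictions. The sketch below assumes the classifier returns a list whose first element is a dict with `label` and `score` keys, as the text-classification pipeline does for a single string. The helper name `flag_machine_generated` and the 0.9 threshold are hypothetical and not part of the model's API.

```python
from typing import Any, Callable, Dict, List


def flag_machine_generated(
    texts: List[str],
    classify: Callable[[str], List[Dict[str, Any]]],
    threshold: float = 0.9,  # assumed cutoff, tune on your own data
) -> List[Dict[str, Any]]:
    """Classify each text and flag it only if labeled 'machine' with score >= threshold."""
    results = []
    for text in texts:
        top = classify(text)[0]  # top prediction for this text
        flagged = top["label"] == "machine" and top["score"] >= threshold
        results.append(
            {"text": text, "label": top["label"], "score": top["score"], "flagged": flagged}
        )
    return results
```

The `classifier` pipeline object from the example above can be passed as `classify`, for instance `flag_machine_generated([text], classifier)`.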

Cite our work

@article{lamsiyah2025m,
  title={M-DAIGT: A Shared Task on Multi-Domain Detection of AI-Generated Text},
  author={Lamsiyah, Salima and Ezzini, Saad and El Mahdaouy, Abdelkader and Alami, Hamza and Benlahbib, Abdessamad and El Amrany, Samir and Chafik, Salmane and Hammouchi, Hicham},
  journal={M-DAIGT-ST 2025},
  pages={1},
  year={2025}
}

