---
license: apache-2.0
datasets:
- hadyelsahar/ar_res_reviews
language:
- ar
metrics:
- accuracy
- precision
- recall
- f1
base_model:
- aubmindlab/bert-base-arabertv02
pipeline_tag: text-classification
---
🍽️ Arabic Restaurant Review Sentiment Analysis
Overview
This project fine-tunes a transformer-based model (AraBERT) to classify the sentiment of Arabic restaurant reviews.
We trained the model with Hugging Face's training pipeline and deployed the final model as an interactive Gradio web app.
Data Collection
The fine-tuning data comes from the Arabic Restaurant Reviews dataset (`hadyelsahar/ar_res_reviews`) on Hugging Face Datasets.
It contains Arabic restaurant reviews labeled with sentiment polarity.
Data Preparation
- Cleaning & Normalization:
  - Removed non-Arabic text, special characters, and extra spaces.
  - Normalized Arabic characters (e.g., إ, أ, آ → ا and ة → ه).
  - Downsampled positive reviews to balance the dataset.
- Tokenization:
  - Tokenized the reviews with the AraBERT tokenizer that matches the base model.
- Train-Test Split:
  - 80% training, 20% testing.
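The preparation steps above can be sketched in plain Python. This is a minimal sketch, not the project's actual preprocessing code. Several choices are assumptions for illustration:
- the character range kept as "Arabic text";
- the hypothetical helper names `balance` and `split_train_test`;
- the fixed random seed;
- the 1 = positive / 0 = negative label encoding.

```python
import random
import re

# Assumption: "Arabic text" means the Arabic letters U+0621-U+064A; digits,
# diacritics, punctuation, and Latin characters are treated as noise.
NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")
ALEF_VARIANTS = re.compile("[إأآ]")


def normalize(text: str) -> str:
    """Unify alef variants and taa marbuta, drop non-Arabic characters, collapse spaces."""
    text = ALEF_VARIANTS.sub("ا", text)  # إ, أ, آ -> ا
    text = text.replace("ة", "ه")         # ة -> ه
    text = NON_ARABIC.sub(" ", text)      # strip non-Arabic characters
    return re.sub(r"\s+", " ", text).strip()


def balance(rows, seed=42):
    """Downsample every class to the size of the smallest one (hypothetical helper).

    rows: list of (text, label) pairs; with mostly positive reviews this
    downsamples the positive class to the negative count.
    """
    by_label = {}
    for text, label in rows:
        by_label.setdefault(label, []).append((text, label))
    n = min(len(items) for items in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for items in by_label.values():
        balanced.extend(rng.sample(items, n))
    rng.shuffle(balanced)
    return balanced


def split_train_test(rows, test_frac=0.2, seed=42):
    """Shuffle and split into train/test lists (80/20 by default)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_frac))
    return rows[:cut], rows[cut:]


print(normalize("وجبة رائعة!!"))  # وجبه رائعه
```

Downsampling discards some positive reviews but keeps the classifier from favoring the majority class. Class weighting is a common alternative when data is scarce.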
Fine-Tuning & Results
The model was fine-tuned using Hugging Face Transformers on a dataset of restaurant reviews.
Evaluation Metrics
Metric | Score |
---|---|
Train Loss | 0.470 |
Eval Loss | 0.373 |
Accuracy | 86.41% |
Precision | 87.01% |
Recall | 86.49% |
F1-score | 86.75% |
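For reference, the following sketch computes the four metrics in the table from model logits. It takes the `(logits, labels)` pair that the Transformers `Trainer` passes to a `compute_metrics` callback.

The card does not state which averaging was used, so the function assumes binary averaging with label 1 as the positive class. Treat it as illustrative rather than the exact evaluation code. The last lines check that the reported F1 is consistent with the reported precision and recall via F1 = 2PR / (P + R).

```python
import numpy as np


def compute_metrics(eval_pred):
    """Accuracy, precision, recall, and F1 for a binary classifier.

    Assumes label 1 is the positive class and binary (not macro) averaging.
    """
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    labels = np.asarray(labels)
    tp = int(np.sum((preds == 1) & (labels == 1)))
    fp = int(np.sum((preds == 1) & (labels == 0)))
    fn = int(np.sum((preds == 0) & (labels == 1)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": float(np.mean(preds == labels)),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Consistency check on the reported scores: F1 = 2PR / (P + R).
p, r = 0.8701, 0.8649
print(round(2 * p * r / (p + r), 4))  # 0.8675
```

To use it during training, pass `compute_metrics=compute_metrics` when constructing the `Trainer`.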
Training Parameters
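```python
import torch
from transformers import AutoModelForSequenceClassification, TrainingArguments

device = "cuda" if torch.cuda.is_available() else "cpu"

model_name = "aubmindlab/bert-base-arabertv2"
# Binary sentiment head; classifier_dropout is forwarded to the model config.
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=2, classifier_dropout=0.5
).to(device)

training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",  # renamed `eval_strategy` in newer Transformers releases
    save_strategy="epoch",        # matches the eval strategy, as load_best_model_at_end requires
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=4,
    weight_decay=1,
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,             # linear warmup over the first 10% of optimizer steps
    fp16=True,                    # mixed precision; requires a CUDA GPU
    report_to="none",
    save_total_limit=2,
    gradient_accumulation_steps=2,  # effective train batch of 16 per device
    load_best_model_at_end=True,
    max_grad_norm=1.0,
    metric_for_best_model="eval_loss",
    greater_is_better=False,      # lower eval loss is better
)
```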
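With `per_device_train_batch_size=8` and `gradient_accumulation_steps=2`, each optimizer step sees 16 examples per device. The sketch below approximates two things:
- how many optimizer steps that configuration yields;
- what learning rate the cosine schedule with 10% linear warmup assigns to a given step.

The 1,000-review training set is hypothetical, a single device is assumed, and exact step rounding can differ between Transformers versions.

```python
import math


def training_steps(n_train, batch_size=8, grad_accum=2, epochs=4):
    """Approximate optimizer steps on one device (rounding varies by version)."""
    batches_per_epoch = math.ceil(n_train / batch_size)
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return updates_per_epoch * epochs


def cosine_lr(step, total_steps, base_lr=1e-5, warmup_ratio=0.1):
    """Linear warmup to base_lr, then half-cycle cosine decay to 0."""
    warmup = math.ceil(total_steps * warmup_ratio)
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


total = training_steps(1000)  # hypothetical 1,000 training reviews
print(total)                  # 248
print(cosine_lr(25, total))   # peak after 25 warmup steps: 1e-05
```

The warmup keeps early updates small while the new classifier head is still random. The cosine decay then lowers the learning rate smoothly toward zero by the final step.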