|
--- |
|
license: apache-2.0 |
|
datasets: |
|
- hadyelsahar/ar_res_reviews |
|
language: |
|
- ar |
|
metrics: |
|
- accuracy |
|
- precision |
|
- recall |
|
- f1 |
|
base_model: |
|
- aubmindlab/bert-base-arabertv02 |
|
pipeline_tag: text-classification |
|
--- |
|
|
|
# Arabic Restaurant Review Sentiment Analysis
|
## Overview
|
This project fine-tunes a **transformer-based model** to analyze sentiment in **Arabic restaurant reviews**. |
|
We used **Hugging Face's model training pipeline** and deployed the final model as an **interactive Gradio web app**.
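
A minimal sketch of such a Gradio app, assuming the fine-tuned model has been pushed to the Hub (the model id below is a placeholder, not the actual repository name):

```python
import gradio as gr
from transformers import pipeline

# Placeholder model id; substitute the actual fine-tuned checkpoint.
classifier = pipeline("text-classification", model="your-username/arabert-restaurant-sentiment")

def predict(review: str) -> dict:
    # Return {label: score} pairs so Gradio's Label component renders all classes.
    return {r["label"]: r["score"] for r in classifier(review, top_k=None)}

demo = gr.Interface(
    fn=predict,
    inputs=gr.Textbox(label="Arabic restaurant review"),
    outputs=gr.Label(num_top_classes=2),
)
demo.launch()
```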
|
|
|
## Data Collection
|
The dataset used for fine-tuning was sourced from **Hugging Face Datasets**, specifically: |
|
[Arabic Restaurant Reviews Dataset](https://huggingface.co/datasets/hadyelsahar/ar_res_reviews)
|
It contains **restaurant reviews in Arabic** labeled with sentiment polarity. |
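Loading it takes one call with the `datasets` library (split and column names shown are the Hub defaults and worth verifying):

```python
from datasets import load_dataset

# Pull the labeled reviews straight from the Hugging Face Hub.
ds = load_dataset("hadyelsahar/ar_res_reviews")
print(ds)              # inspect available splits and column names
print(ds["train"][0])  # peek at a single labeled review
```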
|
|
|
## Data Preparation
|
- **Cleaning & Normalization**:
  - Removed non-Arabic text, special characters, and extra spaces.
  - Normalized Arabic character variants (e.g., `إ, أ, آ → ا`, `ة → ه`).
  - Downsampled positive reviews to balance the dataset.
- **Tokenization**:
  - Used the **AraBERT tokenizer** for efficient text processing.
- **Train-Test Split**:
  - **80% Training** | **20% Testing** (see the code sketch after this list).
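
A minimal sketch of these steps, assuming the review text lives in a `text` column (the exact regexes and column names in the original pipeline may differ; the positive-class downsampling is omitted for brevity):

```python
import re
from datasets import load_dataset
from transformers import AutoTokenizer

# Character variants folded together, per the normalization rules above.
NORMALIZE = {"إ": "ا", "أ": "ا", "آ": "ا", "ة": "ه"}

def clean_review(text: str) -> str:
    text = re.sub(r"[^\u0600-\u06FF\s]", " ", text)   # keep Arabic letters and spaces only
    for src, dst in NORMALIZE.items():
        text = text.replace(src, dst)                 # normalize character variants
    return re.sub(r"\s+", " ", text).strip()          # collapse extra whitespace

ds = load_dataset("hadyelsahar/ar_res_reviews")["train"]
ds = ds.map(lambda ex: {"text": clean_review(ex["text"])})

tokenizer = AutoTokenizer.from_pretrained("aubmindlab/bert-base-arabertv02")
ds = ds.map(lambda ex: tokenizer(ex["text"], truncation=True), batched=True)

# 80/20 train-test split, as described above.
splits = ds.train_test_split(test_size=0.2, seed=42)
train_ds, test_ds = splits["train"], splits["test"]
```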
|
|
|
## Fine-Tuning & Results
|
The model was fine-tuned with **Hugging Face Transformers** on the prepared review dataset; the full configuration is listed under **Training Parameters** below.
|
|
|
### **Evaluation Metrics**
|
| Metric         | Score    |
|----------------|----------|
| **Train Loss** | `0.470`  |
| **Eval Loss**  | `0.373`  |
| **Accuracy**   | `86.41%` |
| **Precision**  | `87.01%` |
| **Recall**     | `86.49%` |
| **F1-score**   | `86.75%` |
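
Metrics like these are typically produced by a `compute_metrics` callback passed to the `Trainer`. One plausible implementation using `scikit-learn` (the binary averaging mode is an assumption):

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    # The Trainer passes (logits, labels) for the evaluation set.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="binary"
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```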
|
|
|
## Training Parameters
|
```python
import torch
from transformers import AutoModelForSequenceClassification, TrainingArguments

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model_name = "aubmindlab/bert-base-arabertv02"
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=2,            # binary sentiment: positive / negative
    classifier_dropout=0.5,  # heavier dropout on the classification head
).to(device)

training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=4,
    weight_decay=1,
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    fp16=True,
    report_to="none",
    save_total_limit=2,
    gradient_accumulation_steps=2,
    load_best_model_at_end=True,
    max_grad_norm=1.0,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
```
|
|
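For completeness, a sketch of how these pieces plug into the `Trainer` (`train_ds`, `test_ds`, and `compute_metrics` refer to the earlier sketches and are assumptions about the original setup):

```python
from transformers import Trainer

trainer = Trainer(
    model=model,                      # the AraBERT classifier configured above
    args=training_args,               # the TrainingArguments above
    train_dataset=train_ds,
    eval_dataset=test_ds,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,  # accuracy, precision, recall, f1
)
trainer.train()     # fine-tune the model
trainer.evaluate()  # report the evaluation metrics
```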