---
license: apache-2.0
datasets:
- hadyelsahar/ar_res_reviews
language:
- ar
metrics:
- accuracy
- precision
- recall
- f1
base_model:
- aubmindlab/bert-base-arabertv02
pipeline_tag: text-classification
---
# 🍽️ Arabic Restaurant Review Sentiment Analysis
## Overview
This project fine-tunes **AraBERT** (`aubmindlab/bert-base-arabertv02`), a transformer-based Arabic language model, to analyze sentiment in **Arabic restaurant reviews**.
We used **Hugging Face's training pipeline** and deployed the final model as an **interactive Gradio web app**.
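The app code is not part of this card, but a minimal sketch of such a Gradio demo could look like the following, assuming the fine-tuned checkpoint is loaded through a `transformers` pipeline (the model id and the `predict` helper are placeholders, not the actual deployment code):

```python
import gradio as gr
from transformers import pipeline

# Placeholder model id; point this at the actual fine-tuned checkpoint.
classifier = pipeline("text-classification", model="your-username/arabert-restaurant-sentiment")

def predict(review: str) -> dict:
    # Map every label to its probability so gr.Label renders a score bar.
    return {r["label"]: r["score"] for r in classifier(review, top_k=None)}

demo = gr.Interface(
    fn=predict,
    inputs=gr.Textbox(lines=3, label="Arabic restaurant review"),
    outputs=gr.Label(label="Sentiment"),
    title="Arabic Restaurant Review Sentiment Analysis",
)

if __name__ == "__main__":
    demo.launch()
```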
## Data Collection
The dataset used for fine-tuning was sourced from **Hugging Face Datasets**, specifically:
[Arabic Restaurant Reviews Dataset](https://huggingface.co/datasets/hadyelsahar/ar_res_reviews)
It contains **restaurant reviews in Arabic** labeled with sentiment polarity.
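For reference, the dataset can be pulled straight from the Hub with the `datasets` library (a minimal sketch; the printed split and column names are whatever the dataset actually exposes):

```python
from datasets import load_dataset

# Download the Arabic restaurant reviews dataset from the Hugging Face Hub.
dataset = load_dataset("hadyelsahar/ar_res_reviews")
print(dataset)               # inspect splits and column names
print(dataset["train"][0])   # peek at one labeled review
```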
## Data Preparation
- **Cleaning & Normalization**:
- Removed non-Arabic text, special characters, and extra spaces.
- Normalized Arabic characters (e.g., `إ, أ, آ → ا`, `ة → ه`).
- Downsampled positive reviews to balance the dataset.
- **Tokenization**:
- Used **AraBERT tokenizer** for efficient text processing.
- **Train-Test Split**:
- **80% Training** | **20% Testing** (see the preprocessing sketch below).
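A minimal sketch of the steps above, reusing `dataset` from the loading snippet. The cleaning regex, the `text` column name, and `max_length=128` are illustrative assumptions rather than the original code, and the downsampling of positive reviews is omitted for brevity:

```python
import re
from transformers import AutoTokenizer

def clean_arabic(text: str) -> str:
    # Normalize alef variants and ta marbuta, as described above.
    text = re.sub(r"[إأآ]", "ا", text)
    text = text.replace("ة", "ه")
    # Keep Arabic letters and whitespace; drop other characters.
    text = re.sub(r"[^\u0600-\u06FF\s]", " ", text)
    # Collapse repeated whitespace.
    return re.sub(r"\s+", " ", text).strip()

tokenizer = AutoTokenizer.from_pretrained("aubmindlab/bert-base-arabertv02")

def tokenize(batch):
    # max_length=128 is an assumed value; the card does not state it.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

# "text" as the review column is an assumption about the dataset schema.
dataset = dataset.map(lambda ex: {"text": clean_arabic(ex["text"])})
dataset = dataset.map(tokenize, batched=True)
# 80/20 train-test split as stated above.
split = dataset["train"].train_test_split(test_size=0.2, seed=42)
```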
## Fine-Tuning & Results
The model was fine-tuned with **Hugging Face Transformers** on the prepared review dataset.
### **Evaluation Metrics**
| Metric | Score |
|-------------|--------|
| **Train Loss**| `0.470`|
| **Eval Loss** | `0.373` |
| **Accuracy** | `86.41%` |
| **Precision** | `87.01%` |
| **Recall** | `86.49%` |
| **F1-score** | `86.75%` |
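Metrics like these are typically computed with a `compute_metrics` callback passed to the `Trainer`; here is a sketch using scikit-learn (the weighted averaging is an assumption, since the card does not state the averaging mode):

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="weighted"
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```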
## Training Parameters
```python
import torch
from transformers import AutoModelForSequenceClassification, TrainingArguments

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Base checkpoint per the model card metadata (arabertv02).
model_name = "aubmindlab/bert-base-arabertv02"
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=2, classifier_dropout=0.5
).to(device)
training_args = TrainingArguments(
output_dir="./results",
evaluation_strategy="epoch",
save_strategy="epoch",
per_device_train_batch_size=8,
per_device_eval_batch_size=8,
num_train_epochs=4,
weight_decay=1,
learning_rate=1e-5,
lr_scheduler_type="cosine",
warmup_ratio=0.1,
fp16=True,
report_to="none",
save_total_limit=2,
gradient_accumulation_steps=2,
load_best_model_at_end=True,
max_grad_norm=1.0,
metric_for_best_model="eval_loss",
greater_is_better=False,
)
```
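A sketch of how these arguments come together with the `Trainer` (variable names refer to the snippets above; renaming `polarity` to `labels` is an assumption about the dataset schema, and the save path is hypothetical):

```python
from transformers import Trainer

# Trainer expects a "labels" column; "polarity" is assumed to be the label field.
train_ds = split["train"].rename_column("polarity", "labels")
eval_ds = split["test"].rename_column("polarity", "labels")

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)

trainer.train()
trainer.save_model("./arabert-restaurant-sentiment")  # hypothetical output path
```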