---
license: apache-2.0
datasets:
- hadyelsahar/ar_res_reviews
language:
- ar
metrics:
- accuracy
- precision
- recall
- f1
base_model:
- aubmindlab/bert-base-arabertv02
pipeline_tag: text-classification
---

# ๐Ÿฝ๏ธ Arabic Restaurant Review Sentiment Analysis ๐Ÿš€
## ๐Ÿ“Œ Overview  
This project fine-tunes a **transformer-based model** to analyze sentiment in **Arabic restaurant reviews**.  
We utilized **Hugging Faceโ€™s model training pipeline** and deployed the final model as an **interactive Gradio web app**.
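A minimal sketch of such a Gradio app follows. It is not the project's deployed code. It assumes the fine-tuned checkpoint sits at a hypothetical local path `./results/best` and that label index 0 means negative and 1 means positive; the card states neither. `gradio` and `transformers` are imported inside `launch_demo()` so the label-formatting helper works on its own.

```python
# Hypothetical: location of the fine-tuned checkpoint (not stated in this card)
MODEL_PATH = "./results/best"

# Assumption: index 0 = negative, 1 = positive (the card does not document the mapping)
LABEL_NAMES = {"LABEL_0": "negative", "LABEL_1": "positive"}


def to_label_scores(predictions):
    """Turn text-classification pipeline output (a list of {"label", "score"}
    dicts) into the {label_name: confidence} dict that gr.Label displays."""
    # Some pipeline call styles wrap a single input's results in an extra list
    if predictions and isinstance(predictions[0], list):
        predictions = predictions[0]
    return {LABEL_NAMES.get(p["label"], p["label"]): float(p["score"]) for p in predictions}


def launch_demo():
    # Imported lazily so to_label_scores() has no heavy dependencies
    import gradio as gr
    from transformers import pipeline

    classifier = pipeline("text-classification", model=MODEL_PATH)

    def predict(review):
        # top_k=None asks the pipeline for the score of every label
        return to_label_scores(classifier(review, top_k=None))

    demo = gr.Interface(
        fn=predict,
        inputs=gr.Textbox(lines=4, label="Arabic restaurant review"),
        outputs=gr.Label(num_top_classes=2),
        title="Arabic Restaurant Review Sentiment",
    )
    demo.launch()
```

Calling `launch_demo()` starts a local web server. Keeping the label mapping in one dictionary makes it easy to correct if the checkpoint's `id2label` turns out to differ.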

## 📥 Data Collection  
The dataset used for fine-tuning was sourced from **Hugging Face Datasets**, specifically:  
[📂 Arabic Restaurant Reviews Dataset](https://huggingface.co/datasets/hadyelsahar/ar_res_reviews)  
It contains **restaurant reviews in Arabic** labeled with sentiment polarity.

## 🔄 Data Preparation  
- **Cleaning & Normalization**:
  - Removed non-Arabic text, special characters, and extra spaces.
  - Normalized Arabic characters (e.g., `إ, أ, آ → ا`, `ة → ه`).
- **Class Balancing**:
  - Downsampled positive reviews to balance the dataset.
- **Tokenization**:
  - Used the **AraBERT tokenizer** to convert reviews into model input IDs.
- **Train-Test Split**:
  - **80% Training** | **20% Testing**.
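The preparation steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the project's actual script: the Unicode ranges that decide what counts as Arabic, the deletion of diacritics (harakat) and tatweel, the random seed, and the helper names `normalize`, `downsample_majority`, and `split_train_test` are all hypothetical.

```python
import random
import re

# Assumption: harakat (U+064B-U+0652) and tatweel (U+0640) are deleted outright
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")
# Assumption: anything outside the basic Arabic letters and whitespace is "non-Arabic"
NON_ARABIC = re.compile(r"[^\u0621-\u063A\u0641-\u064A\s]")
SPACES = re.compile(r"\s+")
# Normalizations listed in the card: alef variants -> bare alef, ta marbuta -> ha
NORMALIZE_MAP = str.maketrans(
    {"\u0625": "\u0627", "\u0623": "\u0627", "\u0622": "\u0627", "\u0629": "\u0647"}
)


def normalize(text):
    """Strip diacritics, replace non-Arabic characters with spaces,
    unify letter variants, and collapse runs of whitespace."""
    text = DIACRITICS.sub("", text)
    text = NON_ARABIC.sub(" ", text)
    text = text.translate(NORMALIZE_MAP)
    return SPACES.sub(" ", text).strip()


def downsample_majority(rows, label_key="label", seed=42):
    """Randomly drop rows from larger classes until each class matches the smallest."""
    by_label = {}
    for row in rows:
        by_label.setdefault(row[label_key], []).append(row)
    n_min = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, n_min))
    rng.shuffle(balanced)
    return balanced


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle, then hold out test_fraction of the rows (80/20 by default)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

For a binary dataset in which positive reviews are the majority, `downsample_majority` reduces to downsampling the positive class, which is the balancing step the card describes.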

## ๐Ÿ‹๏ธ Fine-Tuning & Results  
The model was fine-tuned using **Hugging Face Transformers** on a dataset of restaurant reviews.

### **📊 Evaluation Metrics**
| Metric         | Score    |
|----------------|----------|
| **Train Loss** | `0.470`  |
| **Eval Loss**  | `0.373`  |
| **Accuracy**   | `86.41%` |
| **Precision**  | `87.01%` |
| **Recall**     | `86.49%` |
| **F1-score**   | `86.75%` |
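The card does not say how these scores were computed. One plausible sketch, assuming binary precision, recall, and F1 for the positive class (label 1), is a `compute_metrics` function passed to the `Trainer`. The table is at least consistent with that reading: the harmonic mean of the reported precision and recall, 2 × 0.8701 × 0.8649 / (0.8701 + 0.8649) ≈ 0.8675, matches the reported F1. The function names and the positive-class choice are assumptions.

```python
def binary_metrics(predictions, labels, positive=1):
    """Accuracy plus precision, recall, and F1 for the assumed positive class."""
    pairs = list(zip(predictions, labels))
    tp = sum(p == positive and y == positive for p, y in pairs)
    fp = sum(p == positive and y != positive for p, y in pairs)
    fn = sum(p != positive and y == positive for p, y in pairs)
    accuracy = sum(p == y for p, y in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def compute_metrics(eval_pred):
    """Trainer hook: eval_pred unpacks to (logits, labels)."""
    logits, labels = eval_pred
    # Predicted class = index of the largest logit in each row
    predictions = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    return binary_metrics(predictions, [int(y) for y in labels])
```

The `Trainer` prefixes each returned key with `eval_`, so these would be logged as `eval_accuracy`, `eval_f1`, and so on.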

## โš™๏ธ Training Parameters  
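```python
import torch
from transformers import AutoModelForSequenceClassification, TrainingArguments

device = "cuda" if torch.cuda.is_available() else "cpu"

model_name = "aubmindlab/bert-base-arabertv2"
# Binary sentiment head; classifier_dropout is forwarded to the model config
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=2, classifier_dropout=0.5
).to(device)

training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",       # renamed to `eval_strategy` in newer transformers releases
    save_strategy="epoch",             # must match the eval strategy for load_best_model_at_end
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=4,
    weight_decay=1,
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,                  # first 10% of optimizer steps use linear warmup
    fp16=True,                         # mixed precision; requires a GPU
    report_to="none",
    save_total_limit=2,                # keep at most two checkpoints on disk
    gradient_accumulation_steps=2,     # effective batch size: 8 x 2 = 16 per device
    load_best_model_at_end=True,
    max_grad_norm=1.0,
    metric_for_best_model="eval_loss",
    greater_is_better=False,           # lower eval loss is better
)
```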