|
--- |
|
license: apache-2.0 |
|
datasets: |
|
- hadyelsahar/ar_res_reviews |
|
language: |
|
- ar |
|
metrics: |
|
- accuracy |
|
- precision |
|
- recall |
|
- f1 |
|
base_model: |
|
- aubmindlab/bert-base-arabertv02 |
|
pipeline_tag: text-classification |
|
--- |
|
|
|
# Arabic Restaurant Review Sentiment Analysis
|
## Overview
|
This project fine-tunes a **transformer-based model** to analyze sentiment in **Arabic restaurant reviews**. |
|
We used **Hugging Face's model training pipeline** and deployed the final model as an **interactive Gradio web app**.
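
A minimal sketch of such a Gradio app, assuming the fine-tuned model has been pushed to the Hub (the model id below is a placeholder, not the actual repository name):

```python
import gradio as gr
from transformers import pipeline

# Placeholder model id; substitute the actual fine-tuned checkpoint.
classifier = pipeline("text-classification", model="your-username/arabert-restaurant-sentiment")

def predict(review: str) -> dict:
    # Return {label: score} pairs so Gradio's Label component renders all classes.
    return {r["label"]: r["score"] for r in classifier(review, top_k=None)}

demo = gr.Interface(
    fn=predict,
    inputs=gr.Textbox(label="Arabic restaurant review"),
    outputs=gr.Label(num_top_classes=2),
)
demo.launch()
```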
|
|
|
## Data Collection
|
The dataset used for fine-tuning was sourced from **Hugging Face Datasets**, specifically: |
|
[Arabic Restaurant Reviews Dataset](https://huggingface.co/datasets/hadyelsahar/ar_res_reviews)
|
It contains **restaurant reviews in Arabic** labeled with sentiment polarity. |
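Loading it takes one call with the `datasets` library (split and column names shown are the Hub defaults and worth verifying):

```python
from datasets import load_dataset

# Pull the labeled reviews straight from the Hugging Face Hub.
ds = load_dataset("hadyelsahar/ar_res_reviews")
print(ds)              # inspect available splits and column names
print(ds["train"][0])  # peek at a single labeled review
```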
|
|
|
## Data Preparation
|
- **Cleaning & Normalization**:
  - Removed non-Arabic text, special characters, and extra spaces.
  - Normalized Arabic character variants (e.g., `إ, أ, آ → ا`, `ة → ه`).
  - Downsampled positive reviews to balance the dataset.
- **Tokenization**:
  - Used the **AraBERT tokenizer** for efficient text processing.
- **Train-Test Split**:
  - **80% Training** | **20% Testing** (see the code sketch after this list).
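
A minimal sketch of these steps, assuming the review text lives in a `text` column (the exact regexes and column names in the original pipeline may differ; the positive-class downsampling is omitted for brevity):

```python
import re
from datasets import load_dataset
from transformers import AutoTokenizer

# Character variants folded together, per the normalization rules above.
NORMALIZE = {"إ": "ا", "أ": "ا", "آ": "ا", "ة": "ه"}

def clean_review(text: str) -> str:
    text = re.sub(r"[^\u0600-\u06FF\s]", " ", text)   # keep Arabic letters and spaces only
    for src, dst in NORMALIZE.items():
        text = text.replace(src, dst)                 # normalize character variants
    return re.sub(r"\s+", " ", text).strip()          # collapse extra whitespace

ds = load_dataset("hadyelsahar/ar_res_reviews")["train"]
ds = ds.map(lambda ex: {"text": clean_review(ex["text"])})

tokenizer = AutoTokenizer.from_pretrained("aubmindlab/bert-base-arabertv02")
ds = ds.map(lambda ex: tokenizer(ex["text"], truncation=True), batched=True)

# 80/20 train-test split, as described above.
splits = ds.train_test_split(test_size=0.2, seed=42)
train_ds, test_ds = splits["train"], splits["test"]
```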
|
|
|
## Fine-Tuning & Results
|
The model was fine-tuned with **Hugging Face Transformers** on the prepared review dataset; the full configuration is listed under **Training Parameters** below.
|
|
|
### **Evaluation Metrics**
|
| Metric         | Score    |
|----------------|----------|
| **Train Loss** | `0.470`  |
| **Eval Loss**  | `0.373`  |
| **Accuracy**   | `86.41%` |
| **Precision**  | `87.01%` |
| **Recall**     | `86.49%` |
| **F1-score**   | `86.75%` |
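
Metrics like these are typically produced by a `compute_metrics` callback passed to the `Trainer`. One plausible implementation using `scikit-learn` (the binary averaging mode is an assumption):

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    # The Trainer passes (logits, labels) for the evaluation set.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="binary"
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```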
|
|
|
## Training Parameters
|
```python
import torch
from transformers import AutoModelForSequenceClassification, TrainingArguments

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model_name = "aubmindlab/bert-base-arabertv02"
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=2,            # binary sentiment: positive / negative
    classifier_dropout=0.5,  # heavier dropout on the classification head
).to(device)

training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=4,
    weight_decay=1,
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    fp16=True,
    report_to="none",
    save_total_limit=2,
    gradient_accumulation_steps=2,
    load_best_model_at_end=True,
    max_grad_norm=1.0,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
```
|
|
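For completeness, a sketch of how these pieces plug into the `Trainer` (`train_ds`, `test_ds`, and `compute_metrics` refer to the earlier sketches and are assumptions about the original setup):

```python
from transformers import Trainer

trainer = Trainer(
    model=model,                      # the AraBERT classifier configured above
    args=training_args,               # the TrainingArguments above
    train_dataset=train_ds,
    eval_dataset=test_ds,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,  # accuracy, precision, recall, f1
)
trainer.train()     # fine-tune the model
trainer.evaluate()  # report the evaluation metrics
```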