---
license: mit
language:
- fa
metrics:
- f1
- accuracy
base_model:
- HooshvareLab/bert-fa-base-uncased
pipeline_tag: text-classification
---

# Fine-tuned BERT for Persian Comment Discrepancy Classification

This project fine-tunes a BERT model to classify Persian comments into two categories: complaints about a product discrepancy (`True`) and all other comments (`False`). The model is trained on the [Basalam Comments](https://www.kaggle.com/datasets/alirezaazizkhani/labeled-persian-comments) dataset.

## 🛠 Training Details

- **Base Model**: `HooshvareLab/bert-fa-base-uncased`
- **Fine-Tuning Dataset**: Basalam Comments
- **[Notebook](https://www.kaggle.com/code/alirezaazizkhani/finetune-bert-for-discrepancy)**
- **Evaluation Metrics**:
  - **Accuracy**: 95.89%
  - **F1 Score**: 95.62%

## 📥 How to Use

You can load and use the fine-tuned model as follows:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

MODEL_NAME = "alireza-2003/bert-fa-discrepancy-detection"

# Load the tokenizer and model once, rather than on every call
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()  # make sure dropout is disabled for inference


def classify_comment(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # Class index 1 is treated as a discrepancy complaint, anything else as not
    prediction = torch.argmax(outputs.logits, dim=-1).item()
    return "Discrepancy Complaint" if prediction == 1 else "Not a Complaint"


# English: "I ordered two, one blue and one red, but both of them were red."
comment = "دو تا سفارش داده بودم یدونه ابی و یدونه قرمز ولی هردوتاش قرمز بود"
print(classify_comment(comment))
```

---

📝 **Author**: Alireza

📅 **Last Updated**: 2/16/2025

🔗 **Dataset**: [Kaggle Dataset](https://www.kaggle.com/datasets/alirezaazizkhani/labeled-persian-comments)
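
For readers who want to check the accuracy and F1 score reported under Training Details against their own predictions, the following is a minimal sketch of how these two metrics are commonly defined for a binary task. It assumes that label `1` is the positive (discrepancy complaint) class and that F1 is the binary F1 of that class; the linked notebook may compute the metrics differently, for example with a library function or another averaging mode. The helper name `accuracy_and_f1` and the toy labels are hypothetical and not taken from the Basalam dataset.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1 of the positive class) for two equal-length label lists."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no positives at all
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Hypothetical toy labels for illustration only (1 = discrepancy complaint)
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.4f}  f1={f1:.4f}")
```

To use it with the model, collect the class index predicted for each comment (`1` for a discrepancy complaint, `0` otherwise) into `y_pred` and pass it together with the gold labels as `y_true`.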