Sentiment Analysis with Fine-tuned Multilingual BERT for Georgian 🇬🇪

📄 Model Overview

This model is bert-base-multilingual-cased fine-tuned for sentiment classification of Georgian text. It was trained on the Arseniy-Sandalov/Georgian-Sentiment-Analysis dataset and assigns each input one of three labels: positive, negative, or neutral.

  • Base Model: bert-base-multilingual-cased
  • Fine-tuned on: Arseniy-Sandalov/Georgian-Sentiment-Analysis
  • Task: Sentiment classification (positive, negative, neutral)
  • Tokenizer: BERT multilingual cased tokenizer
  • License: Not specified here; check the license of the dataset source

👉 Usage Example

You can load and use this model with Hugging Face Transformers:

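from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the fine-tuned checkpoint and its tokenizer from the Hugging Face Hub
model_name = "Arseniy-Sandalov/GeorgianBert-Sent"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # inference mode (disables dropout)

# Class index -> label, in the order used by this card's example
LABELS = ["negative", "neutral", "positive"]

def predict_sentiment(text):
    # Truncate long inputs to BERT's 512-token limit
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
    with torch.no_grad():  # no gradients are needed for inference
        outputs = model(**inputs)
    # Pick the class with the highest logit
    prediction = torch.argmax(outputs.logits, dim=1).item()
    return LABELS[prediction]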

text = "แƒแƒฎแƒแƒšแƒ˜ แƒ›แƒ”แƒแƒ แƒ˜ แƒ™แƒแƒ แƒ’แƒ˜แƒ แƒ”แƒ แƒ—แƒ˜แƒšแƒ"
print(predict_sentiment(text))
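
The usage example returns only the top label. If a confidence score is also useful, the logits can be converted to class probabilities with a softmax. The sketch below shows that arithmetic in plain Python on made-up logits. The helper names `softmax` and `label_with_confidence` and the example values are illustrative assumptions, not part of the model's API. With the real model, one would more likely apply `torch.softmax(outputs.logits, dim=1)`.

```python
import math

# Label order taken from the usage example in this card.
LABELS = ["negative", "neutral", "positive"]

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def label_with_confidence(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]

# Hypothetical logits for a single sentence (not real model output).
example_logits = [-1.2, 0.3, 2.1]
label, confidence = label_with_confidence(example_logits)
print(label, round(confidence, 3))
```

A probability close to 1/3 for the winning class suggests the model is nearly undecided, which may be worth surfacing to downstream users.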

📊 Training Details

Dataset Preprocessing:

  • Removed irrelevant columns (e.g., perturbation)

  • Stratified split: 80% train, 10% validation, 10% test
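
The stratified 80/10/10 split can be sketched as follows. This is a minimal standard-library illustration, not the training script: the card does not say which tool or random seed was used, so the function name `stratified_split`, the seed, and the toy labels are all assumptions.

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=42):
    """Return (train, val, test) index lists that preserve per-class proportions."""
    # Group example indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # assumed seed, for reproducibility only
    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n = len(indices)
        n_train = int(round(fractions[0] * n))
        n_val = int(round(fractions[1] * n))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test

# Toy labels standing in for the dataset's sentiment column (hypothetical data).
labels = ["positive"] * 50 + ["negative"] * 30 + ["neutral"] * 20
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Splitting within each class keeps the label distribution roughly the same in all three subsets, which matters when one sentiment class is much rarer than the others.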

Evaluation Metric:

  • ROC AUC Score (computed on validation & test sets)
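
ROC AUC is defined for binary problems, and the card does not state how it was extended to three classes. One common choice, assumed here, is to compute a one-vs-rest AUC per class and take the unweighted (macro) average. The sketch below does this in plain Python on hypothetical labels and probabilities. The function names are illustrative; in practice a library routine such as scikit-learn's `roc_auc_score` with `multi_class="ovr"` would be a typical alternative.

```python
def binary_auc(scores, positives):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, p in zip(scores, positives) if p]
    neg = [s for s, p in zip(scores, positives) if not p]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def macro_ovr_auc(y_true, probs, n_classes=3):
    """Average the one-vs-rest AUC over all classes (assumed averaging scheme)."""
    aucs = []
    for c in range(n_classes):
        scores = [row[c] for row in probs]       # predicted probability of class c
        positives = [y == c for y in y_true]     # class c vs. everything else
        aucs.append(binary_auc(scores, positives))
    return sum(aucs) / n_classes

# Hypothetical labels (0=negative, 1=neutral, 2=positive) and predicted probabilities.
y_true = [0, 0, 1, 1, 2, 2]
probs = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.5, 0.3, 0.2],
    [0.1, 0.2, 0.7],
    [0.3, 0.3, 0.4],
]
print(macro_ovr_auc(y_true, probs))
```

The pairwise formulation is quadratic in the number of examples, so it is only meant to make the definition concrete; a rank-based implementation is preferable for real validation and test sets.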

📖 Citation

If you use this model, please cite the original dataset:

@misc{Stefanovitch2023Sentiment,
  author = {Stefanovitch, Nicolas and Piskorski, Jakub and Kharazi, Sopho},
  title = {Sentiment analysis for Georgian},
  year = {2023},
  publisher = {European Commission, Joint Research Centre (JRC)},
  howpublished = {\url{http://data.europa.eu/89h/9f04066a-8cc0-4669-99b4-f1f0627fdbbf}},
  url = {http://data.europa.eu/89h/9f04066a-8cc0-4669-99b4-f1f0627fdbbf},
  type = {dataset},
  note = {PID: http://data.europa.eu/89h/9f04066a-8cc0-4669-99b4-f1f0627fdbbf}
}

Model size: 178M params (Safetensors, tensor type F32)