---
base_model: distilbert/distilbert-base-multilingual-cased
language:
- en
- zh
- es
- hi
- ar
- bn
- pt
- ru
- ja
- de
- ms
- te
- vi
- ko
- fr
- tr
- it
- pl
- uk
- tl
- nl
- gsw
license: apache-2.0
pipeline_tag: text-classification
tags:
- text-classification
- sentiment-analysis
- sentiment
- synthetic data
- multi-class
- social-media-analysis
- customer-feedback
- product-reviews
- brand-monitoring
widget:
  - text: >-
      I absolutely loved this movie! The acting was superb and the plot was
      engaging.
    example_title: Very Positive Review
  - text: The service at this restaurant was terrible. I'll never go back.
    example_title: Very Negative Review
  - text: The product works as expected. Nothing special, but it gets the job done.
    example_title: Neutral Review
  - text: I'm somewhat disappointed with my purchase. It's not as good as I hoped.
    example_title: Negative Review
  - text: This book changed my life! I couldn't put it down and learned so much.
    example_title: Very Positive Review
inference:
  parameters:
    temperature: 1
---
# 🚀 DistilBERT-based Multilingual Sentiment Classification Model
TRY IT HERE: `coming soon`
# NEWS!
- 2024/12: We are excited to introduce a multilingual sentiment model! Now you can analyze sentiment across multiple languages, enhancing your global reach.
## Model Details
- `Model Name:` tabularisai/multilingual-sentiment-analysis
- `Base Model:` distilbert/distilbert-base-multilingual-cased
- `Task:` Text Classification (Sentiment Analysis)
- `Languages:` Supports English plus Chinese (中文), Spanish (Español), Hindi (हिन्दी), Arabic (العربية), Bengali (বাংলা), Portuguese (Português), Russian (Русский), Japanese (日本語), German (Deutsch), Malay (Bahasa Melayu), Telugu (తెలుగు), Vietnamese (Tiếng Việt), Korean (한국어), French (Français), Turkish (Türkçe), Italian (Italiano), Polish (Polski), Ukrainian (Українська), Tagalog, Dutch (Nederlands), Swiss German (Schweizerdeutsch).
- `Number of Classes:` 5 (*Very Negative, Negative, Neutral, Positive, Very Positive*)
- `Usage:`
  - Social media analysis
  - Customer feedback analysis
  - Product reviews classification
  - Brand monitoring
  - Market research
  - Customer service optimization
  - Competitive intelligence
## Model Description
This model is a fine-tuned version of `distilbert/distilbert-base-multilingual-cased` for multilingual sentiment analysis. It leverages synthetic data from multiple sources to achieve robust performance across different languages and cultural contexts.
### Training Data
The model was trained exclusively on synthetic multilingual data generated by large language models (LLMs), with the goal of covering a wide range of sentiment expressions across languages.
### Training Procedure
- Fine-tuned for 5 epochs.
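- Achieved an off-by-one accuracy (`train_acc_off_by_one`) of approximately 0.93 on the validation dataset. Under this metric, a prediction counts as correct if it is at most one class away from the true label.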
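For readers who want to reproduce this kind of number on their own labelled data, here is a minimal sketch of off-by-one accuracy. It assumes integer class indices in the ordinal order used by `sentiment_map` in the How to Use section (0 = Very Negative ... 4 = Very Positive). The function name and sample labels are illustrative and are not taken from the original training code.

```python
from typing import Sequence


def off_by_one_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions within one class of the true label.

    Assumes ordinal integer labels (0 = Very Negative ... 4 = Very Positive),
    so predicting Positive for a Very Positive text counts as correct,
    while predicting Neutral for it does not.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= 1)
    return hits / len(y_true)


# Hypothetical labels, for illustration only
y_true = [4, 3, 2, 0, 1]
y_pred = [3, 3, 0, 1, 1]
print(off_by_one_accuracy(y_true, y_pred))  # 0.8
```

Only the pair (2, 0) is more than one class apart, so 4 of the 5 predictions count as correct. Strict accuracy on the same labels would be 0.4, which shows how lenient the off-by-one metric is for a 5-class ordinal task.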
## Intended Use
Ideal for:
- Multilingual social media monitoring
- International customer feedback analysis
- Global product review sentiment classification
- Worldwide brand sentiment tracking
## How to Use
Below is a Python example of how to use the multilingual sentiment model:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "tabularisai/multilingual-sentiment-analysis"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

def predict_sentiment(text):
    # Tokenize the input; anything beyond 512 tokens is truncated
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
    # Inference only, so gradient tracking is disabled
    with torch.no_grad():
        outputs = model(**inputs)
    # Turn logits into class probabilities and pick the most likely class
    probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(probabilities, dim=-1).item()
    sentiment_map = {0: "Very Negative", 1: "Negative", 2: "Neutral", 3: "Positive", 4: "Very Positive"}
    return sentiment_map[predicted_class]

texts = [
    # English
    "I absolutely loved this movie! The acting was superb and the plot was engaging.",
    # Chinese ("I hate this endless arguing.")
    "我讨厌这种无休止的争吵。",
    # Spanish ("The product works as expected. Nothing special, but it does its job.")
    "El producto funciona como se espera. Nada especial, pero cumple con su función.",
    # Arabic ("I didn't like this movie at all. The story was boring and the characters were weak.")
    "لم أحب هذا الفيلم على الإطلاق. القصة كانت مملة والشخصيات ضعيفة.",
    # Ukrainian ("I'm disappointed with the purchase; it's not as good as I expected.")
    "Я розчарований покупкою, вона не така гарна, як я очікував.",
    # Hindi ("This product is truly amazing! It is easy to use and has been very helpful to me.")
    "यह उत्पाद वास्तव में अद्भुत है! इसका उपयोग करना आसान है और यह मेरे लिए बहुत मददगार रहा।",
    # Bengali ("I didn't like the food at this restaurant. It was very oily and overcooked.")
    "আমি এই রেস্তোরাঁর খাবার পছন্দ করিনি। এটি খুব তেলতেলে এবং অতিরিক্ত রান্না করা।",
    # Portuguese ("This book is fantastic! I learned many new and inspiring things.")
    "Este livro é fantástico! Eu aprendi muitas coisas novas e inspiradoras."
]

for text in texts:
    sentiment = predict_sentiment(text)
    print(f"Text: {text}")
    print(f"Sentiment: {sentiment}\n")
```
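The example above runs one forward pass per text and discards the probabilities. When scoring many texts, it can help to tokenize the whole list at once and keep the winning class together with its softmax probability as a rough confidence value. The sketch below keeps that post-processing in plain Python so it can be tested without downloading the model. `SENTIMENT_LABELS`, `softmax` and `label_batch` are hypothetical helper names, and the label order is assumed to match `sentiment_map`.

```python
import math
from typing import List, Sequence, Tuple

# Assumed to follow the same index order as sentiment_map in the example above
SENTIMENT_LABELS = ["Very Negative", "Negative", "Neutral", "Positive", "Very Positive"]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_batch(batch_logits: Sequence[Sequence[float]]) -> List[Tuple[str, float]]:
    """Return (label, probability of that label) for each row of logits."""
    results = []
    for row in batch_logits:
        probs = softmax(row)
        best = max(range(len(probs)), key=probs.__getitem__)
        results.append((SENTIMENT_LABELS[best], probs[best]))
    return results


# With tokenizer, model and texts set up as in the example above, a whole batch
# could be scored in one pass (padding=True pads to the longest text):
#   inputs = tokenizer(texts, return_tensors="pt", truncation=True, padding=True, max_length=512)
#   with torch.no_grad():
#       logits = model(**inputs).logits.tolist()
#   for text, (label, p) in zip(texts, label_batch(logits)):
#       print(f"{label} ({p:.2f}): {text}")

print(label_batch([[0.1, 0.2, 0.3, 1.5, 3.0]]))
```

Treat the reported probability as a relative signal rather than a calibrated confidence. The card does not state whether the model's scores are calibrated.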
## Training Procedure
- Dataset: Synthetic multilingual data
- Framework: PyTorch Lightning
- Number of epochs: 5
- Validation Off-by-one Accuracy: ~0.95
## Ethical Considerations
Synthetic data may help reduce some biases, but the model should still be validated in real-world scenarios before deployment.
## Citation
```
Will be included.
```
## Contact
For inquiries, private APIs, or better models, contact [email protected]
tabularis.ai