# EGD XLM-RoBERTa (Multilingual)

## Model Overview
This model is based on XLM-RoBERTa-large and has been fine-tuned on English, Hungarian, and German data for text classification of European Parliamentary speeches into rhetorical categories.
The model classifies text into three categories:
- 0 - Other (text that does not fit into moralist or realist categories)
- 1 - Moralist (arguments emphasizing moral reasoning)
- 2 - Realist (arguments applying pragmatic or realist reasoning)
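For downstream analysis, the three numeric class IDs can be mapped to readable labels with a plain dictionary. A minimal sketch (the `ID2LABEL` name and helper function are illustrative, not part of the model's config):

```python
# Map the model's numeric class IDs to the rhetorical categories above.
ID2LABEL = {
    0: "Other",     # text fitting neither moralist nor realist categories
    1: "Moralist",  # arguments emphasizing moral reasoning
    2: "Realist",   # arguments applying pragmatic or realist reasoning
}

def label_name(class_id: int) -> str:
    """Return the human-readable label for a predicted class ID."""
    return ID2LABEL[class_id]

print(label_name(2))  # -> Realist
```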
This model is useful for analyzing political discourse and rhetorical styles in multiple languages.
## Evaluation Results
The model was evaluated on a test set of 938 sentences, with the following results:
| Label | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| 0 - Other | 0.98 | 0.95 | 0.96 | 783 |
| 1 - Moralist | 0.73 | 0.82 | 0.77 | 65 |
| 2 - Realist | 0.79 | 0.90 | 0.84 | 90 |
- Overall accuracy: 0.93
- Macro average F1-score: 0.86
- Weighted average F1-score: 0.94
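The macro and weighted averages follow directly from the per-class F1 scores and supports in the table above; as a sanity check, the arithmetic in pure Python (no model required):

```python
# Per-class F1 scores and supports from the evaluation table.
f1 = {"Other": 0.96, "Moralist": 0.77, "Realist": 0.84}
support = {"Other": 783, "Moralist": 65, "Realist": 90}

# Macro average: unweighted mean of the per-class F1 scores.
macro_f1 = sum(f1.values()) / len(f1)

# Weighted average: each class weighted by its share of the test set.
total = sum(support.values())
weighted_f1 = sum(f1[k] * support[k] for k in f1) / total

print(round(macro_f1, 2))     # 0.86
print(round(weighted_f1, 2))  # 0.94
```

The gap between the two averages reflects the class imbalance: the weighted score is dominated by the large "Other" class, while the macro score gives equal weight to the two small rhetorical classes.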
The model distinguishes rhetorical styles reliably overall. Performance on the majority "Other" class is very strong, while the minority Moralist and Realist classes achieve good recall (0.82 and 0.90) with somewhat lower precision, as expected given their small support.
## Confusion Matrix

*(Figure: confusion matrix for the model's performance on the test set.)*
## Usage
This model can be used with the Hugging Face Transformers library:
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "uvegesistvan/EDG-xlm-roberta-large"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Classify an example text
text = "The European Union has a responsibility towards future generations."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits

# Get predicted class (0 = Other, 1 = Moralist, 2 = Realist)
predicted_class = logits.argmax(dim=-1).item()
print(f"Predicted class: {predicted_class}")
```
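The raw logits can also be turned into class probabilities with a softmax, which is useful when you want a confidence score rather than just the argmax. A minimal pure-Python sketch with hypothetical logit values (real values come from the model's output):

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for the three classes (Other, Moralist, Realist).
logits = [0.2, 2.1, -1.3]
probs = softmax(logits)
predicted = max(range(len(probs)), key=probs.__getitem__)

print(predicted)  # 1, i.e. Moralist
```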