EDG XLM-RoBERTa (Multilingual)

Model Overview

This model is based on XLM-RoBERTa-large and was fine-tuned on English, Hungarian, and German data to classify European Parliament speeches into rhetorical categories.

The model classifies text into three categories:

  • 0 - Other (text that does not fit into moralist or realist categories)
  • 1 - Moralist (arguments emphasizing moral reasoning)
  • 2 - Realist (arguments applying pragmatic or realist reasoning)

This model is useful for analyzing political discourse and rhetorical styles in multiple languages.
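
The three ids form a fixed label mapping. As a minimal sketch, the mapping can be expressed in Python as below (the dictionary simply restates the list above; it is not read from the model's config):

id2label = {
    0: "Other",     # text fitting neither rhetorical category
    1: "Moralist",  # arguments emphasizing moral reasoning
    2: "Realist",   # arguments applying pragmatic or realist reasoning
}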


Evaluation Results

The model was evaluated on a test set of 938 sentences, with the following results:

Label         Precision  Recall  F1-score  Support
0 - Other          0.98    0.95      0.96      783
1 - Moralist       0.73    0.82      0.77       65
2 - Realist        0.79    0.90      0.84       90
  • Overall accuracy: 0.93
  • Macro average F1-score: 0.86
  • Weighted average F1-score: 0.94

The model separates the three categories well overall. Performance is strongest on the majority Other class (F1 0.96), while the smaller Moralist (F1 0.77) and Realist (F1 0.84) classes show high recall with somewhat lower precision.
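
For context, per-class tables of this kind are what scikit-learn's classification_report produces. A minimal sketch, using hypothetical toy arrays in place of the actual 938-sentence test set:

from sklearn.metrics import classification_report

# Hypothetical gold and predicted label ids; the real report above
# was computed on the 938-sentence test set.
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 2, 2, 2, 0]

print(classification_report(
    y_true,
    y_pred,
    labels=[0, 1, 2],
    target_names=["0 - Other", "1 - Moralist", "2 - Realist"],
))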

Confusion Matrix

Below is the confusion matrix for the model's performance on the test set:

[Figure: confusion matrix on the test set]
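
If the image is unavailable, an equivalent figure can be regenerated from model predictions. A sketch using scikit-learn and matplotlib, with the same kind of hypothetical label arrays as above:

import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

# Hypothetical gold and predicted label ids (0=Other, 1=Moralist, 2=Realist)
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 2, 2, 2, 0]

# Rows are true classes, columns are predicted classes
ConfusionMatrixDisplay.from_predictions(
    y_true,
    y_pred,
    labels=[0, 1, 2],
    display_labels=["Other", "Moralist", "Realist"],
)
plt.show()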


Usage

This model can be used with the Hugging Face Transformers library:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "uvegesistvan/EDG-xlm-roberta-large"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Classify an example text
text = "The European Union has a responsibility towards future generations."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits

# Map the predicted class id to its category name (see the list above)
id2label = {0: "Other", 1: "Moralist", 2: "Realist"}
predicted_class = logits.argmax(dim=-1).item()
print(f"Predicted class: {predicted_class} ({id2label[predicted_class]})")