EDG XLM-RoBERTa (Multilingual)

Model Overview

This model is based on XLM-RoBERTa-large and was fine-tuned on English, Hungarian, and German data to classify European Parliament speeches into rhetorical categories.

The model classifies text into three categories:

  • 0 - Other (text that does not fit into moralist or realist categories)
  • 1 - Moralist (arguments emphasizing moral reasoning)
  • 2 - Realist (arguments applying pragmatic or realist reasoning)

This model is useful for analyzing political discourse and rhetorical styles in multiple languages.
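
The three ids form a fixed label mapping. As a minimal sketch, the mapping can be expressed in Python as below (the dictionary simply restates the list above; it is not read from the model's config):

id2label = {
    0: "Other",     # text fitting neither rhetorical category
    1: "Moralist",  # arguments emphasizing moral reasoning
    2: "Realist",   # arguments applying pragmatic or realist reasoning
}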


Evaluation Results

The model was evaluated on a test set of 938 sentences, with the following results:

Label         Precision  Recall  F1-score  Support
0 - Other          0.98    0.95      0.96      783
1 - Moralist       0.73    0.82      0.77       65
2 - Realist        0.79    0.90      0.84       90
  • Overall accuracy: 0.93
  • Macro average F1-score: 0.86
  • Weighted average F1-score: 0.94

The model separates the three categories well overall. Performance is strongest on the majority Other class (F1 0.96), while the smaller Moralist (F1 0.77) and Realist (F1 0.84) classes show high recall with somewhat lower precision.
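
For context, per-class tables of this kind are what scikit-learn's classification_report produces. A minimal sketch, using hypothetical toy arrays in place of the actual 938-sentence test set:

from sklearn.metrics import classification_report

# Hypothetical gold and predicted label ids; the real report above
# was computed on the 938-sentence test set.
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 2, 2, 2, 0]

print(classification_report(
    y_true,
    y_pred,
    labels=[0, 1, 2],
    target_names=["0 - Other", "1 - Moralist", "2 - Realist"],
))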

Confusion Matrix

Below is the confusion matrix for the model's performance on the test set:

[Figure: confusion matrix on the test set]
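
If the image is unavailable, an equivalent figure can be regenerated from model predictions. A sketch using scikit-learn and matplotlib, with the same kind of hypothetical label arrays as above:

import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

# Hypothetical gold and predicted label ids (0=Other, 1=Moralist, 2=Realist)
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 2, 2, 2, 0]

# Rows are true classes, columns are predicted classes
ConfusionMatrixDisplay.from_predictions(
    y_true,
    y_pred,
    labels=[0, 1, 2],
    display_labels=["Other", "Moralist", "Realist"],
)
plt.show()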


Usage

This model can be used with the Hugging Face Transformers library:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "uvegesistvan/EDG-xlm-roberta-large"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Classify an example text
text = "The European Union has a responsibility towards future generations."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits

# Map the predicted class id to its category name (see the list above)
id2label = {0: "Other", 1: "Moralist", 2: "Realist"}
predicted_class = logits.argmax(dim=-1).item()
print(f"Predicted class: {predicted_class} ({id2label[predicted_class]})")