Model Card for uvegesistvan/wildmann_german_proposal_2b_pooled_english

Model Overview

This model is a multi-class emotion classifier trained on German source text that was translated into English as an intermediate step and then into Czech, Polish, Slovak, and Hungarian. It assigns each text to one of nine classes (eight emotions plus "No emotion"). The training data is pooled across these translations so that the model sees cross-linguistic variation in how emotions are expressed.

Emotion Classes

The model assigns each text to one of the following classes (class index in parentheses):

  • Anger (0)
  • Fear (1)
  • Disgust (2)
  • Sadness (3)
  • Joy (4)
  • Enthusiasm (5)
  • Hope (6)
  • Pride (7)
  • No emotion (8)

Dataset and Preprocessing

The dataset consists of German text first translated into English, then subsequently into Czech, Polish, Slovak, and Hungarian. Preprocessing steps included:

  • Normalization to mitigate noise introduced during sequential translations.
  • Balancing of the dataset through undersampling of overrepresented classes such as "No emotion" and "Anger."
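
The card does not describe how the undersampling was carried out. As a minimal sketch of the general technique, assuming the data is a list of (text, label) pairs and that a per-class cap is chosen by hand, random undersampling could look like this. The function name `undersample`, the `cap` parameter, and the toy data are hypothetical and are not taken from the original training code.

```python
import random
from collections import defaultdict


def undersample(examples, cap, seed=0):
    """Keep at most `cap` randomly chosen examples per label.

    `examples` is a list of (text, label) pairs. Classes with `cap`
    or fewer examples are kept whole. `cap` is a hypothetical knob,
    not a value reported in the model card.
    """
    rng = random.Random(seed)

    # Group examples by their label.
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    balanced = []
    for label in sorted(by_label):
        group = by_label[label]
        if len(group) > cap:
            # Draw `cap` examples without replacement.
            group = rng.sample(group, cap)
        balanced.extend(group)

    # Shuffle so that classes are not stored in contiguous blocks.
    rng.shuffle(balanced)
    return balanced


# Toy data: "No emotion" (8) and "Anger" (0) are overrepresented.
toy = (
    [("neutral %d" % i, 8) for i in range(10)]
    + [("angry %d" % i, 0) for i in range(6)]
    + [("hopeful %d" % i, 6) for i in range(3)]
)
balanced = undersample(toy, cap=4)
```

With `cap=4`, the two large classes are cut down to four examples each and the small class keeps all three of its examples.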

Evaluation Metrics

The model's performance was evaluated using precision, recall, F1-score, and accuracy metrics. Detailed results are as follows:

Class            Precision   Recall   F1-Score   Support
Anger (0)        0.57        0.50     0.53       3108
Fear (1)         0.82        0.76     0.79       3104
Disgust (2)      0.94        0.94     0.94       3104
Sadness (3)      0.84        0.85     0.85       3100
Joy (4)          0.72        0.86     0.78       3108
Enthusiasm (5)   0.67        0.57     0.61       3104
Hope (6)         0.48        0.55     0.51       3108
Pride (7)        0.74        0.78     0.76       3104
No emotion (8)   0.65        0.63     0.64       6212

Overall Metrics

  • Accuracy: 0.71
  • Macro Average: Precision = 0.71, Recall = 0.71, F1-Score = 0.71
  • Weighted Average: Precision = 0.71, Recall = 0.71, F1-Score = 0.70
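
The macro and weighted averages differ only in how the per-class scores are combined: the macro average treats every class equally, while the weighted average scales each class by its support. The short check below recomputes both F1 averages from the rounded per-class values in the table above. Because the inputs are already rounded to two decimals, the result is an approximation of the original computation, not a reproduction of it.

```python
# (F1-score, support) per class, copied from the per-class results table.
per_class = {
    "Anger": (0.53, 3108),
    "Fear": (0.79, 3104),
    "Disgust": (0.94, 3104),
    "Sadness": (0.85, 3100),
    "Joy": (0.78, 3108),
    "Enthusiasm": (0.61, 3104),
    "Hope": (0.51, 3108),
    "Pride": (0.76, 3104),
    "No emotion": (0.64, 6212),
}

f1_scores = [f1 for f1, _ in per_class.values()]
supports = [n for _, n in per_class.values()]

# Macro average: unweighted mean over the nine classes.
macro_f1 = sum(f1_scores) / len(f1_scores)

# Weighted average: each class contributes in proportion to its support.
total_support = sum(supports)
weighted_f1 = sum(f1 * n for f1, n in per_class.values()) / total_support

print(f"macro F1 = {macro_f1:.2f}, weighted F1 = {weighted_f1:.2f}")
```

Both values round to the figures reported above (0.71 and 0.70). The weighted average is slightly lower because "No emotion", with roughly twice the support of any other class, has a below-average F1-score.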

Performance Insights

The model performs best on "Disgust" (F1 0.94), "Sadness" (0.85), and "Fear" (0.79), and worst on "Hope" (0.51), "Anger" (0.53), and "Enthusiasm" (0.61). The weaker classes likely suffer from translation noise and from the difficulty of preserving subtle emotional states across several successive translations. The intermediate English step keeps the translations consistent with one another, but it also introduces its own challenges for emotion classification.

Model Usage

Applications

  • Emotion analysis of texts originating in German and translated into English and subsequently into Czech, Polish, Slovak, or Hungarian.
  • Sentiment tracking and research in complex multilingual contexts.
  • Cross-linguistic studies of emotion expression across multiple languages.
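
The card does not state which library the checkpoint targets. The sketch below assumes it loads as a standard sequence-classification model through the `transformers` `pipeline` API, and that the model either exposes named labels or falls back to the generic `LABEL_<n>` names. Both are assumptions and should be verified against the repository files. The helper names `decode_label` and `classify` are hypothetical.

```python
# Class names in index order, as listed under "Emotion Classes".
LABELS = [
    "Anger", "Fear", "Disgust", "Sadness", "Joy",
    "Enthusiasm", "Hope", "Pride", "No emotion",
]

MODEL_ID = "uvegesistvan/wildmann_german_proposal_2b_pooled_english"


def decode_label(raw):
    """Map a generic 'LABEL_<n>' name to its class name.

    Any other label string is returned unchanged, which covers the
    case where the model config already carries readable names.
    """
    if raw.startswith("LABEL_"):
        return LABELS[int(raw.split("_", 1)[1])]
    return raw


def classify(texts):
    """Return a (class name, score) pair for each input text.

    Assumption: the checkpoint is loadable by transformers as a
    text-classification model. The first call downloads the weights.
    """
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import pipeline

    clf = pipeline("text-classification", model=MODEL_ID)
    return [(decode_label(r["label"]), r["score"]) for r in clf(texts)]
```

Calling `classify(["I can't believe they cancelled the project."])` would return one (class name, score) pair per input text, provided the assumptions above hold.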

Limitations

  • Sequential translations introduce cumulative noise and may obscure subtle emotional states.
  • Performance may vary across different languages due to differences in linguistic structures and cultural expressions of emotion.

Ethical Considerations

The reliance on sequential translations may amplify biases or inaccuracies from the machine translation systems. Users should validate the model for their specific use cases, especially in sensitive domains such as mental health or cultural studies.

Citation

For further information, visit: uvegesistvan/wildmann_german_proposal_2b_pooled_english

Model Details

  • Format: Safetensors
  • Model size: 560M params
  • Tensor type: F32