Model Card for uvegesistvan/wildmann_german_proposal_2b_GER_ENG_SLO
Model Overview
This model is a multi-class emotion classifier trained on machine-translated text: German sentences translated into English and then from English into Slovak. It assigns each input to one of nine classes (eight emotions plus "No emotion"). The dataset combines synthetic and original German sentences, and the two-step translation adds noise that makes the model a useful case study for cross-linguistic emotion classification.
Emotion Classes
The model assigns one of the following labels (class ID in parentheses):
- Anger (0)
- Fear (1)
- Disgust (2)
- Sadness (3)
- Joy (4)
- Enthusiasm (5)
- Hope (6)
- Pride (7)
- No emotion (8)
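The class IDs above can be kept in a small lookup table when post-processing model output. The following is a minimal sketch: the names `ID2LABEL`, `LABEL2ID`, and `decode_prediction` are illustrative and not part of the published model, and it assumes the model returns one score per class in ID order.

```python
# Class IDs and names as listed in this model card.
ID2LABEL = {
    0: "Anger",
    1: "Fear",
    2: "Disgust",
    3: "Sadness",
    4: "Joy",
    5: "Enthusiasm",
    6: "Hope",
    7: "Pride",
    8: "No emotion",
}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def decode_prediction(scores):
    """Return the class name with the highest score.

    `scores` is assumed to hold one value per class, ordered by class ID.
    """
    if len(scores) != len(ID2LABEL):
        raise ValueError(f"expected {len(ID2LABEL)} scores, got {len(scores)}")
    best_id = max(range(len(scores)), key=lambda i: scores[i])
    return ID2LABEL[best_id]
```

For example, a score vector whose largest entry sits at index 2 decodes to "Disgust".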
Dataset and Preprocessing
The dataset consists of German text first translated into English and then into Slovak. This sequential translation introduces additional linguistic complexity and potential noise. Preprocessing steps included:
- Normalization to reduce noise introduced during translations.
- Undersampling of overrepresented classes, such as "No emotion" and "Anger," to balance the dataset.
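The card does not say how normalization or undersampling were implemented, so the following is only a sketch of one common approach. It assumes Unicode NFKC normalization with whitespace collapsing, and random undersampling to a fixed per-class cap; the function names and the `max_per_class` parameter are hypothetical.

```python
import random
import re
import unicodedata
from collections import defaultdict


def normalize_text(text):
    """Fold equivalent Unicode forms (NFKC) and collapse whitespace runs."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


def undersample(examples, max_per_class, seed=0):
    """Randomly keep at most `max_per_class` examples for each label.

    `examples` is a list of (text, label_id) pairs. Classes that are already
    at or below the cap are kept in full.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    balanced = []
    for items in by_label.values():
        if len(items) > max_per_class:
            items = rng.sample(items, max_per_class)
        balanced.extend(items)

    rng.shuffle(balanced)  # avoid leaving the data grouped by class
    return balanced
```

Capping every class at the size of the smallest one yields a fully balanced set; a looser cap trades balance for more training data.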
Evaluation Metrics
The model's performance was evaluated using precision, recall, F1-score, and accuracy metrics. Detailed results are as follows:
Class | Precision | Recall | F1-Score | Support |
---|---|---|---|---|
Anger (0) | 0.34 | 0.41 | 0.37 | 777 |
Fear (1) | 0.86 | 0.67 | 0.75 | 776 |
Disgust (2) | 0.95 | 0.92 | 0.93 | 776 |
Sadness (3) | 0.86 | 0.78 | 0.82 | 775 |
Joy (4) | 0.84 | 0.73 | 0.78 | 777 |
Enthusiasm (5) | 0.57 | 0.46 | 0.51 | 776 |
Hope (6) | 0.32 | 0.41 | 0.36 | 777 |
Pride (7) | 0.84 | 0.60 | 0.70 | 776 |
No emotion (8) | 0.48 | 0.59 | 0.53 | 1553 |
Overall Metrics
- Accuracy: 0.61
- Macro Average: Precision = 0.67, Recall = 0.62, F1-Score = 0.64
- Weighted Average: Precision = 0.65, Recall = 0.61, F1-Score = 0.63
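To make the difference between the two averages concrete, the sketch below recomputes them from the per-class table. The macro average is the plain mean over the nine classes, while the weighted average weights each class by its support, so "No emotion" (support 1553) counts about twice as much as any other class. The variable and function names are illustrative.

```python
# Per-class results copied from the table: (class, precision, recall, f1, support).
ROWS = [
    ("Anger", 0.34, 0.41, 0.37, 777),
    ("Fear", 0.86, 0.67, 0.75, 776),
    ("Disgust", 0.95, 0.92, 0.93, 776),
    ("Sadness", 0.86, 0.78, 0.82, 775),
    ("Joy", 0.84, 0.73, 0.78, 777),
    ("Enthusiasm", 0.57, 0.46, 0.51, 776),
    ("Hope", 0.32, 0.41, 0.36, 777),
    ("Pride", 0.84, 0.60, 0.70, 776),
    ("No emotion", 0.48, 0.59, 0.53, 1553),
]
PRECISION, RECALL, F1, SUPPORT = 1, 2, 3, 4


def macro_average(column):
    """Unweighted mean of one metric column over all classes."""
    values = [row[column] for row in ROWS]
    return sum(values) / len(values)


def weighted_average(column):
    """Mean of one metric column, weighted by each class's support."""
    total_support = sum(row[SUPPORT] for row in ROWS)
    return sum(row[column] * row[SUPPORT] for row in ROWS) / total_support


for name, column in (("precision", PRECISION), ("recall", RECALL), ("f1", F1)):
    print(f"{name}: macro={macro_average(column):.3f} "
          f"weighted={weighted_average(column):.3f}")
```

Because the inputs are already rounded to two decimals, the recomputed values can differ from the reported ones by about 0.01. Weighted recall, which equals accuracy in single-label classification, comes out near 0.616 here against the reported 0.61.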
Performance Insights
The model performs best on "Disgust" (F1 0.93) and "Sadness" (F1 0.82), with "Joy" (0.78) and "Fear" (0.75) close behind. It struggles most with "Hope" (0.36) and "Anger" (0.37), and "Enthusiasm" (0.51) and "No emotion" (0.53) are also weak. The likely cause is compounded translation noise: subtle emotional cues can be lost at each translation step. These results highlight the challenges of training models on sequentially translated text.
Model Usage
Applications
- Emotion analysis of German texts translated sequentially into English and Slovak for sentiment tracking or research.
- Studying cross-linguistic emotion classification in complex multilingual contexts.
- Sentiment analysis for Slovak content derived from German source material through intermediate English translations.
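A minimal inference sketch for these use cases is shown below. It assumes the repository hosts a sequence-classification checkpoint with its tokenizer that loads through the Hugging Face `transformers` `pipeline` API; the card does not confirm this, and the label strings returned depend on the checkpoint's configuration (they may be generic names such as `LABEL_2` rather than emotion names). The helper names `load_classifier` and `classify` are illustrative.

```python
MODEL_ID = "uvegesistvan/wildmann_german_proposal_2b_GER_ENG_SLO"


def load_classifier(model_id=MODEL_ID):
    """Build a text-classification pipeline (downloads the model on first use)."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def classify(texts, classifier=None):
    """Return a (text, label, score) tuple for each input text.

    `classifier` can be any callable that maps a list of strings to a list of
    {"label": ..., "score": ...} dicts; by default the hosted model is loaded.
    """
    texts = list(texts)
    if classifier is None:
        classifier = load_classifier()
    predictions = classifier(texts)
    return [(text, pred["label"], pred["score"])
            for text, pred in zip(texts, predictions)]


# Example call (requires network access and the transformers package).
# Input should be Slovak text, matching the training data:
# for text, label, score in classify(["Mám z toho veľkú radosť."]):
#     print(f"{label} ({score:.2f}): {text}")
```

Passing the classifier in as an argument keeps the model load out of the hot path and lets the function be exercised with a stub during testing.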
Limitations
- Sequential translation increases the likelihood of noise and inaccuracies, affecting classification performance for subtle emotional states.
- The model's accuracy (0.61) is lower than that of models trained on single-step translations, which reflects the noise added by the second translation step.
Ethical Considerations
Sequentially machine-translated datasets can introduce biases or inaccuracies, because linguistic and cultural nuances may be lost at each translation step and the losses compound. Users should carefully evaluate the model for their specific use case, particularly in sensitive applications such as mental health or social studies.
Citation
For further information, visit: uvegesistvan/wildmann_german_proposal_2b_GER_ENG_SLO