|
--- |
|
language: |
|
- de |
|
tags: |
|
- distilbert |
|
- german |
|
- classification |
|
datasets: |
|
- germeval21 |
|
widget: |
|
- text: "Das ist ein guter Punkt, so hatte ich das noch nicht betrachtet." |
|
example_title: "Agreement (non-toxic)" |
|
- text: "Wow, was ein geiles Spiel. Glückwunsch." |
|
example_title: "Football (non-toxic)" |
|
- text: "Halt deine scheiß Fresse, du Arschloch" |
|
example_title: "Silence (toxic)" |
|
- text: "Verpiss dich, du dreckiger Hurensohn." |
|
example_title: "Dismiss (toxic)" |
|
--- |
|
|
|
# German Toxic Comment Classification |
|
|
|
## Model Description |
|
|
|
This model was created to detect toxic or potentially harmful comments.
|
|
|
For this model, we fine-tuned the German DistilBERT model [distilbert-base-german-cased](https://huggingface.co/distilbert-base-german-cased) on a combination of five German datasets covering toxicity, profanity, offensive language, and hate speech.
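
For illustration, the fine-tuning setup can be reproduced roughly as follows. This is a minimal sketch, assuming the standard `transformers` sequence-classification head with two labels; it is not the authors' exact training script:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical fine-tuning setup: the German DistilBERT base model with a
# binary (toxic / non-toxic) classification head.
base_model = "distilbert-base-german-cased"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForSequenceClassification.from_pretrained(
    base_model,
    num_labels=2,  # assumption: one label each for toxic and non-toxic
)
```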
|
|
|
|
|
## Intended Uses & Limitations |
|
|
|
This model can be used to detect toxicity in German comments. |
|
However, toxicity is not sharply defined, and the model may not detect every instance of it.
|
|
|
It will not be able to detect toxicity in languages other than German. |
|
|
|
|
|
## How to Use |
|
|
|
```python |
|
from transformers import pipeline |
|
|
|
# Model page: https://huggingface.co/ml6team/distilbert-base-german-cased-toxic-comments
model_name = 'ml6team/distilbert-base-german-cased-toxic-comments'

# Load the fine-tuned model and its tokenizer into a text-classification pipeline
toxicity_pipeline = pipeline('text-classification', model=model_name, tokenizer=model_name)

comment = "Ein harmloses Beispiel"  # "A harmless example"
result = toxicity_pipeline(comment)[0]
print(f"Comment: {comment}\nLabel: {result['label']}, score: {result['score']}")
|
``` |
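
The pipeline also accepts a list of texts, which makes it easy to score several comments at once. The following snippet reuses the widget examples from this card:

```python
comments = [
    "Das ist ein guter Punkt, so hatte ich das noch nicht betrachtet.",
    "Verpiss dich, du dreckiger Hurensohn.",
]

# A text-classification pipeline accepts a list and returns one result per input
for comment, result in zip(comments, toxicity_pipeline(comments)):
    print(f"{result['label']} ({result['score']:.3f}): {comment}")
```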
|
|
|
|
|
## Limitations and Bias |
|
|
|
The model was trained on a combination of datasets containing examples gathered from different social networks and internet communities. This represents only a narrow subset of possible instances of toxicity, and instances in other domains might not be detected reliably.
|
|
|
|
|
## Training Data |
|
|
|
The training dataset combines the following five datasets: |
|
|
|
* GermEval18 [[dataset](https://github.com/uds-lsv/GermEval-2018-Data)] |
|
* Labels: abuse, profanity, toxicity |
|
* GermEval21 [[dataset](https://github.com/germeval2021toxic/SharedTask/tree/main/Data%20Sets)] |
|
* Labels: toxicity |
|
* IWG Hatespeech dataset [[paper](https://arxiv.org/pdf/1701.08118.pdf), [dataset](https://github.com/UCSM-DUE/IWG_hatespeech_public)] |
|
* Labels: hate speech |
|
* Detecting Offensive Statements Towards Foreigners in Social Media (2017) by Bretschneider and Peters [[dataset](http://ub-web.de/research/)]
|
* Labels: hate |
|
* HASOC: 2019 Hate Speech and Offensive Content [[dataset](https://hasocfire.github.io/hasoc/2019/index.html)] |
|
* Labels: offensive, profanity, hate |
|
|
|
The individual datasets use different labels, ranging from profanity and hate speech to toxicity. In the combined dataset, these labels were subsumed under `toxic` and `non-toxic`; it contains 23,515 examples in total. A minimal sketch of such a label mapping (the authors' exact preprocessing is not published in this card) is shown below.
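
```python
# Hypothetical label mapping; the set mirrors the source labels listed above.
TOXIC_SOURCE_LABELS = {
    "abuse", "profanity", "toxicity", "hate speech", "hate", "offensive",
}

def binarize(source_label: str) -> str:
    """Map a source dataset label onto the combined toxic/non-toxic scheme."""
    return "toxic" if source_label.lower() in TOXIC_SOURCE_LABELS else "non-toxic"

print(binarize("hate speech"))  # toxic
print(binarize("none"))         # non-toxic
```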
|
|
|
Note that the datasets vary substantially in the number of examples. |
|
|
|
|
|
## Training Procedure |
|
|
|
The training and test sets were created using the predefined train/test splits where available; otherwise, 80% of the examples were used for training and 20% for testing. This resulted in 17,072 training examples and 6,443 test examples.
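
For a dataset without an official split, the 80/20 split could look like the following sketch, using `scikit-learn`'s `train_test_split` as an assumed tool; `texts` and `labels` are placeholder data, not the real corpus:

```python
from sklearn.model_selection import train_test_split

# Illustrative only: hold out 20% of a source dataset that has no official split
texts = ["Beispiel 1", "Beispiel 2", "Beispiel 3", "Beispiel 4", "Beispiel 5"]
labels = ["non-toxic", "toxic", "non-toxic", "toxic", "non-toxic"]

train_texts, test_texts, train_labels, test_labels = train_test_split(
    texts, labels, test_size=0.2, random_state=42
)
```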
|
|
|
The model was trained for 2 epochs with the following arguments: |
|
|
|
```python |
|
from transformers import TrainingArguments

# `batch_size` and `output_dir` are placeholders: the card does not state the
# actual batch size or output path used during training.
batch_size = 16
training_args = TrainingArguments(
    output_dir="output",
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    num_train_epochs=2,
    evaluation_strategy="steps",
    logging_strategy="steps",
    logging_steps=100,
    save_total_limit=5,
    learning_rate=2e-5,
    weight_decay=0.01,
    metric_for_best_model='accuracy',
    load_best_model_at_end=True,
)
|
``` |
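
These arguments would then be passed to a `Trainer` together with the tokenized splits. The following is a sketch: `model`, `train_dataset`, and `eval_dataset` are assumed from the preceding steps rather than taken from the original card:

```python
import numpy as np
from transformers import Trainer

def compute_metrics(eval_pred):
    # metric_for_best_model='accuracy' requires the eval loop to report accuracy
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": (predictions == labels).mean()}

trainer = Trainer(
    model=model,                  # assumed from the fine-tuning setup above
    args=training_args,
    train_dataset=train_dataset,  # assumed: tokenized training split
    eval_dataset=eval_dataset,    # assumed: tokenized test split
    compute_metrics=compute_metrics,
)
trainer.train()
```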
|
|
|
## Evaluation Results |
|
|
|
Model evaluation was done on the held-out test set of 6,443 examples described above.
|
|
|
| Accuracy (%) | F1 Score (%) | Recall (%) | Precision (%) |
| ------------ | ------------ | ---------- | ------------- |
| 78.50 | 50.34 | 39.22 | 70.27 |
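
For reference, metrics like those in the table can be computed from test-set predictions with `scikit-learn` (an assumption; the card does not state which tooling was used). `y_true` and `y_pred` below are placeholder values, not the actual predictions:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Illustrative computation of the metrics reported in the table above
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary"
)
print(f"Accuracy: {accuracy:.2%}  F1: {f1:.2%}  Recall: {recall:.2%}  Precision: {precision:.2%}")
```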
|
|