|
--- |
|
language: |
|
- de |
|
tags: |
|
- distilbert |
|
- german |
|
- classification |
|
datasets: |
|
- germeval21 |
|
widget: |
|
- text: "Das ist ein guter Punkt, so hatte ich das noch nicht betrachtet." |
|
example_title: "Agreement (non-toxic)" |
|
- text: "Wow, was ein geiles Spiel. Glückwunsch." |
|
example_title: "Football (non-toxic)" |
|
- text: "Halt deine scheiß Fresse, du Arschloch" |
|
example_title: "Silence (toxic)" |
|
- text: "Verpiss dich, du dreckiger Hurensohn." |
|
example_title: "Dismiss (toxic)" |
|
--- |
|
|
|
# German Toxic Comment Classification |
|
|
|
## Model Description |
|
|
|
This model was created to detect toxic or potentially harmful comments.
|
|
|
For this model, we fine-tuned the German DistilBERT model [distilbert-base-german-cased](https://huggingface.co/distilbert-base-german-cased) on a combination of five German datasets covering toxicity, profanity, offensive language, and hate speech.
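
For illustration, the fine-tuning setup can be reproduced roughly as follows. This is a minimal sketch, assuming the standard `transformers` sequence-classification head with two labels; it is not the authors' exact training script:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical fine-tuning setup: the German DistilBERT base model with a
# binary (toxic / non-toxic) classification head.
base_model = "distilbert-base-german-cased"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForSequenceClassification.from_pretrained(
    base_model,
    num_labels=2,  # assumption: one label each for toxic and non-toxic
)
```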
|
|
|
|
|
## Intended Uses & Limitations |
|
|
|
This model can be used to detect toxicity in German comments. |
|
However, toxicity is not sharply defined, and the model may not detect every instance of it.
|
|
|
It will not be able to detect toxicity in languages other than German. |
|
|
|
|
|
## How to Use |
|
|
|
```python |
|
from transformers import pipeline |
|
|
|
# Model page: https://huggingface.co/ml6team/distilbert-base-german-cased-toxic-comments
model_name = 'ml6team/distilbert-base-german-cased-toxic-comments'

# Load the fine-tuned model and its tokenizer into a text-classification pipeline
toxicity_pipeline = pipeline('text-classification', model=model_name, tokenizer=model_name)

comment = "Ein harmloses Beispiel"  # "A harmless example"
result = toxicity_pipeline(comment)[0]
print(f"Comment: {comment}\nLabel: {result['label']}, score: {result['score']}")
|
``` |
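
The pipeline also accepts a list of texts, which makes it easy to score several comments at once. The following snippet reuses the widget examples from this card:

```python
comments = [
    "Das ist ein guter Punkt, so hatte ich das noch nicht betrachtet.",
    "Verpiss dich, du dreckiger Hurensohn.",
]

# A text-classification pipeline accepts a list and returns one result per input
for comment, result in zip(comments, toxicity_pipeline(comments)):
    print(f"{result['label']} ({result['score']:.3f}): {comment}")
```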
|
|
|
|
|
## Limitations and Bias |
|
|
|
The model was trained on a combination of datasets containing examples gathered from different social networks and internet communities. This represents only a narrow subset of possible instances of toxicity, and instances in other domains might not be detected reliably.
|
|
|
|
|
## Training Data |
|
|
|
The training dataset combines the following five datasets: |
|
|
|
* GermEval18 [[dataset](https://github.com/uds-lsv/GermEval-2018-Data)] |
|
* Labels: abuse, profanity, toxicity |
|
* GermEval21 [[dataset](https://github.com/germeval2021toxic/SharedTask/tree/main/Data%20Sets)] |
|
* Labels: toxicity |
|
* IWG Hatespeech dataset [[paper](https://arxiv.org/pdf/1701.08118.pdf), [dataset](https://github.com/UCSM-DUE/IWG_hatespeech_public)] |
|
* Labels: hate speech |
|
* Detecting Offensive Statements Towards Foreigners in Social Media (2017) by Bretschneider and Peters [[dataset](http://ub-web.de/research/)]
|
* Labels: hate |
|
* HASOC: 2019 Hate Speech and Offensive Content [[dataset](https://hasocfire.github.io/hasoc/2019/index.html)] |
|
* Labels: offensive, profanity, hate |
|
|
|
The individual datasets use different labels, ranging from profanity and hate speech to toxicity. In the combined dataset, these labels were subsumed under `toxic` and `non-toxic`; it contains 23,515 examples in total. A minimal sketch of such a label mapping (the authors' exact preprocessing is not published in this card) is shown below.
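
```python
# Hypothetical label mapping; the set mirrors the source labels listed above.
TOXIC_SOURCE_LABELS = {
    "abuse", "profanity", "toxicity", "hate speech", "hate", "offensive",
}

def binarize(source_label: str) -> str:
    """Map a source dataset label onto the combined toxic/non-toxic scheme."""
    return "toxic" if source_label.lower() in TOXIC_SOURCE_LABELS else "non-toxic"

print(binarize("hate speech"))  # toxic
print(binarize("none"))         # non-toxic
```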
|
|
|
Note that the datasets vary substantially in the number of examples. |
|
|
|
|
|
## Training Procedure |
|
|
|
The training and test sets were created using the predefined train/test splits where available; otherwise, 80% of the examples were used for training and 20% for testing. This resulted in 17,072 training examples and 6,443 test examples.
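
For a dataset without an official split, the 80/20 split could look like the following sketch, using `scikit-learn`'s `train_test_split` as an assumed tool; `texts` and `labels` are placeholder data, not the real corpus:

```python
from sklearn.model_selection import train_test_split

# Illustrative only: hold out 20% of a source dataset that has no official split
texts = ["Beispiel 1", "Beispiel 2", "Beispiel 3", "Beispiel 4", "Beispiel 5"]
labels = ["non-toxic", "toxic", "non-toxic", "toxic", "non-toxic"]

train_texts, test_texts, train_labels, test_labels = train_test_split(
    texts, labels, test_size=0.2, random_state=42
)
```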
|
|
|
The model was trained for 2 epochs with the following arguments: |
|
|
|
```python |
|
from transformers import TrainingArguments

# `batch_size` and `output_dir` are placeholders: the card does not state the
# actual batch size or output path used during training.
batch_size = 16
training_args = TrainingArguments(
    output_dir="output",
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    num_train_epochs=2,
    evaluation_strategy="steps",
    logging_strategy="steps",
    logging_steps=100,
    save_total_limit=5,
    learning_rate=2e-5,
    weight_decay=0.01,
    metric_for_best_model='accuracy',
    load_best_model_at_end=True,
)
|
``` |
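
These arguments would then be passed to a `Trainer` together with the tokenized splits. The following is a sketch: `model`, `train_dataset`, and `eval_dataset` are assumed from the preceding steps rather than taken from the original card:

```python
import numpy as np
from transformers import Trainer

def compute_metrics(eval_pred):
    # metric_for_best_model='accuracy' requires the eval loop to report accuracy
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": (predictions == labels).mean()}

trainer = Trainer(
    model=model,                  # assumed from the fine-tuning setup above
    args=training_args,
    train_dataset=train_dataset,  # assumed: tokenized training split
    eval_dataset=eval_dataset,    # assumed: tokenized test split
    compute_metrics=compute_metrics,
)
trainer.train()
```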
|
|
|
## Evaluation Results |
|
|
|
Model evaluation was done on the held-out test set of 6,443 examples described above.
|
|
|
| Accuracy (%) | F1 Score (%) | Recall (%) | Precision (%) |
| ------------ | ------------ | ---------- | ------------- |
| 78.50 | 50.34 | 39.22 | 70.27 |
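
For reference, metrics like those in the table can be computed from test-set predictions with `scikit-learn` (an assumption; the card does not state which tooling was used). `y_true` and `y_pred` below are placeholder values, not the actual predictions:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Illustrative computation of the metrics reported in the table above
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary"
)
print(f"Accuracy: {accuracy:.2%}  F1: {f1:.2%}  Recall: {recall:.2%}  Precision: {precision:.2%}")
```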
|
|