---
language:
- bg
metrics:
- f1
- accuracy
- precision
- recall
base_model:
- rmihaylov/bert-base-bg
pipeline_tag: text-classification
license: apache-2.0
datasets:
- sofia-uni/toxic-data-bg
- wikimedia/wikipedia
- oscar-corpus/oscar
- petkopetkov/chitanka
tags:
- bert
- not-for-all-audiences
- medical
---
A toxic language classification model for Bulgarian, based on the [bert-base-bg](https://huggingface.co/rmihaylov/bert-base-bg) model.
The model classifies text into four classes: Toxic, MedicalTerminology, NonToxic, and MinorityGroup.
Classification report:
| Accuracy | Precision | Recall | F1 Score | Loss |
|----------|-----------|--------|----------|------|
| 0.85 | 0.86 | 0.85 | 0.85 | 0.43 |
More information is available [in the paper](https://www.researchgate.net/publication/388842558_Detecting_Toxic_Language_Ontology_and_BERT-based_Approaches_for_Bulgarian_Text).
# Code and usage
For training files and instructions on how to use the model, refer to the [GitHub repository of the project](https://github.com/TsvetoslavVasev/toxic-language-classification).
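As a quick start, the model can be loaded with the standard `transformers` text-classification pipeline. This is a minimal sketch: the model id `melaniab/toxic-bert-bg` is assumed from the repository name, and the label names are taken from the four classes listed above; adjust both if your copy of the model differs.

```python
# Minimal usage sketch for the 4-class Bulgarian toxicity classifier.
# Assumption: the Hub model id is "melaniab/toxic-bert-bg" (inferred from
# the repository name); replace it if the actual id differs.

# The four classes the model distinguishes, as listed in this card.
LABELS = ["Toxic", "MedicalTerminology", "NonToxic", "MinorityGroup"]

def classify(texts, model_id="melaniab/toxic-bert-bg"):
    """Classify a list of Bulgarian texts; returns [{'label': ..., 'score': ...}, ...]."""
    # Import inside the function so the module loads even without transformers.
    from transformers import pipeline
    clf = pipeline("text-classification", model=model_id)
    return clf(texts)

if __name__ == "__main__":
    # Example input: "This is a sample text." in Bulgarian.
    print(classify(["Това е примерен текст."]))
```

The `pipeline` helper downloads the model and tokenizer on first use, so an internet connection (or a local cache) is required.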
# Reference
If you use this model in academic work, please cite it as:
```bibtex
@article{berbatova2025detecting,
  title={Detecting Toxic Language: Ontology and BERT-based Approaches for Bulgarian Text},
  author={Berbatova, Melania and Vasev, Tsvetoslav},
  year={2025},
  doi={10.13140/RG.2.2.34963.18723}
}
```