---
language:
- bg
metrics:
- f1
- accuracy
- precision
- recall
base_model:
- rmihaylov/bert-base-bg
pipeline_tag: text-classification
license: apache-2.0
datasets:
- sofia-uni/toxic-data-bg
- wikimedia/wikipedia
- oscar-corpus/oscar
- petkopetkov/chitanka
tags:
- bert
- not-for-all-audiences
- medical
---

A toxic-language classification model for Bulgarian, based on the [bert-base-bg](https://huggingface.co/rmihaylov/bert-base-bg) model.

The model assigns each input text to one of four classes: Toxic, MedicalTerminology, NonToxic, and MinorityGroup.

Classification report:

| Accuracy | Precision | Recall | F1 Score | Loss |
|----------|-----------|--------|----------|------|
| 0.85 | 0.86 | 0.85 | 0.85 | 0.43 |

More information is available [in the paper](https://www.researchgate.net/publication/388842558_Detecting_Toxic_Language_Ontology_and_BERT-based_Approaches_for_Bulgarian_Text).
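
As a rough illustration of how figures like those in the classification report can be computed, the sketch below derives accuracy and support-weighted precision, recall, and F1 from true and predicted labels. The averaging scheme is an assumption: the card does not state it, although a recall equal to the accuracy is consistent with support-weighted averaging. The function name `weighted_report` and the toy labels are hypothetical and are not taken from the project's evaluation data.

```python
from collections import Counter


def weighted_report(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1.

    Assumption: support-weighted averaging; the model card does not say
    which averaging scheme produced its reported numbers.
    """
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    support = Counter(y_true)  # number of true examples per class
    accuracy = sum(t == p for t, p in pairs) / n

    precision = recall = f1 = 0.0
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        p_c = tp / (tp + fp) if tp + fp else 0.0
        r_c = tp / (tp + fn) if tp + fn else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        weight = support[label] / n  # class share of the true labels
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels, invented purely for illustration.
y_true = ["Toxic", "Toxic", "NonToxic", "NonToxic",
          "MedicalTerminology", "MinorityGroup"]
y_pred = ["Toxic", "NonToxic", "NonToxic", "NonToxic",
          "MedicalTerminology", "Toxic"]

report = weighted_report(y_true, y_pred)
print(report)
```

With support-weighted averaging, recall always equals accuracy, because each class's recall is weighted by its share of the true labels. Macro averaging would not guarantee this.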

# Code and usage

For the training files and instructions on how to use the model, refer to the [GitHub repository of the project](https://github.com/TsvetoslavVasev/toxic-language-classification).
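
For quick experiments, a model of this kind can typically be loaded through the generic `transformers` text-classification pipeline. The sketch below is not taken from the project repository. It assumes the checkpoint is published in the standard sequence-classification format. The model identifier is a placeholder to be replaced with this model's actual Hub id, and the helper names `load_classifier` and `best_prediction` are hypothetical.

```python
def load_classifier(model_id):
    """Build a text-classification pipeline for a Hub model id.

    The import is done lazily so that `best_prediction` below can be used
    without `transformers` installed.
    """
    from transformers import pipeline
    return pipeline("text-classification", model=model_id)


def best_prediction(predictions):
    """Return (label, score) for the highest-scoring entry.

    `predictions` is a list of {"label": ..., "score": ...} dicts, the shape
    the pipeline returns for a single input string when `top_k` is passed.
    """
    top = max(predictions, key=lambda item: item["score"])
    return top["label"], top["score"]


# Example call, left commented out because it downloads model weights:
# classifier = load_classifier("<namespace>/<model-name>")  # placeholder id
# predictions = classifier("Това е примерен текст.", top_k=None)
# print(best_prediction(predictions))
```

The label strings in the output come from the checkpoint's `id2label` configuration. If that mapping was not set during training, the pipeline may return generic names such as `LABEL_0` instead of the four class names.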

# Reference

If you use this model in an academic project, please cite it as:

```bibtex
@article{berbatova2025detecting,
  doi={10.13140/RG.2.2.34963.18723},
  title={Detecting Toxic Language: Ontology and BERT-based Approaches for Bulgarian Text},
  author={Berbatova, Melania and Vasev, Tsvetoslav},
  year={2025}
}
```