---
title: README
emoji: 📉
colorFrom: green
colorTo: red
sdk: static
pinned: false
---

**Text Classification datasets and models for Ukrainian**

We release datasets and models for Ukrainian covering several classification domains: toxicity, NLI, and formality.

📰 [Toloka BlogPost on Toxicity Classification in Ukrainian](https://toloka.ai/blog/toxicity-detection-for-non-mainstream-languages-why-we-still-need-human-labeled-data/)

**Corresponding papers**

**[EMNLP2025]** *Part of SemEval2025 Emotion Detection Shared Task*; Daryna Dementieva, Nikolay Babakov, and Alexander Fraser. [EmoBench-UA: A Benchmark Dataset for Emotion Detection in Ukrainian](https://arxiv.org/abs/2505.23297). EMNLP2025, Findings.

**[COLING2025]** Daryna Dementieva, Valeriia Khylenko, and Georg Groh. 2025. [Cross-lingual Text Classification Transfer: The Case of Ukrainian](https://aclanthology.org/2025.coling-main.97/). In Proceedings of the 31st International Conference on Computational Linguistics, pages 1451–1464, Abu Dhabi, UAE. Association for Computational Linguistics.

**[NAACL2024, WOAH]** Daryna Dementieva, Valeriia Khylenko, Nikolay Babakov, and Georg Groh. 2024. [Toxicity Classification in Ukrainian](https://aclanthology.org/2024.woah-1.19/). In Proceedings of the 8th Workshop on Online Abuse and Harms (WOAH 2024), pages 244–255, Mexico City, Mexico. Association for Computational Linguistics.