metadata
license: mit
base_model: coppercitylabs/uzbert-base-uncased
tags:
- generated_from_trainer
metrics:
- precision
- recall
- f1
- accuracy
model-index:
- name: uzpostagger-cyrillic-3
results: []
uzpostagger-cyrillic-3
This model is a fine-tuned version of coppercitylabs/uzbert-base-uncased on the uzbekpos dataset. It achieves the following results on the evaluation set:
- Loss: 0.2715
- Precision: 0.8763
- Recall: 0.8699
- F1: 0.8731
- Accuracy: 0.9219
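As a quick sanity check on the numbers above, the reported F1 agrees with the harmonic mean of the reported precision and recall. The sketch below assumes that is how the F1 was aggregated; the card does not say which evaluation library or averaging mode was used.

```python
# Values copied from the evaluation results above (rounded to 4 decimals).
precision = 0.8763
recall = 0.8699
reported_f1 = 0.8731

# F1 as the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# The inputs are rounded, so compare with a small tolerance rather than exactly.
matches_reported = abs(f1 - reported_f1) < 1e-4
print(f"F1 from P and R: {f1:.4f}, matches reported value: {matches_reported}")
```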
Model description
More information needed
Intended uses & limitations
More information needed
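Until this section is filled in, the following is a minimal usage sketch rather than documented behaviour. It assumes the checkpoint is available on the Hugging Face Hub (the namespace is not stated in this card, so `MODEL_ID` is a placeholder), that the base model uses a WordPiece tokenizer with `##` continuation pieces, and that labels are UPOS-style tags. `load_tagger` and `merge_wordpieces` are hypothetical helper names, and the example input is hand-written, not real model output.

```python
from typing import Dict, List, Tuple

# Placeholder: the Hub namespace of this checkpoint is not stated in the card.
MODEL_ID = "<namespace>/uzpostagger-cyrillic-3"


def load_tagger(model_id: str = MODEL_ID):
    """Build a token-classification pipeline (downloads the model on first use)."""
    from transformers import pipeline  # imported lazily; requires `transformers`

    return pipeline("token-classification", model=model_id)


def merge_wordpieces(predictions: List[Dict]) -> List[Tuple[str, str]]:
    """Join WordPiece continuations ("##...") into words, keeping the first piece's tag."""
    words: List[Tuple[str, str]] = []
    for pred in predictions:
        piece, tag = pred["word"], pred["entity"]
        if piece.startswith("##") and words:
            prev_word, prev_tag = words[-1]
            words[-1] = (prev_word + piece[2:], prev_tag)
        else:
            words.append((piece, tag))
    return words


# Hand-written stand-in for pipeline output (NOT real model predictions).
example = [
    {"word": "кито", "entity": "NOUN"},
    {"word": "##б", "entity": "NOUN"},
    {"word": "ўқи", "entity": "VERB"},
    {"word": "##ди", "entity": "VERB"},
]
print(merge_wordpieces(example))
```

With a real checkpoint, the call would look like `merge_wordpieces(load_tagger()("..."))`. Subword pieces are merged by hand here because the pipeline's built-in aggregation strategies are designed for entity spans and may also group neighbouring words that share the same tag.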
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 5
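To make the `linear` scheduler setting concrete, here is a small sketch of the learning rate over training. It assumes no warmup (none is listed above) and 125 total optimizer steps (5 epochs of 25 steps, as in the training results table). `linear_lr` is an illustrative helper, not a function from the training script.

```python
def linear_lr(
    step: int,
    total_steps: int = 125,   # assumed: 5 epochs x 25 steps per epoch
    base_lr: float = 2e-5,    # learning_rate from the list above
    warmup_steps: int = 0,    # assumed: no warmup is listed in the card
) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay to zero."""
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the end of each epoch (steps 25, 50, ..., 125).
for step in (0, 25, 50, 75, 100, 125):
    print(step, f"{linear_lr(step):.2e}")
```

Under these assumptions the rate starts at 2e-05 and reaches zero at the final step, so later epochs take progressively smaller updates.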
Training results
| Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1 | Accuracy |
|---|---|---|---|---|---|---|---|
| No log | 1.0 | 25 | 0.8765 | 0.6558 | 0.5477 | 0.5969 | 0.7485 |
| No log | 2.0 | 50 | 0.4086 | 0.8496 | 0.8237 | 0.8364 | 0.9004 |
| No log | 3.0 | 75 | 0.3133 | 0.8615 | 0.8552 | 0.8583 | 0.9142 |
| No log | 4.0 | 100 | 0.2806 | 0.8730 | 0.8657 | 0.8693 | 0.9193 |
| No log | 5.0 | 125 | 0.2715 | 0.8763 | 0.8699 | 0.8731 | 0.9219 |
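The Step column also gives a rough idea of the training set size, which the card does not state. The sketch below assumes a single device, no gradient accumulation, and that the last partial batch of each epoch is kept, so that the number of steps per epoch equals the number of examples divided by the batch size, rounded up.

```python
import math

train_batch_size = 16  # from the hyperparameters listed in this card
steps_per_epoch = 25   # from the Step column: 25 optimizer steps per epoch

# Assumption: steps_per_epoch == ceil(num_examples / train_batch_size),
# i.e. one device, no gradient accumulation, last partial batch kept.
min_examples = (steps_per_epoch - 1) * train_batch_size + 1
max_examples = steps_per_epoch * train_batch_size

print(f"Implied training set size: {min_examples} to {max_examples} examples")
```

The "No log" entries in the Training Loss column most likely mean that the logging interval was longer than the 125 total steps, so no training loss had been recorded when each evaluation ran.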
Framework versions
- Transformers 4.32.1
- PyTorch 2.2.0
- Datasets 2.17.1
- Tokenizers 0.13.3
Citation Information
@inproceedings{bobojonova-etal-2025-bbpos,
title = "{BBPOS}: {BERT}-based Part-of-Speech Tagging for {U}zbek",
author = "Bobojonova, Latofat and
Akhundjanova, Arofat and
Ostheimer, Phil Sidney and
Fellenz, Sophie",
editor = "Hettiarachchi, Hansi and
Ranasinghe, Tharindu and
Rayson, Paul and
Mitkov, Ruslan and
Gaber, Mohamed and
Premasiri, Damith and
Tan, Fiona Anting and
Uyangodage, Lasitha",
booktitle = "Proceedings of the First Workshop on Language Models for Low-Resource Languages",
month = jan,
year = "2025",
address = "Abu Dhabi, United Arab Emirates",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.loreslm-1.23/",
pages = "287--293",
abstract = "This paper advances NLP research for the low-resource Uzbek language by evaluating two previously untested monolingual Uzbek BERT models on the part-of-speech (POS) tagging task and introducing the first publicly available UPOS-tagged benchmark dataset for Uzbek. Our fine-tuned models achieve 91{\%} average accuracy, outperforming the baseline multi-lingual BERT as well as the rule-based tagger. Notably, these models capture intermediate POS changes through affixes and demonstrate context sensitivity, unlike existing rule-based taggers."
}