---
license: apache-2.0
pipeline_tag: fill-mask
---

# manchuBERT

manchuBERT is a BERT-base model trained from scratch on romanized Manchu data.

[ManNER & ManPOS](https://aclanthology.org/2024.lrec-main.961.pdf) are manchuBERT models fine-tuned for Manchu named entity recognition and part-of-speech tagging.
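
## Usage

Since the card tags the model for the `fill-mask` pipeline, the checkpoint should load with the standard `transformers` API. The snippet below is a minimal sketch rather than an official example: the test phrase (adapted from the *Ilan gurun i bithe* title listed in the Data section below) and the assumption that the tokenizer exposes BERT's default `[MASK]` token are ours, not from the original card.

```python
from transformers import pipeline

# Load manchuBERT as a fill-mask pipeline; this assumes the checkpoint
# ships a tokenizer that uses the standard BERT [MASK] token.
fill_mask = pipeline("fill-mask", model="seemdog/manchuBERT")

# Mask one token of a romanized Manchu phrase and print the
# model's top predictions with their scores.
for pred in fill_mask("ilan gurun i [MASK]"):
    print(pred["token_str"], round(pred["score"], 4))
```
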
## Data

manchuBERT uses the data augmentation method from [Mergen: The First Manchu-Korean Machine Translation Model Trained on Augmented Data](https://arxiv.org/pdf/2311.17492.pdf). The table below lists the sources and sizes of the training data before augmentation.

| **Data** | **Number of Sentences (before augmentation)** |
|:--------------------------:|:------:|
| Manwén Lǎodàng–Taizong | 2,220 |
| Ilan gurun i bithe | 41,904 |
| Gin ping mei bithe | 21,376 |
| Yùzhì Qīngwénjiàn | 11,954 |
| Yùzhì Zēngdìng Qīngwénjiàn | 18,420 |
| Manwén Lǎodàng–Taizu | 22,578 |
| Manchu-Korean Dictionary | 40,583 |
## Citation

```bibtex
@misc{jean_seo_2024,
  author    = {Jean Seo},
  title     = {manchuBERT (Revision 64133be)},
  year      = 2024,
  url       = {https://huggingface.co/seemdog/manchuBERT},
  doi       = {10.57967/hf/1599},
  publisher = {Hugging Face}
}
```