---
license: apache-2.0
pipeline_tag: fill-mask
---
# manchuBERT
manchuBERT is a BERT-base model trained from scratch on romanized Manchu data.
[ManNER & ManPOS](https://aclanthology.org/2024.lrec-main.961.pdf) are manchuBERT models fine-tuned for named entity recognition and part-of-speech tagging, respectively.
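## Usage
A minimal usage sketch with the 🤗 Transformers fill-mask pipeline. The example sentence is illustrative only, built from a title in the data table below; the predictions it produces are not part of this model card.
```python
from transformers import pipeline

# Load manchuBERT as a fill-mask pipeline (model id from this repo).
fill_mask = pipeline("fill-mask", model="seemdog/manchuBERT")

# Mask one token in a romanized Manchu phrase; "ilan gurun i bithe"
# appears in the training data listed below. Illustrative input only.
masked = f"ilan gurun i {fill_mask.tokenizer.mask_token}"

# Print the top predicted fillers with their scores.
for pred in fill_mask(masked):
    print(pred["token_str"], pred["score"])
```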
## Data
manchuBERT utilizes the data augmentation method from [Mergen: The First Manchu-Korean Machine Translation Model Trained on Augmented Data](https://arxiv.org/pdf/2311.17492.pdf).
| **Data** | **Number of Sentences (before augmentation)** |
|:---------------------------:|:-----------------------:|
| Mǎnwén Lǎodàng–Taizong | 2,220 |
| Ilan gurun i bithe | 41,904 |
| Gin ping mei bithe | 21,376 |
| Yùzhì Qīngwénjiàn | 11,954 |
| Yùzhì Zēngdìng Qīngwénjiàn | 18,420 |
| Mǎnwén Lǎodàng–Taizu | 22,578 |
| Manchu-Korean Dictionary | 40,583 |
## Citation
```bibtex
@misc{jean_seo_2024,
  author    = {Jean Seo},
  title     = {manchuBERT (Revision 64133be)},
  year      = {2024},
  url       = {https://huggingface.co/seemdog/manchuBERT},
  doi       = {10.57967/hf/1599},
  publisher = {Hugging Face}
}
```