---
license: apache-2.0
pipeline_tag: fill-mask
---

# manchuBERT

manchuBERT is a BERT-base model trained from scratch on romanized Manchu data.

[ManNER & ManPOS](https://aclanthology.org/2024.lrec-main.961.pdf) are manchuBERT models fine-tuned for Manchu named entity recognition and part-of-speech tagging.
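
## Usage

Since the card tags the model for the `fill-mask` pipeline, the checkpoint should load with the standard `transformers` API. The snippet below is a minimal sketch rather than an official example: the test phrase (adapted from the *Ilan gurun i bithe* title listed in the Data section below) and the assumption that the tokenizer exposes BERT's default `[MASK]` token are ours, not from the original card.

```python
from transformers import pipeline

# Load manchuBERT as a fill-mask pipeline; this assumes the checkpoint
# ships a tokenizer that uses the standard BERT [MASK] token.
fill_mask = pipeline("fill-mask", model="seemdog/manchuBERT")

# Mask one token of a romanized Manchu phrase and print the
# model's top predictions with their scores.
for pred in fill_mask("ilan gurun i [MASK]"):
    print(pred["token_str"], round(pred["score"], 4))
```
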
## Data

manchuBERT uses the data augmentation method from [Mergen: The First Manchu-Korean Machine Translation Model Trained on Augmented Data](https://arxiv.org/pdf/2311.17492.pdf). The table below lists the sources and sizes of the training data before augmentation.

| **Data** | **Number of Sentences (before augmentation)** |
|:--------------------------:|:------:|
| Manwén Lǎodàng–Taizong | 2,220 |
| Ilan gurun i bithe | 41,904 |
| Gin ping mei bithe | 21,376 |
| Yùzhì Qīngwénjiàn | 11,954 |
| Yùzhì Zēngdìng Qīngwénjiàn | 18,420 |
| Manwén Lǎodàng–Taizu | 22,578 |
| Manchu-Korean Dictionary | 40,583 |
## Citation

```bibtex
@misc{jean_seo_2024,
  author    = {Jean Seo},
  title     = {manchuBERT (Revision 64133be)},
  year      = 2024,
  url       = {https://huggingface.co/seemdog/manchuBERT},
  doi       = {10.57967/hf/1599},
  publisher = {Hugging Face}
}
```