# ELECTRA discriminator base
- pretrained on a large Korean corpus (30GB)
- 113M model parameters (follows the google/electra-base-discriminator config)
- 35,000 vocab size
- trained for 1,000,000 steps
- built with the [lassl](https://github.com/lassl/lassl) framework
pretrain-data
┣ korean_corpus.txt
┣ kowiki_latest.txt
┣ modu_dialogue_v1.2.txt
┣ modu_news_v1.1.txt
┣ modu_news_v2.0.txt
┣ modu_np_2021_v1.0.txt
┣ modu_np_v1.1.txt
┣ modu_spoken_v1.2.txt
┗ modu_written_v1.0.txt
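
As a rough sanity check on the 113M parameter figure, the count can be estimated by hand from the vocabulary size listed above. This is only a sketch: the architecture values (hidden size 768, embedding size 768, 12 layers, intermediate size 3072, 512 positions, 2 token types) are assumed from the google/electra-base-discriminator config and are not stated in this card, and the function name is hypothetical.

```python
def electra_discriminator_params(
    vocab_size=35_000,      # from this model card
    hidden=768,             # assumed: electra-base hidden size
    embedding=768,          # assumed: electra-base embedding size
    layers=12,              # assumed: number of transformer layers
    intermediate=3072,      # assumed: feed-forward inner size
    max_positions=512,      # assumed: maximum sequence length
    type_vocab=2,           # assumed: number of token type ids
):
    """Estimate the parameter count of an ELECTRA discriminator."""
    layer_norm = 2 * hidden  # weight + bias

    # Word, position and token-type embeddings plus their LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * embedding + 2 * embedding

    # A projection layer is only needed when embedding and hidden sizes differ.
    projection = 0 if embedding == hidden else embedding * hidden + hidden

    # Self-attention: query, key, value and output projections, then LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm

    # Feed-forward: up-projection, down-projection, then LayerNorm.
    feed_forward = (
        (hidden * intermediate + intermediate)
        + (intermediate * hidden + hidden)
        + layer_norm
    )

    # Discriminator head: one dense layer plus a single-logit prediction layer.
    head = (hidden * hidden + hidden) + (hidden + 1)

    return embeddings + projection + layers * (attention + feed_forward) + head


if __name__ == "__main__":
    total = electra_discriminator_params()
    print(f"{total:,} parameters (~{total / 1e6:.0f}M)")
```

Under these assumed values the estimate comes to 112,922,113 parameters, which is consistent with the 113M listed above.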