# ELECTRA discriminator base
- Pretrained on a large Korean corpus (30 GB)
- 113M model parameters (follows the google/electra-base-discriminator config)
- Vocabulary size of 35,000
- Trained for 1,000,000 steps
- Built on the [lassl](https://github.com/lassl/lassl) framework
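
As a rough sanity check on the 113M figure, the parameter count can be estimated from the vocabulary size and the base discriminator dimensions. This is a minimal sketch, not the training code: the layer dimensions (hidden size 768, 12 layers, intermediate size 3072, 512 positions, 2 token types) are assumed from the google/electra-base-discriminator config, and the discriminator head layout (one hidden-to-hidden dense layer plus a single-logit output) is assumed to match the Hugging Face `ElectraForPreTraining` layout. The helper names are hypothetical.

```python
# Rough parameter count for an ELECTRA base discriminator with a 35,000-token vocabulary.
# Assumption: all dimensions except VOCAB_SIZE come from google/electra-base-discriminator.
VOCAB_SIZE = 35_000      # stated in this model card
EMBEDDING = 768          # assumed embedding size
HIDDEN = 768             # assumed hidden size
LAYERS = 12              # assumed number of transformer layers
INTERMEDIATE = 3072      # assumed feed-forward size
MAX_POSITIONS = 512      # assumed maximum sequence length
TYPE_VOCAB = 2           # assumed number of token types


def linear(n_in, n_out):
    """Weights plus biases of a dense layer."""
    return n_in * n_out + n_out


def layer_norm(n):
    """Scale plus shift of a LayerNorm."""
    return 2 * n


def count_parameters(vocab_size=VOCAB_SIZE):
    # Word, position, and token-type embeddings, followed by one LayerNorm.
    embeddings = (vocab_size + MAX_POSITIONS + TYPE_VOCAB) * EMBEDDING + layer_norm(EMBEDDING)
    # A projection layer is only needed when embedding and hidden sizes differ.
    projection = linear(EMBEDDING, HIDDEN) if EMBEDDING != HIDDEN else 0
    # Query, key, value, and output projections, then one LayerNorm.
    attention = 4 * linear(HIDDEN, HIDDEN) + layer_norm(HIDDEN)
    # Two dense layers of the feed-forward block, then one LayerNorm.
    feed_forward = linear(HIDDEN, INTERMEDIATE) + linear(INTERMEDIATE, HIDDEN) + layer_norm(HIDDEN)
    encoder = LAYERS * (attention + feed_forward)
    # Assumed discriminator head: dense layer plus a single-logit prediction layer.
    head = linear(HIDDEN, HIDDEN) + linear(HIDDEN, 1)
    return embeddings + projection + encoder + head


total = count_parameters()
print(f"{total:,} parameters (~{total / 1e6:.0f}M)")
# prints "112,922,113 parameters (~113M)"
```

Under these assumptions the estimate lands at about 113M, of which the 35,000-token word embedding table accounts for roughly 26.9M.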
  

pretrain-data  
 ┣ korean_corpus.txt  
 ┣ kowiki_latest.txt  
 ┣ modu_dialogue_v1.2.txt  
 ┣ modu_news_v1.1.txt  
 ┣ modu_news_v2.0.txt  
 ┣ modu_np_2021_v1.0.txt  
 ┣ modu_np_v1.1.txt  
 ┣ modu_spoken_v1.2.txt  
 ┗ modu_written_v1.0.txt