
A BPE-based (byte-pair encoding) tokenizer used for the MEHDIE project and for training a bilingual BERT model. The vocabulary size is 52,000.
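
For readers unfamiliar with BPE, the following is a minimal sketch of the general technique in plain Python. It is not the code used to build this tokenizer: the function names (`learn_bpe`, `merge_pair`, `encode`), the character-level starting symbols, and the tie-breaking rule are all assumptions made for illustration. The idea is to repeatedly merge the most frequent adjacent pair of symbols, so the vocabulary grows by one symbol per merge until a target size (here, 52,000 in the real tokenizer) is reached.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every non-overlapping occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (hypothetical helper)."""
    # Each distinct word starts as a tuple of single characters, weighted by its frequency.
    corpus = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across the whole corpus.
        pairs = Counter()
        for symbols, freq in corpus.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair wins; ties go to the lexicographically smallest pair
        # (an arbitrary choice made here only to keep the sketch deterministic).
        best = min(pairs, key=lambda p: (-pairs[p], p))
        merges.append(best)
        new_corpus = Counter()
        for symbols, freq in corpus.items():
            new_corpus[merge_pair(symbols, best)] += freq
        corpus = new_corpus
    return merges


def encode(word, merges):
    """Tokenize one word by applying the learned merges in the order they were learned."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    # Toy corpus, purely illustrative.
    toy_words = ["low"] * 5 + ["lower"] * 2 + ["newest"] * 6 + ["widest"] * 3
    toy_merges = learn_bpe(toy_words, num_merges=3)
    print(toy_merges)
    print(encode("lowest", toy_merges))
```

In a sketch like this, the final vocabulary is the set of starting symbols plus one new symbol per merge, so the target vocabulary size determines how many merges are learned. Production tokenizers typically add details this sketch omits, such as pre-tokenization, special tokens, and byte-level fallbacks.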