BPE-based tokenizer used for the MEHDIE project and for training a bilingual BERT model.
Vocabulary size: 52,000.
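
To illustrate what a BPE vocabulary size means in practice, here is a minimal, pure-Python sketch of byte-pair-encoding training on a toy corpus. It is not the code used to build this tokenizer: the function names (`merge_pair`, `train_bpe`, `encode`), the toy corpus, and the tie-breaking rule are all assumptions made for illustration. The idea is that training starts from single characters and keeps merging the most frequent adjacent pair until the vocabulary reaches the target size (52,000 for this tokenizer, 9 in the toy usage below).

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def train_bpe(corpus, vocab_size):
    """Learn BPE merges from a list of words until the vocabulary has `vocab_size` entries.

    Hypothetical helper for illustration; real tokenizers also handle
    pre-tokenization, special tokens, and byte-level fallbacks.
    """
    # Each distinct word is a tuple of symbols, weighted by how often it occurs.
    words = Counter(tuple(word) for word in corpus)
    vocab = {symbol for word in words for symbol in word}
    merges = []
    while len(vocab) < vocab_size:
        # Count adjacent symbol pairs across the corpus.
        pairs = Counter()
        for symbols, freq in words.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # nothing left to merge
        # Most frequent pair wins; ties are broken lexicographically (an assumption).
        best = max(pairs, key=lambda p: (pairs[p], p))
        new_words = Counter()
        for symbols, freq in words.items():
            new_words[merge_pair(symbols, best)] += freq
        words = new_words
        merges.append(best)
        vocab.add(best[0] + best[1])
    return vocab, merges


def encode(word, merges):
    """Tokenize a word by replaying the learned merges in training order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Toy usage: 7 distinct characters plus 2 learned merges gives a vocabulary of 9.
toy_vocab, toy_merges = train_bpe(["low", "low", "lower", "lowest"], vocab_size=9)
tokens = encode("lower", toy_merges)
```

A larger target vocabulary allows more merges, so frequent words and subwords end up as single tokens while rare words are split into smaller pieces.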