# dalat5/src/data/generate_clean_corpus.sh
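# Shuffle the corpus in place (GNU shuf reads the whole input before it writes the -o file).
shuf kazakh_latin_corpus.jsonl -o kazakh_latin_corpus.jsonl
# Keep only lines containing at least one non-whitespace character (\S is a GNU grep extension).
grep '\S' kazakh_latin_corpus.jsonl > clean_corpus.jsonl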