# dalat5/src/data/get_data.sh
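# Download the latest Kazakh Wikipedia articles dump (kept compressed; WikiExtractor reads .bz2 directly).
wget https://dumps.wikimedia.org/kkwiki/latest/kkwiki-latest-pages-articles.xml.bz2
# Download the Kazakh portion of the CC-100 corpus.
wget http://data.statmt.org/cc-100/kk.txt.xz
# Decompress to kk.txt (unxz removes the .xz archive after extracting).
unxz kk.txt.xz
# Extract article text from the Wikipedia dump into extracted/, one JSON object per line.
python3 -m wikiextractor.WikiExtractor kkwiki-latest-pages-articles.xml.bz2 --output extracted --json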
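
# The commands above do not check their own results, so a truncated download or an
# empty extraction could go unnoticed. The following is a minimal sketch of a sanity
# check, not part of the original script. The helper names require_nonempty and
# count_articles are hypothetical, and count_articles assumes the usual WikiExtractor
# --json layout: files named wiki_NN under subdirectories, one JSON object per line.

```shell
# Hypothetical helper: fail if a path is missing or is an empty file.
require_nonempty() {
    if [ ! -s "$1" ]; then
        echo "missing or empty: $1" >&2
        return 1
    fi
    echo "ok: $1"
}

# Hypothetical helper: count JSON-lines records under a WikiExtractor output directory.
# Assumes one article per line in files named wiki_NN (an assumption about the layout).
count_articles() {
    # tr strips the padding that some wc implementations add around the number.
    find "$1" -type f -name 'wiki_*' -exec cat {} + | wc -l | tr -d ' '
}
```

# Possible usage once the script has finished: run `require_nonempty kk.txt`, then
# `count_articles extracted`, and treat a count of 0 as a failed extraction.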