cahya committed on
Commit a49c02e · verified · 1 parent: 6d53094

Update README.md

Files changed (1)
  1. README.md +7 -6
README.md CHANGED
@@ -1,10 +1,11 @@
 ---
-language: "id"
-license: "mit"
+license: mit
 datasets:
-- Indonesian Wikipedia
-widget:
-- text: "Pulau Dewata sering dikunjungi"
+- indonesian-nlp/wikipedia-id
+language:
+- id
+metrics:
+- perplexity
 ---
 
 # Indonesian GPT2 small model
@@ -61,4 +62,4 @@ output = model(encoded_input)
 
 This model was pre-trained with 522MB of indonesian Wikipedia.
 The texts are tokenized using a byte-level version of Byte Pair Encoding (BPE) (for unicode characters) and
-a vocabulary size of 52,000. The inputs are sequences of 128 consecutive tokens.
+a vocabulary size of 52,000. The inputs are sequences of 128 consecutive tokens.
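The unchanged README context at the end of the diff states that the model's inputs are sequences of 128 consecutive tokens. The following is a minimal Python sketch of that grouping step. It assumes the tokenized corpus is available as one flat list of token ids and that a trailing partial block is dropped; the README describes neither detail. `group_into_blocks` is a hypothetical helper for illustration and is not the author's pre-training code.

```python
from typing import List

BLOCK_SIZE = 128  # sequence length stated in the README


def group_into_blocks(token_ids: List[int], block_size: int = BLOCK_SIZE) -> List[List[int]]:
    """Split a flat stream of token ids into consecutive, non-overlapping blocks.

    Hypothetical helper. Assumption: a trailing remainder shorter than
    block_size is dropped, since the README does not say how it was handled.
    """
    # Keep only as many ids as fill whole blocks.
    usable = (len(token_ids) // block_size) * block_size
    return [token_ids[i:i + block_size] for i in range(0, usable, block_size)]


if __name__ == "__main__":
    stream = list(range(300))  # stand-in for tokenized Wikipedia text
    blocks = group_into_blocks(stream)
    print(len(blocks), [len(b) for b in blocks])  # 2 [128, 128]
```

Dropping the remainder keeps every training example exactly 128 tokens long, which avoids padding. Padding the last block or carrying it over into the next document would be equally plausible choices.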