Update README.md
README.md CHANGED
@@ -1,10 +1,11 @@
 ---
-
-license: "mit"
+license: mit
 datasets:
--
-
--
+- indonesian-nlp/wikipedia-id
+language:
+- id
+metrics:
+- perplexity
 ---
 
 # Indonesian GPT2 small model
@@ -61,4 +62,4 @@ output = model(encoded_input)
 
 This model was pre-trained with 522MB of indonesian Wikipedia.
 The texts are tokenized using a byte-level version of Byte Pair Encoding (BPE) (for unicode characters) and
-a vocabulary size of 52,000. The inputs are sequences of 128 consecutive tokens.
+a vocabulary size of 52,000. The inputs are sequences of 128 consecutive tokens.
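The first hunk turns the front matter into structured metadata (license, dataset, language, metric) that the Hugging Face Hub can parse. A minimal sketch of reading that metadata back with `huggingface_hub`; the repo id `indonesian-nlp/gpt2-small` is a hypothetical placeholder, since the diff does not name the model repository:

```python
# Sketch only: the repo id is a hypothetical placeholder, not taken from the diff.
from huggingface_hub import ModelCard

card = ModelCard.load("indonesian-nlp/gpt2-small")  # hypothetical repo id
print(card.data.license)   # expected "mit" after this change
print(card.data.datasets)  # expected ["indonesian-nlp/wikipedia-id"]
print(card.data.language)  # expected ["id"]
print(card.data.metrics)   # expected ["perplexity"]
```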
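The closing paragraph of the README describes a byte-level BPE tokenizer with a 52,000-token vocabulary and 128-token training sequences. A minimal usage sketch consistent with that description, again assuming the hypothetical repo id `indonesian-nlp/gpt2-small`:

```python
# Sketch only: repo id and example prompt are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "indonesian-nlp/gpt2-small"               # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)  # byte-level BPE tokenizer
model = AutoModelForCausalLM.from_pretrained(model_id)

text = "Sejarah Indonesia dimulai sejak"
# Truncate to 128 tokens, the sequence length used during pre-training.
encoded_input = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
output = model(**encoded_input)

print(len(tokenizer))        # vocabulary size, about 52,000
print(output.logits.shape)   # (batch_size, sequence_length, vocab_size)
```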