indiejoseph committed
Commit ef297d9 · 1 parent: 9dbc0f7

Update README.md

README.md CHANGED
@@ -14,7 +14,10 @@ should probably proofread and complete it, then remove this comment. -->
 
 # bart-base-cantonese
 
-This model is a
+This model is a continued pre-training of [fnlp/bart-base-chinese](https://huggingface.co/fnlp/bart-base-chinese) on [indiejoseph/cc100-yue](https://huggingface.co/datasets/indiejoseph/cc100-yue), a dataset that has been typo-corrected and filtered.
+
+The tokenizer extends the BERT tokenizer from fnlp/bart-base-chinese with 500 additional Chinese characters commonly found in Cantonese.
+
 It achieves the following results on the evaluation set:
 - Loss: 0.8513
 - Accuracy: 0.8363
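
The tokenizer change described in this commit amounts to appending new characters to an existing vocabulary. The following is a minimal, library-free sketch of that idea, not the author's actual procedure: the helper `extend_vocab`, the toy `base_vocab`, and the sample characters are all hypothetical and chosen only for illustration. With Hugging Face Transformers, the comparable steps would typically be `tokenizer.add_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`, though the commit does not say how the extension was done.

```python
from typing import Dict, Iterable, List, Tuple


def extend_vocab(
    vocab: Dict[str, int], new_chars: Iterable[str]
) -> Tuple[Dict[str, int], List[str]]:
    """Return a copy of `vocab` with unseen characters appended,
    plus the list of characters that were actually added."""
    extended = dict(vocab)  # leave the caller's vocabulary untouched
    added: List[str] = []
    next_id = max(extended.values(), default=-1) + 1
    for ch in new_chars:
        if ch in extended:
            continue  # already covered by the base vocabulary
        extended[ch] = next_id  # new ids follow the existing ones
        added.append(ch)
        next_id += 1
    return extended, added


# Toy base vocabulary (hypothetical; a real one has thousands of entries).
base_vocab = {"[PAD]": 0, "[UNK]": 1, "我": 2, "你": 3, "好": 4}
# Sample characters used in written Cantonese (illustrative only).
# "好" is already in the base vocabulary, so it is skipped.
cantonese_chars = ["嘅", "咗", "哋", "好", "嚟"]

vocab, added = extend_vocab(base_vocab, cantonese_chars)
print(added)
print(len(base_vocab), len(vocab))
```

Because the new ids are appended after the existing ones, the base model's embedding rows keep their meaning, and only the rows for the added characters need to be learned during continued pre-training.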