Commit a9fee4d
Parent(s): 0803e25

warning

README.md CHANGED
@@ -7,6 +7,8 @@ language: dv
 Pretrained from scratch on Dhivehi (language of the Maldives)
 with ByT5, Google's new byte-level tokenizer strategy.

+**Use byt5-dv for now; this is less accurate**
+
 Corpus: Sofwath's Dhivehi corpus https://github.com/Sofwath/DhivehiDatasets

 Pretraining Notebook:
@@ -17,3 +19,7 @@ https://colab.research.google.com/drive/1ERIZ1PyHn-yN_jo7dTQeODn22vrt-d1d?usp=sh
 On Dhivehi news classification task

 https://colab.research.google.com/drive/11u5SafR4bKICmArgDl6KQ9vqfYtDpyWp?usp=sharing
+
+## Issues
+
+There was an issue with the vocabulary size, final layer, and/or accuracy on fine-tuning.
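
For readers unfamiliar with the byte-level strategy the README mentions, here is a minimal sketch of how ByT5-style tokenization maps text to ids. It is not taken from this repository or its notebooks. It assumes the usual ByT5 convention that ids 0, 1, and 2 are reserved for padding, end-of-sequence, and unknown, so each UTF-8 byte `b` becomes id `b + 3`. The helper names `byte_encode` and `byte_decode` are hypothetical.

```python
from typing import List

# Assumption: ByT5-style layout where ids 0 (pad), 1 (EOS), 2 (unk) are
# reserved, so raw byte values are shifted up by 3.
SPECIAL_OFFSET = 3
EOS_ID = 1


def byte_encode(text: str) -> List[int]:
    """Hypothetical helper: map text to byte-level ids and append EOS."""
    return [b + SPECIAL_OFFSET for b in text.encode("utf-8")] + [EOS_ID]


def byte_decode(ids: List[int]) -> str:
    """Hypothetical helper: drop special ids and decode the remaining bytes."""
    raw = bytes(
        i - SPECIAL_OFFSET
        for i in ids
        if SPECIAL_OFFSET <= i < SPECIAL_OFFSET + 256
    )
    return raw.decode("utf-8", errors="ignore")


# "Dhivehi" written in Thaana script; each Thaana code point takes
# two bytes in UTF-8, so the id sequence is longer than the character count.
sample = "ދިވެހި"
ids = byte_encode(sample)
print(len(sample), "characters ->", len(ids), "ids (including EOS)")
print(byte_decode(ids) == sample)
```

Because every Thaana character expands to two byte ids, sequences are roughly twice as long as the character count, which may be worth keeping in mind when setting maximum input lengths for fine-tuning.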