Update README.md
README.md
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# Romanian Wav2Vec2

This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the [Common Voice 8.0 - Romanian subset](https://huggingface.co/datasets/mozilla-foundation/common_voice_8_0) dataset (train + validation + other splits), with extra training data from the [Romanian Speech Synthesis](https://huggingface.co/datasets/gigant/romanian_speech_synthesis_0_8_1) dataset (train + test splits).

Without the 5-gram language model, it achieves the following results on the evaluation set (Common Voice 8.0 Romanian subset, test split):
- Loss: 0.1553
- WER: 0.1174
- CER: 0.0294
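WER (word error rate) and CER (character error rate) are normalized edit distances: the minimum number of substitutions, deletions and insertions that turn a hypothesis into its reference, divided by the number of reference words (WER) or characters (CER). Under corpus-level aggregation, a WER of 0.1174 corresponds to roughly 11.7 word errors per 100 reference words. The following is a minimal sketch of that definition, not the script used to produce the numbers above. It assumes corpus-level aggregation (total edits over total reference tokens) and applies no text normalization, whereas real evaluations usually lowercase and strip punctuation first. The function names `edit_distance` and `error_rate` are hypothetical; libraries such as `jiwer` provide tested implementations.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (substitutions, deletions, insertions)."""
    # Before processing reference token i, prev[j] is the distance between ref[:i-1] and hyp[:j];
    # curr[j] becomes the distance between ref[:i] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete reference token r
                curr[j - 1] + 1,         # insert hypothesis token h
                prev[j - 1] + (r != h),  # substitute (no cost if the tokens match)
            )
        prev = curr
    return prev[-1]


def error_rate(references: List[str], hypotheses: List[str], unit: str = "word") -> float:
    """Corpus-level WER (unit="word") or CER (unit="char"): total edits / total reference tokens."""
    if unit not in ("word", "char"):
        raise ValueError("unit must be 'word' or 'char'")
    # Words are whitespace-separated; characters include spaces.
    tokenize = str.split if unit == "word" else list
    errors = sum(edit_distance(tokenize(r), tokenize(h)) for r, h in zip(references, hypotheses))
    total = sum(len(tokenize(r)) for r in references)
    return errors / total


if __name__ == "__main__":
    refs = ["ana are mere"]
    hyps = ["ana are pere"]
    print(round(error_rate(refs, hyps, "word"), 4))  # 0.3333 (1 substitution / 3 words)
    print(round(error_rate(refs, hyps, "char"), 4))  # 0.0833 (1 substitution / 12 characters)
```

The same single-letter mistake costs a whole word in WER but only one character in CER, which is why CER is typically much lower than WER, as in the results above.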
## Model description

The architecture is [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) with a CTC (Connectionist Temporal Classification) speech recognition head. A 5-gram language model is added at decoding time using [pyctcdecode](https://github.com/kensho-technologies/pyctcdecode) and [kenlm](https://github.com/kpu/kenlm); both libraries must be installed for the language-model-boosted decoder to work.
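For each audio frame, the CTC head outputs scores over the tokenizer vocabulary, which includes a special blank token. The results above were measured without the language model; in that setting a transcript is usually obtained by greedy (best-path) CTC decoding: pick the most likely token per frame, merge consecutive repeats, then drop blanks. The sketch below shows that step in plain Python on a hypothetical toy vocabulary. Treating index 0 (`<pad>`) as the blank and `|` as the word delimiter mirrors the usual wav2vec2 tokenizer convention, but this model's actual vocabulary should be checked in its tokenizer files. In `transformers`, `Wav2Vec2Processor.batch_decode` applied to the argmax token ids performs an equivalent collapse, while `Wav2Vec2ProcessorWithLM` passes the logits to pyctcdecode, which runs a beam search that also scores candidate prefixes with the kenlm 5-gram model.

```python
from typing import List, Sequence

# Hypothetical toy vocabulary for illustration only: index 0 ("<pad>") plays the
# role of the CTC blank and "|" marks word boundaries.
VOCAB = ["<pad>", "|", "a", "e", "m", "n", "r"]
BLANK_ID = 0


def greedy_ctc_decode(frame_scores: Sequence[Sequence[float]],
                      vocab: Sequence[str] = VOCAB,
                      blank_id: int = BLANK_ID) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    tokens = []
    prev_id = None
    for scores in frame_scores:
        # Most likely token id for this frame.
        best_id = max(range(len(scores)), key=scores.__getitem__)
        # Emit a token only if it differs from the previous frame's token and is
        # not the blank; a blank between two identical tokens keeps both of them.
        if best_id != prev_id and best_id != blank_id:
            tokens.append(vocab[best_id])
        prev_id = best_id
    # Word delimiters become spaces; split/join normalizes repeated or edge spaces.
    return " ".join("".join(tokens).replace("|", " ").split())


def one_hot_frames(path: Sequence[str], vocab: Sequence[str] = VOCAB) -> List[List[float]]:
    """Toy frame scores whose per-frame argmax follows `path` (illustration only)."""
    return [[1.0 if tok == target else 0.0 for tok in vocab] for target in path]


if __name__ == "__main__":
    frames = one_hot_frames(["a", "a", "<pad>", "n", "a", "|", "a", "r", "e", "e", "<pad>"])
    print(greedy_ctc_decode(frames))  # ana are
```

Greedy decoding chooses each frame's token independently. A beam search with an n-gram language model keeps several candidate transcripts alive and can prefer a slightly less likely acoustic path when it spells more plausible Romanian words, which is the motivation for the 5-gram decoder.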
## Intended uses & limitations

More information needed
## Training and evaluation data

Training data:
- [Common Voice 8.0 - Romanian subset](https://huggingface.co/datasets/mozilla-foundation/common_voice_8_0): train + validation + other splits
- [Romanian Speech Synthesis](https://huggingface.co/datasets/gigant/romanian_speech_synthesis_0_8_1): train + test splits

Evaluation data:
- [Common Voice 8.0 - Romanian subset](https://huggingface.co/datasets/mozilla-foundation/common_voice_8_0): test split

## Training procedure