Update README.md

README.md CHANGED

@@ -32,7 +32,7 @@ The fine-tuned model achieves the following performance :
 
 The ASR system is composed of:
 - the **Tokenizer** (char) that transforms the input text into a sequence of characters ("cat" into ["c", "a", "t"]) and trained with the train transcriptions (train.tsv).
-- the **Acoustic model** (wav2vec2.0 + DNN + CTC greedy decode). The pretrained wav2vec 2.0 model
+- the **Acoustic model** (wav2vec2.0 + DNN + CTC greedy decode). The pretrained wav2vec 2.0 model [LeBenchmark/wav2vec2-FR-7K-large](https://huggingface.co/LeBenchmark/wav2vec2-FR-7K-large) is combined with two DNN layers and fine-tuned on CommonVoice FR.
 The final acoustic representation is given to the CTC greedy decode.
 
 We used recordings sampled at 16kHz (single channel).
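
The CTC greedy decoding step named in this hunk is not spelled out in the README: take the argmax label at each frame, collapse consecutive repeats, then drop the blank token. A minimal self-contained sketch (the character inventory and blank id below are illustrative placeholders, not the model's actual tokenizer):

```python
from typing import List

def ctc_greedy_decode(frame_labels: List[int], blank_id: int = 0) -> List[int]:
    """Collapse consecutive repeated labels, then remove CTC blank tokens."""
    decoded, previous = [], None
    for label in frame_labels:
        if label != previous and label != blank_id:
            decoded.append(label)
        previous = label
    return decoded

# Frame-wise argmax sequence "ccc_aa_t" (blank = 0) decodes to "cat".
chars = {1: "c", 2: "a", 3: "t"}  # placeholder character set
print("".join(chars[i] for i in ctc_greedy_decode([1, 1, 1, 0, 2, 2, 0, 3])))
```
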
@@ -86,7 +86,7 @@ We use the train / valid / test splits provided by CommonVoice, which correspond
 
 ### Training Procedure
 
-We follow the training procedure provided in the
+We follow the training procedure provided in the [ASR-CTC speechbrain recipe](https://github.com/speechbrain/speechbrain/tree/develop/recipes/CommonVoice/ASR/CTC).
 The `common_voice_prepare.py` script handles the preprocessing of the dataset.
 
 #### Training Hyperparameters
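
The `common_voice_prepare.py` script mentioned in this hunk is typically driven from Python before training. The call below is a hedged sketch only: the function name `prepare_common_voice` and its keyword arguments follow the usual pattern of SpeechBrain dataset-preparation scripts, and should be verified against the actual file in the recipe.

```python
# Hypothetical call into the recipe's preparation script; the function name
# and parameters are assumptions -- check common_voice_prepare.py before use.
from common_voice_prepare import prepare_common_voice

prepare_common_voice(
    data_folder="data/cv-corpus/fr",               # extracted CommonVoice FR dump
    save_folder="results/prepared",                # output folder for train/dev/test manifests
    train_tsv_file="data/cv-corpus/fr/train.tsv",  # split files shipped with CommonVoice
    dev_tsv_file="data/cv-corpus/fr/dev.tsv",
    test_tsv_file="data/cv-corpus/fr/test.tsv",
    accented_letters=True,                         # keep French accented characters
    language="fr",
)
```
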
@@ -97,9 +97,9 @@ Refer to the hyperparams.yaml file to get the hyperparameters information.
 
 With 4xV100 32GB, the training took ~ 81 hours.
 
-####
+#### Libraries
 
-(
+[Speechbrain](https://speechbrain.github.io/):
 ```bibtex
 @misc{SB2021,
 author = {Ravanelli, Mirco and Parcollet, Titouan and Rouhe, Aku and Plantinga, Peter and Rastorgueva, Elena and Lugosch, Loren and Dawalatabad, Nauman and Ju-Chieh, Chou and Heba, Abdel and Grondin, Francois and Aris, William and Liao, Chien-Feng and Cornell, Samuele and Yeh, Sung-Lin and Na, Hwidong and Gao, Yan and Fu, Szu-Wei and Subakan, Cem and De Mori, Renato and Bengio, Yoshua },
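
A Libraries section like the one added in this hunk is usually paired with a usage snippet. The sketch below relies on SpeechBrain's pretrained `EncoderASR` interface, which fits an encoder + CTC model of this kind; the `source` repo id is a placeholder assumption, since the diff does not name the published model.

```python
# Hedged usage sketch: transcribe a 16 kHz mono recording with the fine-tuned model.
from speechbrain.pretrained import EncoderASR

asr_model = EncoderASR.from_hparams(
    source="speechbrain/asr-wav2vec2-commonvoice-fr",        # placeholder repo id
    savedir="pretrained_models/asr-wav2vec2-commonvoice-fr", # local cache folder
)

# Runs wav2vec 2.0 -> DNN -> CTC greedy decoding, as described above.
print(asr_model.transcribe_file("example-fr.wav"))
```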