cecilemacaire committed (verified)
Commit fa5cb43 · 1 Parent(s): cbfa777

Update README.md

Files changed (1): README.md (+4 -4)
README.md CHANGED
@@ -32,7 +32,7 @@ The fine-tuned model achieves the following performance :
 
 The ASR system is composed of:
 - the **Tokenizer** (char) that transforms the input text into a sequence of characters ("cat" into ["c", "a", "t"]) and trained with the train transcriptions (train.tsv).
-- the **Acoustic model** (wav2vec2.0 + DNN + CTC greedy decode). The pretrained wav2vec 2.0 model (LeBenchmark/wav2vec2-FR-7K-large](https://huggingface.co/LeBenchmark/wav2vec2-FR-7K-large) is combined with two DNN layers and fine-tuned on CommonVoice FR.
+- the **Acoustic model** (wav2vec2.0 + DNN + CTC greedy decode). The pretrained wav2vec 2.0 model [LeBenchmark/wav2vec2-FR-7K-large](https://huggingface.co/LeBenchmark/wav2vec2-FR-7K-large) is combined with two DNN layers and fine-tuned on CommonVoice FR.
 The final acoustic representation is given to the CTC greedy decode.
 
 We used recordings sampled at 16kHz (single channel).
@@ -86,7 +86,7 @@ We use the train / valid / test splits provided by CommonVoice, which correspond
 
 ### Training Procedure
 
-We follow the training procedure provided in the (ASR-CTC speechbrain recipe)[https://github.com/speechbrain/speechbrain/tree/develop/recipes/CommonVoice/ASR/CTC].
+We follow the training procedure provided in the [ASR-CTC speechbrain recipe](https://github.com/speechbrain/speechbrain/tree/develop/recipes/CommonVoice/ASR/CTC).
 The `common_voice_prepare.py` script handles the preprocessing of the dataset.
 
 #### Training Hyperparameters
@@ -97,9 +97,9 @@ Refer to the hyperparams.yaml file to get the hyperparameters information.
 
 With 4xV100 32GB, the training took ~ 81 hours.
 
-#### Software
+#### Libraries
 
-(Speechbrain)[https://speechbrain.github.io/]:
+[Speechbrain](https://speechbrain.github.io/):
 ```bibtex
 @misc{SB2021,
 author = {Ravanelli, Mirco and Parcollet, Titouan and Rouhe, Aku and Plantinga, Peter and Rastorgueva, Elena and Lugosch, Loren and Dawalatabad, Nauman and Ju-Chieh, Chou and Heba, Abdel and Grondin, Francois and Aris, William and Liao, Chien-Feng and Cornell, Samuele and Yeh, Sung-Lin and Na, Hwidong and Gao, Yan and Fu, Szu-Wei and Subakan, Cem and De Mori, Renato and Bengio, Yoshua },