cecilemacaire committed (verified)
Commit fa5cb43 · 1 Parent(s): cbfa777

Update README.md

Files changed (1): README.md (+4 -4)
README.md CHANGED
@@ -32,7 +32,7 @@ The fine-tuned model achieves the following performance :
 
 The ASR system is composed of:
 - the **Tokenizer** (char) that transforms the input text into a sequence of characters ("cat" into ["c", "a", "t"]) and trained with the train transcriptions (train.tsv).
-- the **Acoustic model** (wav2vec2.0 + DNN + CTC greedy decode). The pretrained wav2vec 2.0 model (LeBenchmark/wav2vec2-FR-7K-large](https://huggingface.co/LeBenchmark/wav2vec2-FR-7K-large) is combined with two DNN layers and fine-tuned on CommonVoice FR.
+- the **Acoustic model** (wav2vec2.0 + DNN + CTC greedy decode). The pretrained wav2vec 2.0 model [LeBenchmark/wav2vec2-FR-7K-large](https://huggingface.co/LeBenchmark/wav2vec2-FR-7K-large) is combined with two DNN layers and fine-tuned on CommonVoice FR.
 The final acoustic representation is given to the CTC greedy decode.
 
 We used recordings sampled at 16kHz (single channel).
@@ -86,7 +86,7 @@ We use the train / valid / test splits provided by CommonVoice, which correspond
 
 ### Training Procedure
 
-We follow the training procedure provided in the (ASR-CTC speechbrain recipe)[https://github.com/speechbrain/speechbrain/tree/develop/recipes/CommonVoice/ASR/CTC].
+We follow the training procedure provided in the [ASR-CTC speechbrain recipe](https://github.com/speechbrain/speechbrain/tree/develop/recipes/CommonVoice/ASR/CTC).
 The `common_voice_prepare.py` script handles the preprocessing of the dataset.
 
 #### Training Hyperparameters
@@ -97,9 +97,9 @@ Refer to the hyperparams.yaml file to get the hyperparameters information.
 
 With 4xV100 32GB, the training took ~ 81 hours.
 
-#### Software
+#### Libraries
 
-(Speechbrain)[https://speechbrain.github.io/]:
+[Speechbrain](https://speechbrain.github.io/):
 ```bibtex
 @misc{SB2021,
 author = {Ravanelli, Mirco and Parcollet, Titouan and Rouhe, Aku and Plantinga, Peter and Rastorgueva, Elena and Lugosch, Loren and Dawalatabad, Nauman and Ju-Chieh, Chou and Heba, Abdel and Grondin, Francois and Aris, William and Liao, Chien-Feng and Cornell, Samuele and Yeh, Sung-Lin and Na, Hwidong and Gao, Yan and Fu, Szu-Wei and Subakan, Cem and De Mori, Renato and Bengio, Yoshua },