cecilemacaire committed · Commit 6213802 · verified · 1 Parent(s): 4a5329b

Update README.md

Files changed (1):
  1. README.md +19 -33
README.md CHANGED
@@ -23,48 +23,42 @@ tags:
 
  *asr-wav2vec2-commonvoice-15-fr* is an Automatic Speech Recognition model fine-tuned on CommonVoice 15.0 French set with *LeBenchmark/wav2vec2-FR-7K-large* as the pretrained wav2vec2 model.
 
- ## Model Details
-
-
 
- ### Model Description
-
- <!-- Provide a longer summary of what this model is. -->
 
 
  - **Developed by:** Cécile Macaire
  - **Funded by [optional]:** GENCI-IDRIS (Grant 2023-AD011013625R1)
    PROPICTO ANR-20-CE93-0005
  - **Language(s) (NLP):** French
  - **License:** Apache-2.0
- - **Finetuned from model [optional]:** LeBenchmark/wav2vec2-FR-7K-large
-
- ### Model Sources [optional]
-
- <!-- Provide the basic links for the model. -->
-
- - **Repository:** https://github.com/macairececile/speech-to-pictograms.
- - **Paper [optional]:**
 
 
  ## How to Get Started with the Model
 
- Use the code below to get started with the model.
-
- [More Information Needed]
-
  ## Training Details
 
  ### Training Data
 
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
 
- [More Information Needed]
 
  ### Training Procedure
 
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
 
  #### Preprocessing [optional]
 
@@ -105,12 +99,6 @@ Use the code below to get started with the model.
 
  #### Summary
 
-
-
- ## Model Examination [optional]
-
- <!-- Relevant interpretability work for the model goes here -->
-
  [More Information Needed]
 
  ## Environmental Impact
@@ -143,12 +131,9 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
 
  [More Information Needed]
 
- ## Citation [optional]
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
- **BibTeX:**
 
  @inproceedings{macaire24_interspeech,
    title = {Towards Speech-to-Pictograms Translation},
    author = {Cécile Macaire and Chloé Dion and Didier Schwab and Benjamin Lecouteux and Emmanuelle Esperança-Rodier},
@@ -157,4 +142,5 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
    pages = {857--861},
    doi = {10.21437/Interspeech.2024-490},
    issn = {2958-1796},
- }
 
 
 
  *asr-wav2vec2-commonvoice-15-fr* is an Automatic Speech Recognition model fine-tuned on CommonVoice 15.0 French set with *LeBenchmark/wav2vec2-FR-7K-large* as the pretrained wav2vec2 model.
 
+ The fine-tuned model achieves the following performance:
+
+ | Release | Valid WER | Test WER | GPUs |
+ |:----------:|:---------:|:--------:|:------------:|
+ | 2023-09-08 | 9.14 | 11.21 | 4xV100 32GB |
 
+ ## Model Details
 
+ The ASR system is composed of:
+ - the **Tokenizer** (char), which transforms the input text into a sequence of characters ("cat" into ["c", "a", "t"]) and is trained on the training transcriptions (train.tsv).
+ - the **Acoustic model** (wav2vec2.0 + DNN + CTC greedy decoding). The pretrained wav2vec 2.0 model [LeBenchmark/wav2vec2-FR-7K-large](https://huggingface.co/LeBenchmark/wav2vec2-FR-7K-large) is combined with two DNN layers and fine-tuned on CommonVoice FR. The final acoustic representation is passed to the CTC greedy decoder (a sketch of this decoding step is shown below).
 
+ We used recordings sampled at 16 kHz (single channel).
 
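To make the decoding step concrete, here is a minimal, illustrative sketch of greedy CTC decoding: take the most likely unit per frame, collapse consecutive repeats, and drop the blank token. The vocabulary, blank index, and toy scores below are placeholders for illustration, not the model's actual tokenizer or outputs.

```python
import torch

def ctc_greedy_decode(log_probs: torch.Tensor, vocab: list[str], blank_id: int = 0) -> str:
    """log_probs: (time, vocab_size) frame-level log-probabilities from the acoustic model."""
    frame_ids = log_probs.argmax(dim=-1).tolist()                 # best unit per frame
    prev = [None] + frame_ids[:-1]
    collapsed = [i for i, p in zip(frame_ids, prev) if i != p]    # collapse repeats
    return "".join(vocab[i] for i in collapsed if i != blank_id)  # drop blanks

# Toy example with a hypothetical 4-symbol vocabulary: blank, "c", "a", "t"
vocab = ["<blank>", "c", "a", "t"]
toy = torch.full((6, 4), -5.0)
for t, idx in enumerate([1, 1, 0, 2, 3, 3]):                      # frames: c, c, blank, a, t, t
    toy[t, idx] = 0.0
print(ctc_greedy_decode(toy, vocab))                              # -> "cat"
```
 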
  - **Developed by:** Cécile Macaire
  - **Funded by [optional]:** GENCI-IDRIS (Grant 2023-AD011013625R1)
    PROPICTO ANR-20-CE93-0005
  - **Language(s) (NLP):** French
  - **License:** Apache-2.0
+ - **Finetuned from model:** LeBenchmark/wav2vec2-FR-7K-large
 
 
 
 
 
 
 
 
 
  ## How to Get Started with the Model
 
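Below is a minimal inference sketch, assuming the repository provides a SpeechBrain-compatible setup loadable with `EncoderASR` (SpeechBrain >= 1.0). The model ID and audio file path are placeholders, not confirmed values from this card.

```python
from speechbrain.inference.ASR import EncoderASR

# Placeholder model ID; replace it with the actual Hugging Face repository name.
asr_model = EncoderASR.from_hparams(
    source="cecilemacaire/asr-wav2vec2-commonvoice-15-fr",
    savedir="pretrained_models/asr-wav2vec2-commonvoice-15-fr",
)

# The model expects 16 kHz, single-channel recordings (see Model Details).
transcription = asr_model.transcribe_file("example_fr.wav")  # placeholder audio path
print(transcription)
```
 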
  ## Training Details
 
+
  ### Training Data
 
 
  ### Training Procedure
 
+
 
  #### Preprocessing [optional]
 
...
 
  #### Summary
 
  [More Information Needed]
 
  ## Environmental Impact
 
...
 
  [More Information Needed]
 
+ ## Citation
 
+ ```bibtex
  @inproceedings{macaire24_interspeech,
    title = {Towards Speech-to-Pictograms Translation},
    author = {Cécile Macaire and Chloé Dion and Didier Schwab and Benjamin Lecouteux and Emmanuelle Esperança-Rodier},
    ...
    pages = {857--861},
    doi = {10.21437/Interspeech.2024-490},
    issn = {2958-1796},
+ }
+ ```