cecilemacaire committed · Commit 6213802 · verified · 1 Parent(s): 4a5329b

Update README.md

Files changed (1):
  1. README.md +19 -33
README.md CHANGED
@@ -23,48 +23,42 @@ tags:
 
  *asr-wav2vec2-commonvoice-15-fr* is an Automatic Speech Recognition model fine-tuned on CommonVoice 15.0 French set with *LeBenchmark/wav2vec2-FR-7K-large* as the pretrained wav2vec2 model.
 
- ## Model Details
-
-
 
- ### Model Description
-
- <!-- Provide a longer summary of what this model is. -->
 
 
  - **Developed by:** Cécile Macaire
  - **Funded by [optional]:** GENCI-IDRIS (Grant 2023-AD011013625R1)
    PROPICTO ANR-20-CE93-0005
  - **Language(s) (NLP):** French
  - **License:** Apache-2.0
- - **Finetuned from model [optional]:** LeBenchmark/wav2vec2-FR-7K-large
-
- ### Model Sources [optional]
-
- <!-- Provide the basic links for the model. -->
-
- - **Repository:** https://github.com/macairececile/speech-to-pictograms.
- - **Paper [optional]:**
 
 
  ## How to Get Started with the Model
 
- Use the code below to get started with the model.
-
- [More Information Needed]
-
  ## Training Details
 
  ### Training Data
 
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
 
- [More Information Needed]
 
  ### Training Procedure
 
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
 
  #### Preprocessing [optional]
 
@@ -105,12 +99,6 @@ Use the code below to get started with the model.
 
  #### Summary
 
-
-
- ## Model Examination [optional]
-
- <!-- Relevant interpretability work for the model goes here -->
-
  [More Information Needed]
 
  ## Environmental Impact
@@ -143,12 +131,9 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
 
  [More Information Needed]
 
- ## Citation [optional]
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
- **BibTeX:**
 
  @inproceedings{macaire24_interspeech,
    title = {Towards Speech-to-Pictograms Translation},
    author = {Cécile Macaire and Chloé Dion and Didier Schwab and Benjamin Lecouteux and Emmanuelle Esperança-Rodier},
@@ -157,4 +142,5 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
    pages = {857--861},
    doi = {10.21437/Interspeech.2024-490},
    issn = {2958-1796},
- }
 
 
 
  *asr-wav2vec2-commonvoice-15-fr* is an Automatic Speech Recognition model fine-tuned on CommonVoice 15.0 French set with *LeBenchmark/wav2vec2-FR-7K-large* as the pretrained wav2vec2 model.
 
+ The fine-tuned model achieves the following performance:
+
+ | Release | Valid WER | Test WER | GPUs |
+ |:----------:|:---------:|:--------:|:------------:|
+ | 2023-09-08 | 9.14 | 11.21 | 4xV100 32GB |
 
+ ## Model Details
 
+ The ASR system is composed of:
+ - the **Tokenizer** (char), which transforms the input text into a sequence of characters ("cat" into ["c", "a", "t"]) and is trained on the training transcriptions (train.tsv).
+ - the **Acoustic model** (wav2vec2.0 + DNN + CTC greedy decoding). The pretrained wav2vec 2.0 model [LeBenchmark/wav2vec2-FR-7K-large](https://huggingface.co/LeBenchmark/wav2vec2-FR-7K-large) is combined with two DNN layers and fine-tuned on CommonVoice FR. The final acoustic representation is passed to the CTC greedy decoder (a sketch of this decoding step is shown below).
 
+ We used recordings sampled at 16 kHz (single channel).
 
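To make the decoding step concrete, here is a minimal, illustrative sketch of greedy CTC decoding: take the most likely unit per frame, collapse consecutive repeats, and drop the blank token. The vocabulary, blank index, and toy scores below are placeholders for illustration, not the model's actual tokenizer or outputs.

```python
import torch

def ctc_greedy_decode(log_probs: torch.Tensor, vocab: list[str], blank_id: int = 0) -> str:
    """log_probs: (time, vocab_size) frame-level log-probabilities from the acoustic model."""
    frame_ids = log_probs.argmax(dim=-1).tolist()                 # best unit per frame
    prev = [None] + frame_ids[:-1]
    collapsed = [i for i, p in zip(frame_ids, prev) if i != p]    # collapse repeats
    return "".join(vocab[i] for i in collapsed if i != blank_id)  # drop blanks

# Toy example with a hypothetical 4-symbol vocabulary: blank, "c", "a", "t"
vocab = ["<blank>", "c", "a", "t"]
toy = torch.full((6, 4), -5.0)
for t, idx in enumerate([1, 1, 0, 2, 3, 3]):                      # frames: c, c, blank, a, t, t
    toy[t, idx] = 0.0
print(ctc_greedy_decode(toy, vocab))                              # -> "cat"
```
 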
  - **Developed by:** Cécile Macaire
  - **Funded by [optional]:** GENCI-IDRIS (Grant 2023-AD011013625R1)
    PROPICTO ANR-20-CE93-0005
  - **Language(s) (NLP):** French
  - **License:** Apache-2.0
+ - **Finetuned from model:** LeBenchmark/wav2vec2-FR-7K-large
 
 
 
 
 
 
 
 
 
  ## How to Get Started with the Model
 
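Below is a minimal inference sketch, assuming the repository provides a SpeechBrain-compatible setup loadable with `EncoderASR` (SpeechBrain >= 1.0). The model ID and audio file path are placeholders, not confirmed values from this card.

```python
from speechbrain.inference.ASR import EncoderASR

# Placeholder model ID; replace it with the actual Hugging Face repository name.
asr_model = EncoderASR.from_hparams(
    source="cecilemacaire/asr-wav2vec2-commonvoice-15-fr",
    savedir="pretrained_models/asr-wav2vec2-commonvoice-15-fr",
)

# The model expects 16 kHz, single-channel recordings (see Model Details).
transcription = asr_model.transcribe_file("example_fr.wav")  # placeholder audio path
print(transcription)
```
 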
  ## Training Details
 
+
  ### Training Data
 
 
  ### Training Procedure
 
+
 
  #### Preprocessing [optional]
 
...
 
  #### Summary
 
  [More Information Needed]
 
  ## Environmental Impact
 
...
 
  [More Information Needed]
 
+ ## Citation
 
+ ```bibtex
  @inproceedings{macaire24_interspeech,
    title = {Towards Speech-to-Pictograms Translation},
    author = {Cécile Macaire and Chloé Dion and Didier Schwab and Benjamin Lecouteux and Emmanuelle Esperança-Rodier},
    ...
    pages = {857--861},
    doi = {10.21437/Interspeech.2024-490},
    issn = {2958-1796},
+ }
+ ```