---
license: apache-2.0
datasets:
- mozilla-foundation/common_voice_15_0
language:
- fr
metrics:
- wer
base_model:
- LeBenchmark/wav2vec2-FR-7K-large
pipeline_tag: automatic-speech-recognition
library_name: speechbrain
tags:
- Transformer
- wav2vec2
- CTC
- inference
---

# asr-wav2vec2-commonvoice-15-fr: LeBenchmark/wav2vec2-FR-7K-large fine-tuned on CommonVoice 15.0 French

*asr-wav2vec2-commonvoice-15-fr* is an Automatic Speech Recognition model fine-tuned on the CommonVoice 15.0 French dataset, with *LeBenchmark/wav2vec2-FR-7K-large* as the pretrained wav2vec2 model.

The fine-tuned model achieves the following performance:

| Release | Valid WER | Test WER | GPUs |
|:----------:|:---------:|:--------:|:------------:|
| 2023-09-08 | 9.14 | 11.21 | 4xV100 32GB |

## Model Details

The ASR system is composed of:

- the **Tokenizer** (char), which maps the input text to a sequence of characters ("cat" into ["c", "a", "t"]) and is trained on the training transcriptions (train.tsv).
- the **Acoustic model** (wav2vec2.0 + DNN + CTC greedy decoding). The pretrained wav2vec 2.0 model [LeBenchmark/wav2vec2-FR-7K-large](https://huggingface.co/LeBenchmark/wav2vec2-FR-7K-large) is combined with two DNN layers and fine-tuned on CommonVoice FR. The final acoustic representation is passed to the CTC greedy decoder.

We used recordings sampled at 16 kHz (single channel).

- **Developed by:** Cécile Macaire
- **Funded by:** GENCI-IDRIS (Grant 2023-AD011013625R1) and PROPICTO (ANR-20-CE93-0005)
- **Language(s) (NLP):** French
- **License:** Apache-2.0
- **Finetuned from model:** LeBenchmark/wav2vec2-FR-7K-large

## How to Get Started with the Model

A minimal inference sketch is provided at the end of this card.

## Training Details

### Training Data

### Training Procedure

#### Preprocessing [optional]

[More Information Needed]

#### Training Hyperparameters

- **Training regime:** [More Information Needed]

#### Speeds, Sizes, Times [optional]

[More Information Needed]

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

[More Information Needed]

#### Metrics

[More Information Needed]

### Results

[More Information Needed]

#### Summary

[More Information Needed]

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

[More Information Needed]

## Citation

```bibtex
@inproceedings{macaire24_interspeech,
  title     = {Towards Speech-to-Pictograms Translation},
  author    = {Cécile Macaire and Chloé Dion and Didier Schwab and Benjamin Lecouteux and Emmanuelle Esperança-Rodier},
  year      = {2024},
  booktitle = {Interspeech 2024},
  pages     = {857--861},
  doi       = {10.21437/Interspeech.2024-490},
  issn      = {2958-1796},
}
```
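
As a sketch for the "How to Get Started with the Model" section above, the snippet below shows how a SpeechBrain wav2vec2 + CTC model of this kind is typically loaded and run for inference via the `EncoderASR` interface. It assumes the checkpoint is published on the Hugging Face Hub with a SpeechBrain `hyperparams.yaml`; the repository id is a placeholder, not confirmed by this card.

```python
# On SpeechBrain < 1.0, use: from speechbrain.pretrained import EncoderASR
from speechbrain.inference.ASR import EncoderASR

# Load the fine-tuned wav2vec2 + DNN + CTC model.
# NOTE: "<namespace>/asr-wav2vec2-commonvoice-15-fr" is a placeholder;
# replace it with the actual Hub repository id hosting this checkpoint.
asr_model = EncoderASR.from_hparams(
    source="<namespace>/asr-wav2vec2-commonvoice-15-fr",
    savedir="pretrained_models/asr-wav2vec2-commonvoice-15-fr",
)

# Transcribe a 16 kHz, single-channel audio file (greedy CTC decoding).
transcription = asr_model.transcribe_file("path/to/audio.wav")
print(transcription)
```

Input audio should match the training conditions described above: 16 kHz sampling rate, single channel.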