---
language: en
datasets:
- common_voice
- mozilla-foundation/common_voice_6_0
metrics:
- wer
- cer
tags:
- audio
- automatic-speech-recognition
- en
- hf-asr-leaderboard
- mozilla-foundation/common_voice_6_0
- robust-speech-event
- speech
- xlsr-fine-tuning-week
license: apache-2.0
model-index:
- name: XLSR Wav2Vec2 English
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice en
      type: common_voice
      args: en
    metrics:
    - name: Test WER
      type: wer
      value: 19.06
    - name: Test CER
      type: cer
      value: 7.69
    - name: Test WER (+LM)
      type: wer
      value: 14.81
    - name: Test CER (+LM)
      type: cer
      value: 6.84
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Robust Speech Event - Dev Data
      type: speech-recognition-community-v2/dev_data
      args: en
    metrics:
    - name: Dev WER
      type: wer
      value: 27.72
    - name: Dev CER
      type: cer
      value: 11.65
    - name: Dev WER (+LM)
      type: wer
      value: 20.85
    - name: Dev CER (+LM)
      type: cer
      value: 11.01
library_name: transformers
---

# Fine-tuned XLSR-53 large model for speech recognition in English

This model is [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) fine-tuned on English using the train and validation splits of [Common Voice 6.1](https://huggingface.co/datasets/common_voice).
When using this model, make sure that your speech input is sampled at 16 kHz; audio recorded at any other rate should be resampled to 16 kHz first.
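
The sampling-rate requirement can be checked before inference. The following is a minimal sketch using only the Python standard library; the helper names `wav_sample_rate` and `resample_linear` are hypothetical and not part of this model or of `transformers`. The linear-interpolation resampler is for illustration only: it applies no anti-aliasing filter, so downsampled audio may be degraded.

```python
import wave

TARGET_RATE = 16_000  # sampling rate (Hz) this model card asks for


def wav_sample_rate(path):
    """Return the sampling rate (Hz) stored in a WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def resample_linear(samples, source_rate, target_rate=TARGET_RATE):
    """Resample a mono signal by linear interpolation (hypothetical helper).

    Illustrative only: no low-pass filtering is done, so this is not a
    substitute for a proper resampler.
    """
    if source_rate <= 0 or target_rate <= 0:
        raise ValueError("sampling rates must be positive")
    if source_rate == target_rate or len(samples) == 0:
        return list(samples)

    n_out = max(1, round(len(samples) * target_rate / source_rate))
    step = source_rate / target_rate  # input samples per output sample
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * step
        lo = min(int(pos), last)        # left neighbour
        hi = min(lo + 1, last)          # right neighbour, clamped at the end
        frac = min(pos - lo, 1.0)       # distance from the left neighbour
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out
```

In practice a dedicated audio library is the better choice. For example, `librosa.load(path, sr=16_000)` loads a file and resamples it to 16 kHz in one step.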