ITG
/

wav2vec2-large-xlsr-gl

 ---
 license: cc-by-nc-nd-4.0
+datasets:
+- openslr
+language:
+- gl
+pipeline_tag: automatic-speech-recognition
+tags:
+- ITG
+- PyTorch
+- Transformers
+- wav2vec2
 ---
+# Wav2Vec2 Large XLSR Galician
+## Description
+This is a fine-tuned version of the [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) pre-trained model for automatic speech recognition (ASR) in Galician.
+---
+## Dataset
+The model was fine-tuned on the [OpenSLR Galician](https://huggingface.co/datasets/openslr/viewer/SLR77) dataset (SLR77), available in the openslr repository.
+---
+## Example inference script
+The following script runs the model in inference mode on a single audio file.
+```python
+import librosa
+import torch
+from transformers import AutoProcessor, AutoModelForCTC
+
+filename = "demo.wav"  # change this to the path of your audio file
+sample_rate = 16_000  # the model expects 16 kHz audio
+
+# Load the processor (feature extractor + tokenizer) and the CTC model
+processor = AutoProcessor.from_pretrained('ITG/wav2vec2-large-xlsr-gl')
+model = AutoModelForCTC.from_pretrained('ITG/wav2vec2-large-xlsr-gl')
+device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+model.to(device)
+
+# librosa resamples the audio to sample_rate while loading it
+speech_array, _ = librosa.load(filename, sr=sample_rate)
+inputs = processor(speech_array, sampling_rate=sample_rate, return_tensors="pt", padding=True).to(device)
+
+# Greedy CTC decoding: pick the most likely token at every frame
+with torch.no_grad():
+    logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits
+decode_output = processor.batch_decode(torch.argmax(logits, dim=-1))[0]
+print(f"ASR Galician wav2vec2-large-xlsr output: {decode_output}")
+```
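
To check a transcription against a known reference, word error rate (WER, the metric used for model selection during fine-tuning) can be computed by hand. The sketch below is a minimal pure-Python version based on word-level edit distance; the function name `word_error_rate` and the sample sentences are hypothetical, and it is not necessarily the implementation used when training this model.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example strings, not taken from the dataset.
reference = "bos días a todos"
hypothesis = "bos día a todos"
print(word_error_rate(reference, hypothesis))  # 0.25 (1 substitution / 4 words)
```

In practice, `decode_output` from the script above would be passed as `hypothesis`, usually after normalizing case and punctuation in the same way on both strings.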
+---
+## Fine-tuning hyper-parameters
+|            **Hyper-parameter**           |          **Value**          |
+|:----------------------------------------:|:---------------------------:|
+|            Training batch size           |             16              |
+|           Evaluation batch size          |             8               |
+|               Learning rate              |             3e-4            |
+|         Gradient accumulation steps      |             2               |
+|             Group by length              |             true            |
+|            Evaluation strategy           |             steps           |
+|            Max training epochs           |             50              |
+|                Max steps                 |             4000            |
+|            Generate max length           |             225             |
+|                  FP16                    |             true            |
+|          Metric for best model           |             wer             |
+|            Greater is better             |             false           |
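
As a rough illustration of how the values in this table could be passed to the Hugging Face `Trainer`, the sketch below collects them under the keyword names used by `transformers.TrainingArguments`. The row-to-keyword mapping is an assumption rather than the authors' training script, and "Generate max length" is left out because it has no obvious counterpart in plain `TrainingArguments`.

```python
# Assumed mapping from the table rows to TrainingArguments keyword names.
training_kwargs = {
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 8,
    "learning_rate": 3e-4,
    "gradient_accumulation_steps": 2,
    "group_by_length": True,
    "evaluation_strategy": "steps",  # newer transformers releases call this eval_strategy
    "num_train_epochs": 50,
    "max_steps": 4000,  # a positive max_steps overrides num_train_epochs
    "fp16": True,
    "metric_for_best_model": "wer",
    "greater_is_better": False,  # lower WER is better
}

# Samples consumed per optimizer step on one device.
effective_batch = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
)
# Upper bound on samples seen per device before training stops at max_steps.
max_samples = effective_batch * training_kwargs["max_steps"]

print(effective_batch)  # 32
print(max_samples)      # 128000
```

With `transformers` installed, the dictionary could be unpacked as `TrainingArguments(output_dir="...", **training_kwargs)`, where `output_dir` is a placeholder of your choosing.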
+---
+## Fine-tuning on a different dataset or style
+If you're interested in fine-tuning your own wav2vec2 model, we suggest starting from the [facebook/wav2vec2-large-xlsr-53 model](https://huggingface.co/facebook/wav2vec2-large-xlsr-53).
+You may also find this [notebook on fine-tuning XLSR on Galician by Diego Fustes](https://github.com/diego-fustes/xlsr-fine-tuning-gl/blob/main/Fine_Tune_XLSR_Wav2Vec2_on_Galician.ipynb) a valuable resource.
+It served as a helpful reference while training this Galician wav2vec2-large-xlsr model.