Model Description

This model is a fine-tuned version of facebook/wav2vec2-base-960h for automatic speech recognition (ASR). It was fine-tuned on the LibriSpeech dataset with the aim of improving transcription accuracy over the base model.

The fine-tuning process involved:

  • Selecting a subset of speakers from the LibriSpeech dev-clean and test-clean splits.
  • Preprocessing audio files and their corresponding transcriptions.
  • Training with gradient accumulation, mixed precision (if available), and periodic evaluation.
  • Saving the fine-tuned model for inference.
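
The gradient accumulation step listed above can be illustrated with a minimal sketch. This is not the authors' training code, which is not shown here. It uses a toy one-parameter model in plain Python rather than PyTorch, and the names `mse_gradient` and `accumulated_step` are hypothetical. It assumes micro-batches of equal size, in which case accumulating scaled gradients gives the same update as one large batch.

```python
def mse_gradient(w, batch):
    """Gradient of the mean squared error for the toy model y = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_step(w, micro_batches, lr):
    """Perform one weight update built from several micro-batches."""
    accum_steps = len(micro_batches)
    grad = 0.0
    for batch in micro_batches:
        # Scale each micro-batch gradient so the running sum is an average.
        grad += mse_gradient(w, batch) / accum_steps
    # The weight changes only once, after every micro-batch has been seen.
    return w - lr * grad


# Toy data following y = 2 * x, split into two equal micro-batches.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
micro_batches = [data[0:2], data[2:4]]

w_accumulated = accumulated_step(0.0, micro_batches, lr=0.01)
w_full_batch = 0.0 - 0.01 * mse_gradient(0.0, data)

print(w_accumulated, w_full_batch)
```

In a real training loop the same idea lets a small GPU emulate a larger effective batch size: the loss of each micro-batch is divided by the number of accumulation steps, and the optimizer step runs only after the last one.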

GitHub Authors: Lucas Tramonte, Kiyoshi Araki

Usage

To transcribe audio files, the model can be used as follows:

from transformers import AutoProcessor, AutoModelForCTC
import torch
import librosa

# Load model and processor
processor = AutoProcessor.from_pretrained("deepl-project/conformer-finetunning")
model = AutoModelForCTC.from_pretrained("deepl-project/conformer-finetunning")

# Load the audio, resampling it to 16 kHz (the rate wav2vec2 models expect), then preprocess it
file_path = "path/to/audio/file.wav"
speech, sr = librosa.load(file_path, sr=16000)
inputs = processor(speech, sampling_rate=sr, return_tensors="pt", padding=True)

# Perform inference
with torch.no_grad():
    logits = model(**inputs).logits

# Decode transcription
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)

print("Transcription:", transcription[0])
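
The decoding step above takes the most likely token per audio frame and then turns that frame-level sequence into text. The following is a rough sketch of the CTC collapsing rule behind that step, not the processor's actual implementation. The function `ctc_greedy_collapse`, the toy vocabulary, and the choice of `0` as the blank ID are assumptions made for illustration. The real processor also handles details such as word delimiters, which are omitted here.

```python
def ctc_greedy_collapse(frame_ids, blank_id=0):
    """Merge repeated frame predictions, then drop blank tokens."""
    collapsed = []
    previous = None
    for token_id in frame_ids:
        # Keep a token only when it differs from the previous frame
        # and is not the blank symbol.
        if token_id != previous and token_id != blank_id:
            collapsed.append(token_id)
        previous = token_id
    return collapsed


# Hypothetical toy vocabulary; ID 0 plays the role of the CTC blank.
toy_vocab = {1: "H", 2: "I", 3: "L"}

frame_ids = [0, 1, 1, 0, 2, 2, 2, 0]
text = "".join(toy_vocab[i] for i in ctc_greedy_collapse(frame_ids))
print(text)

# A blank between two identical tokens keeps both of them,
# which is how CTC represents doubled letters.
double_l = ctc_greedy_collapse([3, 3, 0, 3])
print(double_l)
```

The blank token is what allows a doubled letter to survive the merge: without a blank frame between them, two adjacent identical predictions collapse into one.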
