---
license: apache-2.0
language:
- en
metrics:
- wer
base_model:
- facebook/wav2vec2-base-960h
tags:
- pytorch
- Transformers
- speech
- audio
---

# Model Description

This model is a fine-tuned version of `facebook/wav2vec2-base-960h` for automatic speech recognition (ASR).
It was fine-tuned on the [LibriSpeech dataset](https://paperswithcode.com/dataset/librispeech) with the aim of improving transcription accuracy over the base model.

The fine-tuning process involved:

- Selecting a subset of speakers from the LibriSpeech `dev-clean` and `test-clean` splits.
- Preprocessing the audio files and their corresponding transcriptions.
- Training with gradient accumulation, mixed precision (if available), and periodic evaluation.
- Saving the fine-tuned model for inference.
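
The training step listed above can be sketched roughly as follows. This is a minimal illustration and not the authors' training script: the toy linear model, the random data, the learning rate, and the `accum_steps` value are placeholders chosen so the loop runs on any machine, and mixed precision is switched on only when a CUDA device is present.

```python
import torch
from torch import nn

torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
use_amp = device.type == "cuda"  # mixed precision only if a GPU is available

# Placeholder model and data (assumptions); the real run fine-tunes wav2vec2.
model = nn.Linear(16, 4).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
# Newer PyTorch releases also expose this class as torch.amp.GradScaler.
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)
loss_fn = nn.CrossEntropyLoss()
batches = [(torch.randn(8, 16), torch.randint(0, 4, (8,))) for _ in range(8)]

accum_steps = 4  # hypothetical value: one optimizer update every 4 mini-batches
optimizer_steps = 0

model.train()
optimizer.zero_grad()
for step, (x, y) in enumerate(batches, start=1):
    x, y = x.to(device), y.to(device)
    with torch.autocast(device_type=device.type, enabled=use_amp):
        # Divide so the accumulated gradient matches one large-batch average.
        loss = loss_fn(model(x), y) / accum_steps
    scaler.scale(loss).backward()  # gradients add up across mini-batches

    if step % accum_steps == 0:
        scaler.step(optimizer)  # unscales gradients, then runs optimizer.step()
        scaler.update()
        optimizer.zero_grad()
        optimizer_steps += 1

print("optimizer steps:", optimizer_steps)
```

Gradient accumulation trades extra forward and backward passes for a larger effective batch size, which helps when long audio clips limit how many samples fit in memory at once.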

*[GitHub](https://github.com/LucasTramonte/SpeechRecognition)*

*Authors*: Lucas Tramonte, Kiyoshi Araki

# Usage

To transcribe audio files, the model can be used as follows:

```python
from transformers import AutoProcessor, AutoModelForCTC
import torch
import librosa

# Load model and processor
processor = AutoProcessor.from_pretrained("deepl-project/conformer-finetunning")
model = AutoModelForCTC.from_pretrained("deepl-project/conformer-finetunning")

# Load the audio file, resampled to 16 kHz (the rate the base model was trained on)
file_path = "path/to/audio/file.wav"
speech, sr = librosa.load(file_path, sr=16000)
inputs = processor(speech, sampling_rate=sr, return_tensors="pt", padding=True)

# Perform inference without tracking gradients
with torch.no_grad():
    logits = model(**inputs).logits

# Decode transcription: most likely token per frame, then CTC decoding
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)

print("Transcription:", transcription[0])
```
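
The `argmax` followed by `batch_decode` in the snippet above amounts to greedy CTC decoding: take the most likely token for each frame, merge consecutive repeats, then drop the blank token. A minimal pure-Python sketch of that collapse rule is shown below. The tiny vocabulary, the frame sequence, and the function name are made up for illustration; the real processor also handles the word delimiter and other special tokens.

```python
def ctc_greedy_collapse(frame_ids, id_to_token, blank_id=0):
    """Collapse per-frame token ids into text using the CTC rule."""
    tokens = []
    previous = None
    for idx in frame_ids:
        # Keep a token only if it differs from the previous frame and is not blank.
        if idx != previous and idx != blank_id:
            tokens.append(id_to_token[idx])
        previous = idx
    return "".join(tokens)


# Hypothetical vocabulary in which id 0 plays the role of the CTC blank.
vocab = {0: "<pad>", 1: "H", 2: "E", 3: "L", 4: "O"}
# Frames: H H _ E L L _ L O O   (underscore marks the blank)
frames = [1, 1, 0, 2, 3, 3, 0, 3, 4, 4]
print(ctc_greedy_collapse(frames, vocab))  # HELLO
```

The blank between the two runs of `L` is what lets a genuine double letter survive the repeat-merging step.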
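
The card lists WER (word error rate) as its metric. To score a transcription against a reference text, the following minimal sketch applies the standard definition: word-level edit distance divided by the number of reference words. The function name and the example sentences are illustrative, and any text normalization (casing, punctuation) is left to the caller.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substitution (SAT -> SIT) and one deletion (THE): 2 errors over 6 words.
wer = word_error_rate("THE CAT SAT ON THE MAT", "THE CAT SIT ON MAT")
print(f"WER: {wer:.3f}")
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.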

# References

- [LibriSpeech Dataset](https://paperswithcode.com/dataset/librispeech)
- [Conformer Model Paper](https://paperswithcode.com/paper/conformer-based-target-speaker-automatic)
- [Whisper Model Paper](https://arxiv.org/abs/2212.04356)