---
license: apache-2.0
language:
- ht
metrics:
- wer
library_name: transformers
---

# Haitian Speech-to-Text Model

## Overview

This repository contains a Whisper automatic speech recognition (ASR) model fine-tuned for Haitian Creole (language code `ht`). The model is hosted on the Hugging Face Hub and can be loaded directly with the `transformers` library.

## Performance

The model achieved a Word Error Rate (WER) of 0.19126, or about 19.1%, when transcribing spoken Haitian Creole to written text. Lower is better: a WER of 0 means every word matches the reference transcript.

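For readers unfamiliar with the metric, the sketch below shows how WER is conventionally defined: the word-level edit distance (substitutions, deletions, insertions) divided by the number of words in the reference. This is only an illustration. The function name `word_error_rate` and the toy strings are hypothetical, no text normalization is applied, and it is not necessarily the tool that produced the score reported above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# Toy example: one substitution ("c" -> "x") and one deletion ("e") over 5 words
print(word_error_rate("a b c d e", "a b x d"))  # 0.4
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many inserted words.
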
## Training

The model was trained with a learning rate of 1e-5.

## Usage

You can use this model directly from the Hugging Face Model Hub. Here's a simple example in Python:

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torchaudio

# Load the fine-tuned model and its processor from the Hugging Face Hub
processor = WhisperProcessor.from_pretrained("ZeeshanGeoPk/haitian-speech-to-text")
model = WhisperForConditionalGeneration.from_pretrained("ZeeshanGeoPk/haitian-speech-to-text")

# Load the audio file; waveform has shape (channels, samples)
sample_path = "path/to/audio.wav"
waveform, sample_rate = torchaudio.load(sample_path)

# Resample if needed (Whisper expects 16 kHz audio)
if sample_rate != 16000:
    resampler = torchaudio.transforms.Resample(sample_rate, 16000)
    waveform = resampler(waveform)
    sample_rate = 16000

# Downmix to mono by averaging the channels
if waveform.shape[0] > 1:
    waveform = waveform.mean(dim=0, keepdim=True)

# Convert the 1-D waveform to log-mel input features
input_features = processor(
    waveform.squeeze(0).numpy(), sampling_rate=sample_rate, return_tensors="pt"
).input_features

# Generate token ids
predicted_ids = model.generate(input_features)

# Decode token ids to text (a list with one string per input)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription)
```
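
As an alternative to the manual steps above, the `transformers` `pipeline` API can wrap loading, resampling, feature extraction, and decoding in a single call. The following is a minimal sketch, not the author's documented workflow: the helper name `transcribe_file` is hypothetical, the 30-second chunk length is an assumed setting for handling longer recordings, and passing a file path requires `ffmpeg` to be installed.

```python
def transcribe_file(audio_path, model_id="ZeeshanGeoPk/haitian-speech-to-text"):
    """Transcribe one audio file with the high-level ASR pipeline."""
    # Imported inside the function so the heavy dependency loads only when needed
    from transformers import pipeline

    # chunk_length_s (assumed value) splits long recordings into 30-second windows
    asr = pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,
    )

    # Given a file path, the pipeline decodes the audio and resamples it
    # to the rate the model's feature extractor expects.
    result = asr(audio_path)
    return result["text"]
```

Calling `transcribe_file("path/to/audio.wav")` returns the transcription as a single string rather than a list.
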