	---
	license: apache-2.0
	language:
	- ht
	metrics:
	- wer
	library_name: transformers
	---
	# Haitian Speech-to-Text Model

	## Overview
	This repository contains a Whisper automatic speech recognition (ASR) model fine-tuned for Haitian Creole (language code `ht`). The model is hosted on the Hugging Face Hub and can be loaded with the `transformers` library.

	## Performance
	The model achieved a word error rate (WER) of 0.19126 on its evaluation data, meaning roughly 19 word-level errors (substitutions, deletions, and insertions) per 100 reference words when transcribing spoken Haitian Creole.
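
For readers unfamiliar with the metric, the following is a minimal sketch of how WER is commonly computed: the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The function name `word_error_rate` and the sample sentences are illustrative assumptions; this is not the evaluation script used for this model, and no text normalization (casing, punctuation) is applied.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example pair (not taken from the model's evaluation data):
# one of four reference words is missing from the hypothesis.
print(word_error_rate("mwen renmen manje diri", "mwen renmen diri"))  # 0.25
```

On this scale, a score of 0.0 means a perfect transcript, and values above 1.0 are possible when the hypothesis contains many inserted words.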

	## Training
	The model was trained with a learning rate of 1e-5.

	## Usage
	You can use this model directly from the Hugging Face Model Hub. Here's a simple example in Python:

	```python
	from transformers import WhisperProcessor, WhisperForConditionalGeneration
	import torchaudio

	# load model and processor
	processor = WhisperProcessor.from_pretrained("ZeeshanGeoPk/haitian-speech-to-text")
	model = WhisperForConditionalGeneration.from_pretrained("ZeeshanGeoPk/haitian-speech-to-text")

	# load the audio file; waveform has shape (channels, samples)
	sample_path = "path/to/audio.wav"
	waveform, sample_rate = torchaudio.load(sample_path)

	# resample if needed (Whisper expects 16 kHz input)
	if sample_rate != 16000:
	    resampler = torchaudio.transforms.Resample(sample_rate, 16000)
	    waveform = resampler(waveform)
	    sample_rate = 16000

	# downmix to a single channel if the file is stereo or multi-channel
	if waveform.shape[0] > 1:
	    waveform = waveform.mean(dim=0, keepdim=True)

	# convert the audio to log-mel input features;
	# squeeze(0) drops the channel axis so a 1-D array is passed for this single clip
	input_features = processor(
	    waveform.squeeze(0).numpy(), sampling_rate=sample_rate, return_tensors="pt"
	).input_features

	# generate token ids
	predicted_ids = model.generate(input_features)
	# decode token ids to text (a list with one string per input clip)
	transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
	print(transcription)
	```
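
The example above passes the whole recording to the processor in one call. Whisper's feature extractor works on a fixed 30-second window and, by default, truncates anything longer, so a longer recording would need to be split first. The sketch below shows one simple way to do that on a 1-D sequence of 16 kHz samples. The helper name `split_into_windows` is hypothetical, and cutting at fixed offsets can split a word in half, so treat this as a starting point rather than a recommended pipeline.

```python
SAMPLE_RATE = 16_000   # Whisper's expected input sampling rate (Hz)
WINDOW_SECONDS = 30    # length of Whisper's fixed input window


def split_into_windows(samples, sample_rate=SAMPLE_RATE, window_seconds=WINDOW_SECONDS):
    """Split a 1-D sequence of samples into consecutive windows.

    Every window holds at most `window_seconds` of audio; the last one
    may be shorter. Works on anything that supports len() and slicing,
    such as a list, a NumPy array, or a 1-D tensor.
    """
    window = sample_rate * window_seconds
    return [samples[start:start + window] for start in range(0, len(samples), window)]


# 70 seconds of silence as stand-in data
audio = [0.0] * (70 * SAMPLE_RATE)
windows = split_into_windows(audio)
print([len(w) / SAMPLE_RATE for w in windows])  # [30.0, 30.0, 10.0]
```

Each window could then be passed through the processor and `model.generate` as in the example above, and the resulting strings joined in order.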