ZeeshanGeoPk committed
Commit 2416c78 · verified · 1 parent: 5f6629c

Update README.md

Files changed (1):
  1. README.md +54 -1

README.md CHANGED

# Haitian Speech-to-Text Model

## Overview
This repository contains a fine-tuned Whisper ASR (Automatic Speech Recognition) model for Haitian Creole. The model is hosted on the Hugging Face Hub and is ready to use.
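
For a quick test, the checkpoint can also be loaded through the `automatic-speech-recognition` pipeline. The sketch below is illustrative only: it reuses the repository id from the Usage section, treats `audio.wav` as a placeholder path, and assumes ffmpeg is available for decoding the file.

```python
from transformers import pipeline

# build an ASR pipeline around the fine-tuned checkpoint
asr = pipeline(
    "automatic-speech-recognition",
    model="ZeeshanGeoPk/haitian-speech-to-text",
)

# transcribe a local recording (placeholder path)
result = asr("audio.wav")
print(result["text"])
```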

## Performance
The model achieved a Word Error Rate (WER) of 0.19126 (roughly 19% of words transcribed incorrectly), indicating solid accuracy in transcribing spoken Haitian Creole to written text.
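
WER counts word-level substitutions, insertions, and deletions against a reference transcript. You can recompute it on your own labelled data with the `evaluate` library; the snippet below is a minimal sketch with placeholder transcripts, not the original evaluation script.

```python
import evaluate

# WER = (substitutions + deletions + insertions) / number of reference words
wer_metric = evaluate.load("wer")

references = ["placeholder ground-truth transcript"]  # reference transcripts
predictions = ["placeholder model transcript"]        # model outputs

wer = wer_metric.compute(references=references, predictions=predictions)
print(f"WER: {wer:.5f}")
```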

## Training
The model was fine-tuned with a learning rate of 1e-5.
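
The original training script is not included in this repository. Purely as an illustration, a Whisper fine-tuning run at this learning rate is typically configured through `Seq2SeqTrainingArguments`; every value below other than `learning_rate=1e-5` is a hypothetical placeholder, not a documented setting of this model.

```python
from transformers import Seq2SeqTrainingArguments

# hypothetical configuration: only learning_rate comes from this model card
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-haitian-finetune",  # placeholder output path
    learning_rate=1e-5,                       # reported above
    per_device_train_batch_size=16,           # placeholder
    warmup_steps=500,                         # placeholder
    max_steps=4000,                           # placeholder
    fp16=True,                                # placeholder
    predict_with_generate=True,               # generate during eval so WER can be computed
)
```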

## Usage
You can use this model directly from the Hugging Face Model Hub. Here's a simple example in Python:

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torchaudio

# load model and processor
processor = WhisperProcessor.from_pretrained("ZeeshanGeoPk/haitian-speech-to-text")
model = WhisperForConditionalGeneration.from_pretrained("ZeeshanGeoPk/haitian-speech-to-text")

# load an audio file with torchaudio
sample_path = "path/to/audio.wav"
waveform, sample_rate = torchaudio.load(sample_path)

# resample if needed (Whisper expects 16 kHz audio)
if sample_rate != 16000:
    resampler = torchaudio.transforms.Resample(sample_rate, 16000)
    waveform = resampler(waveform)
    sample_rate = 16000

# downmix to a single (mono) channel if necessary
if waveform.shape[0] > 1:
    waveform = waveform.mean(dim=0, keepdim=True)

# extract log-Mel input features with the Whisper processor (expects a 1-D array)
input_features = processor(
    waveform.squeeze(0).numpy(), sampling_rate=sample_rate, return_tensors="pt"
).input_features

# generate token ids and decode them to text
predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription[0])
```
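
If generation drifts into another language, the transcription language and task can be pinned explicitly. This is the standard Whisper mechanism rather than anything specific to this checkpoint; it reuses `processor`, `model`, and `input_features` from the example above, and `"ht"` is Whisper's code for Haitian Creole.

```python
# pin the language and task so generation always transcribes Haitian Creole
forced_decoder_ids = processor.get_decoder_prompt_ids(language="ht", task="transcribe")
predicted_ids = model.generate(input_features, forced_decoder_ids=forced_decoder_ids)
print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
```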

---
license: apache-2.0
language:
- ht
metrics:
- wer
library_name: transformers
---