jsbeaudry committed on
Commit
47d2552
·
verified ·
1 Parent(s): f8c1887

Update README.md

  1. README.md +34 -23
README.md CHANGED
@@ -5,29 +5,35 @@ tags:
  - transformers
  - unsloth
  - whisper
  license: apache-2.0
  language:
  - ht
  ---




- # whisper-medium-creole-oswald

- This model is a fine-tuned version of [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) on the **creole-text-voice** dataset.
  The main objective is to create a **99% accurate Haitian Creole Speech-to-Text model**, capable of transcribing diverse Haitian voices across accents, regions, and speaking styles.

  ---

  ## 🧠 Model description

- **whisper-medium-creole-oswald** is optimized for Haitian Creole automatic speech recognition (ASR). It builds upon the Whisper architecture by OpenAI and adapts it to Haitian Creole through transfer learning and fine-tuning on a high-quality curated dataset containing hours of Haitian Creole audio-text pairs.

- - **Architecture**: Whisper Medium
  - **Fine-tuned for**: Haitian Creole (Kreyòl Ayisyen)
  - **Vocabulary**: Based on Latin script (Creole orthography), preserving diacritics and linguistic nuances.
- - **Voice types**: Made with synthetic female voices.
  - **Sampling rate**: 16 kHz
  - **Training objective**: Maximize transcription accuracy for everyday Creole speech

@@ -51,7 +57,6 @@ The main objective is to create a **99% accurate Haitian Creole Speech-to-Text m

  ### ⚠️ Limitations
  - May struggle with:
- - Heavily code-switched speech (Creole + French/English mixed)
  - Extremely poor audio quality (e.g., heavy background noise)
  - Very fast or mumbled speech in some dialects
  - Long-duration audio files
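
The limitations list above mentions long-duration audio files. One common workaround, which is an assumption and not part of this commit, is to cut the waveform into fixed-length windows before transcription. The sketch below uses the 16 kHz sampling rate stated in the README and the 30-second input window that Whisper models use. The function name `split_waveform` and its parameters are hypothetical.

```python
SAMPLE_RATE = 16_000   # sampling rate stated in the README
WINDOW_SECONDS = 30    # Whisper models take 30-second inputs


def split_waveform(samples, sample_rate=SAMPLE_RATE, window_seconds=WINDOW_SECONDS):
    """Split a mono waveform into consecutive windows of at most window_seconds.

    Works on any sliceable sequence (list, NumPy array, ...). The last
    window is shorter when the length is not an exact multiple.
    """
    if sample_rate <= 0 or window_seconds <= 0:
        raise ValueError("sample_rate and window_seconds must be positive")
    window = int(sample_rate * window_seconds)  # samples per window
    return [samples[start:start + window] for start in range(0, len(samples), window)]
```

Each window could then be transcribed separately and the resulting texts joined. Words that straddle a window boundary may be cut, so overlapping windows are a common refinement.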
@@ -64,7 +69,8 @@ The main objective is to create a **99% accurate Haitian Creole Speech-to-Text m

  The model was trained on the **creole-text-voice** dataset, which includes:

- - **5 hours** of Haitian Creole synthetic speech
  - Annotated, time-aligned text transcripts following standard Creole orthography

  ### Sources for next steps:
@@ -87,8 +93,8 @@ import librosa
  import numpy as np
  import torch

- processor = AutoProcessor.from_pretrained("jsbeaudry/whisper-medium-oswald")
- model = AutoModelForSpeechSeq2Seq.from_pretrained("jsbeaudry/whisper-medium-oswald")

  def transcript(audio_file_path):

@@ -124,7 +130,7 @@ import gradio as gr

  # Load Whisper model
  print("Loading model...")
- pipe = pipeline(model="jsbeaudry/whisper-medium-oswald")
  print("Model loaded successfully.")

  # Transcription function
@@ -146,14 +152,12 @@ def create_interface():
          with gr.Row():
              with gr.Column():
                  audio_input = gr.Audio(source="upload", type="filepath", label="🎧 Upload Audio")
-                 audio_input2 = gr.Audio(source="microphone", type="filepath", label="🎤 Record Audio")
              with gr.Column():
                  transcribe_button = gr.Button("🔍 Transcribe")
                  output_text = gr.Textbox(label="📝 Transcribed Text", lines=4)


          transcribe_button.click(fn=transcribe, inputs=audio_input, outputs=output_text)
-         transcribe_button.click(fn=transcribe, inputs=audio_input2, outputs=output_text)

      return demo

@@ -165,15 +169,22 @@ if __name__ == "__main__":
  ### Training hyperparameters

  The following hyperparameters were used during training:
- - learning_rate: 1e-05
- - train_batch_size: 16
- - eval_batch_size: 8
- - seed: 42
- - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: linear
- - lr_scheduler_warmup_steps: 500
- - num_epochs: 5
- - mixed_precision_training: Native AMP


  ### Framework versions
@@ -191,8 +202,8 @@ If you use this model, please cite:
  ```bibtex
  @misc{whispermediumcreoleoswald2025,
- title={Whisper Medium Creole - Oswald},
  author={Jean sauvenel beaudry},
  year={2025},
  howpublished={\url{https://huggingface.co/jsbeaudry}}
- }
 
  - transformers
  - unsloth
  - whisper
+ - creole
+ - haiti
  license: apache-2.0
  language:
  - ht
+ datasets:
+ - jsbeaudry/cmu_haitian_creole_speech
+ - jsbeaudry/creole-text-voice
+ pipeline_tag: automatic-speech-recognition
  ---




+ # oswald-large-v3-turbo-m1

+ This model is a fine-tuned version of [unsloth/whisper-large-v3-turbo](https://huggingface.co/unsloth/whisper-large-v3-turbo) on the **creole-text-voice** dataset.
  The main objective is to create a **99% accurate Haitian Creole Speech-to-Text model**, capable of transcribing diverse Haitian voices across accents, regions, and speaking styles.

  ---

  ## 🧠 Model description

+ **oswald-large-v3-turbo-m1** is optimized for Haitian Creole automatic speech recognition (ASR). It builds upon the Whisper architecture by OpenAI and adapts it to Haitian Creole through transfer learning and fine-tuning on a high-quality curated dataset containing hours of Haitian Creole audio-text pairs.

+ - **Architecture**: Whisper Large v3 Turbo
  - **Fine-tuned for**: Haitian Creole (Kreyòl Ayisyen)
  - **Vocabulary**: Based on Latin script (Creole orthography), preserving diacritics and linguistic nuances.
+ - **Voice types**: Trained on female and male voices, both synthetic and natural.
  - **Sampling rate**: 16 kHz
  - **Training objective**: Maximize transcription accuracy for everyday Creole speech

 

  ### ⚠️ Limitations
  - May struggle with:
  - Extremely poor audio quality (e.g., heavy background noise)
  - Very fast or mumbled speech in some dialects
  - Long-duration audio files
 

  The model was trained on the **creole-text-voice** dataset, which includes:

+ - **7 hours** of Haitian Creole synthetic speech
+ - **8 hours** of Haitian Creole human speech
  - Annotated, time-aligned text transcripts following standard Creole orthography

  ### Sources for next steps:
 
  import numpy as np
  import torch

+ processor = AutoProcessor.from_pretrained("jsbeaudry/oswald-large-v3-turbo-m1")
+ model = AutoModelForSpeechSeq2Seq.from_pretrained("jsbeaudry/oswald-large-v3-turbo-m1")

  def transcript(audio_file_path):

 

  # Load Whisper model
  print("Loading model...")
+ pipe = pipeline(model="jsbeaudry/oswald-large-v3-turbo-m1")
  print("Model loaded successfully.")

  # Transcription function
 
          with gr.Row():
              with gr.Column():
                  audio_input = gr.Audio(source="upload", type="filepath", label="🎧 Upload Audio")
              with gr.Column():
                  transcribe_button = gr.Button("🔍 Transcribe")
                  output_text = gr.Textbox(label="📝 Transcribed Text", lines=4)


          transcribe_button.click(fn=transcribe, inputs=audio_input, outputs=output_text)

      return demo

 
  ### Training hyperparameters

  The following hyperparameters were used during training:
+ - learning_rate: 1e-4
+ - num_epochs: 6.65
+ - hours: 2:52
+
+
+ | Step | Training Loss | Validation Loss |
+ |------|---------------|-----------------|
+ | 100 | 0.565400 | 0.656878 |
+ | 200 | 0.481000 | 0.528320 |
+ | 300 | 0.457000 | 0.460658 |
+ | 400 | 0.822300 | 0.419748 |
+ | 500 | 0.298300 | 0.397042 |
+ | ... | ... | ... |
+ | 8300 | 0.049500 | 0.215643 |
+ | 8400 | 0.024700 | 0.210167 |
+
+


  ### Framework versions
 
  ```bibtex
  @misc{whispermediumcreoleoswald2025,
+ title={oswald large turbo M1},
  author={Jean sauvenel beaudry},
  year={2025},
  howpublished={\url{https://huggingface.co/jsbeaudry}}
+ }