jsbeaudry/oswald-large-v3-turbo-m1

@@ -5,29 +5,35 @@ tags:
 - transformers
 - unsloth
 - whisper
 license: apache-2.0
 language:
 - ht
 ---
-# whisper-medium-creole-oswald
-This model is a fine-tuned version of [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) on the **creole-text-voice** dataset.
 The main objective is to create a **99% accurate Haitian Creole Speech-to-Text model**, capable of transcribing diverse Haitian voices across accents, regions, and speaking styles.
 ---
 ## 🧠 Model description
-**whisper-medium-creole-oswald** is optimized for Haitian Creole automatic speech recognition (ASR). It builds upon the Whisper architecture by OpenAI and adapts it to Haitian Creole through transfer learning and fine-tuning on a high-quality curated dataset containing hours of Haitian Creole audio-text pairs.
-- **Architecture**: Whisper Medium
 - **Fine-tuned for**: Haitian Creole (Kreyòl Ayisyen)
 - **Vocabulary**: Based on Latin script (Creole orthography), preserving diacritics and linguistic nuances.
-- **Voice types**: Made with female synthetics voices.
 - **Sampling rate**: 16kHz
 - **Training objective**: Maximize transcription accuracy for everyday Creole speech
@@ -51,7 +57,6 @@ The main objective is to create a **99% accurate Haitian Creole Speech-to-Text m
 ### ⚠️ Limitations
 - May struggle with:
-  - Heavily code-switched speech (Creole + French/English mixed)
   - Extremely poor audio quality (e.g., heavy background noise)
   - Very fast or mumbled speech in some dialects
   - Long-duration audio files
@@ -64,7 +69,8 @@ The main objective is to create a **99% accurate Haitian Creole Speech-to-Text m
 The model was trained on the **creole-text-voice** dataset, which includes:
-- **5 hours** of Haitian Creole Synthetic speech
 - Annotated, time-aligned text transcripts following standard Creole orthography
 ### Sources for next steps:
@@ -87,8 +93,8 @@ import librosa
 import numpy as np
 import torch
-processor = AutoProcessor.from_pretrained("jsbeaudry/whisper-medium-oswald")
-model = AutoModelForSpeechSeq2Seq.from_pretrained("jsbeaudry/whisper-medium-oswald")
 def transcript (audio_file_path):
@@ -124,7 +130,7 @@ import gradio as gr
 # Load Whisper model
 print("Loading model...")
-pipe = pipeline(model="jsbeaudry/whisper-medium-oswald")
 print("Model loaded successfully.")
 # Transcription function
@@ -146,14 +152,12 @@ def create_interface():
         with gr.Row():
             with gr.Column():
                 audio_input = gr.Audio(source="upload", type="filepath", label="🎧 Upload Audio")
-                audio_input2 = gr.Audio(source="microphone", type="filepath", label="🎤 Record Audio")
             with gr.Column():
                 transcribe_button = gr.Button("🔍 Transcribe")
                 output_text = gr.Textbox(label="📝 Transcribed Text", lines=4)
         transcribe_button.click(fn=transcribe, inputs=audio_input, outputs=output_text)
-        transcribe_button.click(fn=transcribe, inputs=audio_input2, outputs=output_text)
     return demo
@@ -165,15 +169,22 @@ if __name__ == "__main__":
 ### Training hyperparameters
 The following hyperparameters were used during training:
-- learning_rate: 1e-05
-- train_batch_size: 16
-- eval_batch_size: 8
-- seed: 42
-- optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
-- lr_scheduler_type: linear
-- lr_scheduler_warmup_steps: 500
-- num_epochs: 5
-- mixed_precision_training: Native AMP
 ### Framework versions
@@ -191,8 +202,8 @@ If you use this model, please cite:
 ```bibtex
 @misc{whispermediumcreoleoswald2025,
-  title={Whisper Medium Creole - Oswald},
   author={Jean sauvenel beaudry},
   year={2025},
   howpublished={\url{https://huggingface.co/jsbeaudry}}
-}

 - transformers
 - unsloth
 - whisper
+- creole
+- haiti
 license: apache-2.0
 language:
 - ht
+datasets:
+- jsbeaudry/cmu_haitian_creole_speech
+- jsbeaudry/creole-text-voice
+pipeline_tag: automatic-speech-recognition
 ---
+# oswald-large-v3-turbo-m1
+This model is a fine-tuned version of [unsloth/whisper-large-v3-turbo](https://huggingface.co/unsloth/whisper-large-v3-turbo) on the **creole-text-voice** dataset.
 The main objective is to create a **99% accurate Haitian Creole Speech-to-Text model**, capable of transcribing diverse Haitian voices across accents, regions, and speaking styles.
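The card does not say how the 99% accuracy target is measured. A common choice for ASR is word error rate (WER), where 99% accuracy would roughly correspond to a WER of 0.01. The sketch below assumes that metric; the function name `word_error_rate` and the sample sentences are illustrative and not part of the repository.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: WER is the accuracy metric; the model card does not confirm this.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One word dropped out of four reference words.
print(word_error_rate("mwen renmen ou anpil", "mwen renmen anpil"))
```

In practice the reference and hypothesis would usually be normalized first (casing, punctuation) so that formatting differences are not counted as errors.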
 ---
 ## 🧠 Model description
+**oswald-large-v3-turbo-m1** is optimized for Haitian Creole automatic speech recognition (ASR). It builds on OpenAI's Whisper architecture and adapts it to Haitian Creole through transfer learning and fine-tuning on a curated dataset of about 15 hours of Haitian Creole audio-text pairs.
+- **Architecture**: Whisper Large v3 Turbo
 - **Fine-tuned for**: Haitian Creole (Kreyòl Ayisyen)
 - **Vocabulary**: Based on Latin script (Creole orthography), preserving diacritics and linguistic nuances.
+- **Voice types**: Female and male voices, both synthetic and natural.
 - **Sampling rate**: 16kHz
 - **Training objective**: Maximize transcription accuracy for everyday Creole speech
 ### ⚠️ Limitations
 - May struggle with:
   - Extremely poor audio quality (e.g., heavy background noise)
   - Very fast or mumbled speech in some dialects
   - Long-duration audio files
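Whisper models process audio in windows of about 30 seconds, so one common workaround for long recordings is to split them into overlapping chunks and transcribe each chunk separately. The sketch below only computes the chunk boundaries at the 16 kHz sampling rate stated in this card; `chunk_bounds`, the 5-second overlap, and the default values are assumptions for illustration, not the author's method.

```python
from typing import List, Tuple

SAMPLE_RATE = 16_000  # Hz, the sampling rate stated in this model card


def chunk_bounds(
    num_samples: int,
    chunk_s: float = 30.0,    # assumed window length in seconds
    overlap_s: float = 5.0,   # assumed overlap between consecutive windows
    sample_rate: int = SAMPLE_RATE,
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of overlapping windows covering the audio."""
    chunk = int(chunk_s * sample_rate)
    step = chunk - int(overlap_s * sample_rate)
    if chunk <= 0 or step <= 0:
        raise ValueError("chunk_s must be positive and larger than overlap_s")

    bounds: List[Tuple[int, int]] = []
    start = 0
    while start < num_samples:
        end = min(start + chunk, num_samples)
        bounds.append((start, end))
        if end == num_samples:
            break  # the last window reached the end of the audio
        start += step
    return bounds


# A 70-second recording is covered by three overlapping windows.
for start, end in chunk_bounds(70 * SAMPLE_RATE):
    print(start / SAMPLE_RATE, end / SAMPLE_RATE)
```

Each `(start, end)` pair can be used to slice the loaded waveform before passing it to the model; the overlapping regions then need to be merged or deduplicated in the final transcript.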
 The model was trained on the **creole-text-voice** dataset, which includes:
+- **7 hours** of synthetic Haitian Creole speech
+- **8 hours** of human Haitian Creole speech
 - Annotated, time-aligned text transcripts following standard Creole orthography
 ### Sources for next steps:
 import numpy as np
 import torch
+processor = AutoProcessor.from_pretrained("jsbeaudry/oswald-large-v3-turbo-m1")
+model = AutoModelForSpeechSeq2Seq.from_pretrained("jsbeaudry/oswald-large-v3-turbo-m1")
 def transcript(audio_file_path):
 # Load Whisper model
 print("Loading model...")
+pipe = pipeline(model="jsbeaudry/oswald-large-v3-turbo-m1")
 print("Model loaded successfully.")
 # Transcription function
         with gr.Row():
             with gr.Column():
                 audio_input = gr.Audio(source="upload", type="filepath", label="🎧 Upload Audio")
             with gr.Column():
                 transcribe_button = gr.Button("🔍 Transcribe")
                 output_text = gr.Textbox(label="📝 Transcribed Text", lines=4)
         transcribe_button.click(fn=transcribe, inputs=audio_input, outputs=output_text)
     return demo
 ### Training hyperparameters
 The following hyperparameters were used during training:
+- learning_rate: 1e-4
+- num_epochs: 6.65
+- training_time: 2:52 (hours:minutes)
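+
+| Step | Training Loss | Validation Loss |
+|------|---------------|-----------------|
+| 100 | 0.565400 | 0.656878 |
+| 200 | 0.481000 | 0.528320 |
+| 300 | 0.457000 | 0.460658 |
+| 400 | 0.822300 | 0.419748 |
+| 500 | 0.298300 | 0.397042 |
+| ... | ... | ... |
+| 8300 | 0.049500 | 0.215643 |
+| 8400 | 0.024700 | 0.210167 |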
 ### Framework versions
 ```bibtex
 @misc{whispermediumcreoleoswald2025,
+  title={Oswald Large v3 Turbo M1},
   author={Jean sauvenel beaudry},
   year={2025},
   howpublished={\url{https://huggingface.co/jsbeaudry}}
+}