language:
- en
datasets:
- jacktol/atc-dataset
library_name: unsloth
pipeline_tag: automatic-speech-recognition
---
[<img src="https://cdn-avatars.huggingface.co/v1/production/uploads/67b2f4e49edebc815a3a4739/R1g957j1aBbx8lhZbWmxw.jpeg" width="200"/>](https://huggingface.co/fjmgAI)

This dataset contains **14,830 examples** of transcriptions and corresponding audio.

- The model was trained using the **Seq2SeqTrainer**.
- The **Word Error Rate (WER)** was used as the evaluation metric to assess and optimize the model's performance during fine-tuning.
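For reference, WER is the word-level edit distance between a hypothesis and its reference transcript, divided by the number of reference words. The training setup computes it with the `evaluate`/`jiwer` packages; the minimal pure-Python sketch below (the example phrases are made up) only illustrates how the metric itself works:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumes a non-empty reference; this is an illustration, not the
    implementation used during training (that uses jiwer/evaluate).
    """
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

# One substitution ("two" -> "to") in a 7-word reference: WER = 1/7
print(word_error_rate("cleared to land runway two seven left",
                      "cleared to land runway to seven left"))
```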

## Usage

### Direct Usage (Unsloth)

First install the dependencies.

Colab version:

```bash
%%capture
!pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft trl==0.15.2 triton cut_cross_entropy unsloth_zoo
!pip install sentencepiece protobuf "datasets>=3.4.1" huggingface_hub hf_transfer
!pip install transformers==4.51.3
!pip install --no-deps unsloth
!pip install librosa soundfile evaluate jiwer
```

Non-Colab version:

```bash
pip install unsloth
pip install librosa soundfile evaluate jiwer
```
Then you can load the model and run inference:

```python
import torch
from unsloth import FastModel
from transformers import pipeline, WhisperForConditionalGeneration

# Load the fine-tuned Whisper checkpoint through Unsloth
model, tokenizer = FastModel.from_pretrained(
    model_name = "fjmgAI/whisper-large-v3-ATC",
    dtype = None,          # auto-detect (float16 / bfloat16)
    load_in_4bit = False,
    auto_model = WhisperForConditionalGeneration,
    whisper_language = "English",
    whisper_task = "transcribe",
)

# Force English transcription and clear the default decoding constraints
model.generation_config.language = "<|en|>"
model.generation_config.task = "transcribe"
model.config.suppress_tokens = []
model.generation_config.forced_decoder_ids = None

whisper = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=tokenizer.tokenizer,
    feature_extractor=tokenizer.feature_extractor,
    processor=tokenizer,
    return_language=True,
    torch_dtype=torch.float16,
)

audio_file = "audio_example.flac"
transcribed_text = whisper(audio_file)
print(transcribed_text["text"])
```

## Purpose

This fine-tuned model is designed for **Speech-to-Text (STT) applications** in **Air Traffic Control (ATC)** environments, leveraging a specialized ATC dataset to enhance robustness and precision in transcribing ATC recordings. The model aims to deliver accurate and reliable transcription while maintaining efficient performance.