language:
- en
datasets:
- jacktol/atc-dataset
library_name: unsloth
pipeline_tag: automatic-speech-recognition
---
[<img src="https://cdn-avatars.huggingface.co/v1/production/uploads/67b2f4e49edebc815a3a4739/R1g957j1aBbx8lhZbWmxw.jpeg" width="200"/>](https://huggingface.co/fjmgAI)

This dataset contains **14,830 examples** of transcriptions and corresponding audio.

- The model was trained using the **Seq2SeqTrainer**.
- The **Word Error Rate (WER)** was used as the evaluation metric to assess and optimize the model's performance during fine-tuning.
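For reference, WER is the word-level edit distance between a hypothesis and its reference transcript, divided by the number of reference words. The training setup computes it with the `evaluate`/`jiwer` packages; the minimal pure-Python sketch below (the example phrases are made up) only illustrates how the metric itself works:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumes a non-empty reference; this is an illustration, not the
    implementation used during training (that uses jiwer/evaluate).
    """
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

# One substitution ("two" -> "to") in a 7-word reference: WER = 1/7
print(word_error_rate("cleared to land runway two seven left",
                      "cleared to land runway to seven left"))
```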

## Usage

### Direct Usage (Unsloth)

First install the dependencies.

Colab version:

```bash
%%capture
!pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft trl==0.15.2 triton cut_cross_entropy unsloth_zoo
!pip install sentencepiece protobuf "datasets>=3.4.1" huggingface_hub hf_transfer
!pip install transformers==4.51.3
!pip install --no-deps unsloth
!pip install librosa soundfile evaluate jiwer
```

Non-Colab version:

```bash
pip install unsloth
pip install librosa soundfile evaluate jiwer
```
Then you can load the model and run inference:

```python
import torch
from unsloth import FastModel
from transformers import pipeline, WhisperForConditionalGeneration

# Load the fine-tuned Whisper checkpoint through Unsloth
model, tokenizer = FastModel.from_pretrained(
    model_name = "fjmgAI/whisper-large-v3-ATC",
    dtype = None,          # auto-detect (float16 / bfloat16)
    load_in_4bit = False,
    auto_model = WhisperForConditionalGeneration,
    whisper_language = "English",
    whisper_task = "transcribe",
)

# Force English transcription and clear the default decoding constraints
model.generation_config.language = "<|en|>"
model.generation_config.task = "transcribe"
model.config.suppress_tokens = []
model.generation_config.forced_decoder_ids = None

whisper = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=tokenizer.tokenizer,
    feature_extractor=tokenizer.feature_extractor,
    processor=tokenizer,
    return_language=True,
    torch_dtype=torch.float16,
)

audio_file = "audio_example.flac"
transcribed_text = whisper(audio_file)
print(transcribed_text["text"])
```

## Purpose

This fine-tuned model is designed for **Speech-to-Text (STT) applications** in **Air Traffic Control (ATC)** environments, leveraging a specialized ATC dataset to enhance robustness and precision in transcribing ATC recordings. The model aims to deliver accurate and reliable transcription while maintaining efficient performance.