fjmgAI committed on
Commit
5feb6fb
·
verified ·
1 Parent(s): e7d3f7f

Update README.md

Files changed (1)
  1. README.md +62 -0
README.md CHANGED
@@ -10,6 +10,8 @@ language:
  - en
  datasets:
  - jacktol/atc-dataset
+ library_name: unsloth
+ pipeline_tag: automatic-speech-recognition
  ---
  [<img src="https://cdn-avatars.huggingface.co/v1/production/uploads/67b2f4e49edebc815a3a4739/R1g957j1aBbx8lhZbWmxw.jpeg" width="200"/>](https://huggingface.co/fjmgAI)

@@ -33,6 +35,66 @@ This dataset contains **14,830 examples** transcriptions and corresponding audio
  - The model was trained using the **Seq2SeqTrainer**.
  - The **Word Error Rate (WER)** was employed as the loss metric to evaluate and optimize the model's performance during the fine-tuning process.

+ ## Usage
+
+ ### Direct Usage (Unsloth)
+
+ First install the dependencies:
+
+ Colab Version
+ ```bash
+ %%capture
+
+ !pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft trl==0.15.2 triton cut_cross_entropy unsloth_zoo
+ !pip install sentencepiece protobuf "datasets>=3.4.1" huggingface_hub hf_transfer
+ !pip install transformers==4.51.3
+ !pip install --no-deps unsloth
+ !pip install librosa soundfile evaluate jiwer
+ ```
+ No Colab Version
+ ```bash
+ pip install unsloth
+ pip install librosa soundfile evaluate jiwer
+ ```
+ Then you can load this model and run inference.
+ ```python
+ import torch
+ from unsloth import FastModel
+ from transformers import pipeline
+ from transformers import WhisperForConditionalGeneration
+
+
+ model, tokenizer = FastModel.from_pretrained(
+     model_name = "fjmgAI/whisper-large-v3-ATC",
+     dtype = None,
+     load_in_4bit = False,
+     auto_model = WhisperForConditionalGeneration,
+     whisper_language = "English",
+     whisper_task = "transcribe",
+ )
+
+ model.generation_config.language = "<|en|>"
+ model.generation_config.task = "transcribe"
+ model.config.suppress_tokens = []
+ model.generation_config.forced_decoder_ids = None
+
+ whisper = pipeline(
+     "automatic-speech-recognition",
+     model=model,
+     tokenizer=tokenizer.tokenizer,
+     feature_extractor=tokenizer.feature_extractor,
+     processor=tokenizer,
+     return_language=True,
+     torch_dtype=torch.float16
+ )
+
+ audio_file = "audio_example.flac"
+
+ transcribed_text = whisper(audio_file)
+
+ print(transcribed_text["text"])
+ ```
+
  ## Purpose
  This fine-tuned model is designed for **Speech-to-Text (STT) applications** in **Air Traffic Control (ATC)** environments, leveraging a specialized ATC dataset to enhance robustness and precision in transcribing ATC recordings. The model aims to deliver accurate and reliable transcription while maintaining efficient performance.

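The README names Word Error Rate (WER) as its evaluation metric but does not show how a score is obtained. The sketch below is one way to compute it with the standard library only: the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The function name `word_error_rate`, the lowercasing step, and the sample phrases are illustrative assumptions, not part of this commit. The `jiwer` package installed in the usage instructions provides a maintained implementation and is probably the better choice in practice.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Hypothetical helper for illustration; lowercasing is an assumed
    normalization step, not something the README specifies.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the distance between the ref words processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Made-up phrases: one substituted word out of six reference words.
score = word_error_rate(
    "climb flight level three five zero",
    "climb flight level three nine zero",
)
print(f"WER: {score:.3f}")
```

To score the model, the `"text"` field returned by the pipeline call in the README would be passed as `hypothesis`, with a human transcript as `reference`.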