Kiyoshi20 and LucasTramonte committed
Commit 309b041 · verified · 1 Parent(s): bacd3bd

Update README.md (#2)

- Update README.md (78304c7672625788e976a8b3ed76064dac3ea508)


Co-authored-by: Lucas Tramonte <[email protected]>

Files changed (1)
  1. README.md +55 -1
README.md CHANGED
@@ -11,4 +11,58 @@ tags:
  - Transformers
  - speech
  - audio
- ---
+ ---
+
+ # Model Description
+
+ This model is a fine-tuned version of [facebook/wav2vec2-base-960h](https://huggingface.co/facebook/wav2vec2-base-960h) for automatic speech recognition (ASR).
+ It was trained on the [LibriSpeech dataset](https://paperswithcode.com/dataset/librispeech) and is designed to improve transcription accuracy over the base model.
+
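+ For reference, the relevant LibriSpeech splits can be loaded with the 🤗 `datasets` library. This is a minimal sketch under that assumption, not necessarily the exact pipeline from the authors' repository linked below:
+
+ ```python
+ from datasets import load_dataset
+
+ # The "clean" config's "validation" and "test" splits correspond to
+ # LibriSpeech dev-clean and test-clean, respectively.
+ dev_clean = load_dataset("librispeech_asr", "clean", split="validation")
+ test_clean = load_dataset("librispeech_asr", "clean", split="test")
+
+ print(dev_clean[0]["speaker_id"], dev_clean[0]["text"])
+ ```
+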
+ The fine-tuning process involved:
+
+ - Selecting a subset of speakers from the `dev-clean` and `test-clean` splits.
+ - Preprocessing the audio files and their corresponding transcriptions.
+ - Training with gradient accumulation, mixed precision (if available), and periodic evaluation, as sketched below.
+ - Saving the fine-tuned model for inference.
+
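+ The loop below is a minimal sketch of such a training step. The starting checkpoint, learning rate, accumulation factor, and toy batch are illustrative assumptions, not the authors' exact configuration:
+
+ ```python
+ import torch
+ from transformers import AutoProcessor, AutoModelForCTC
+
+ # Assumed starting checkpoint (the base model named above)
+ processor = AutoProcessor.from_pretrained("facebook/wav2vec2-base-960h")
+ model = AutoModelForCTC.from_pretrained("facebook/wav2vec2-base-960h")
+
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+ model.to(device)
+ model.train()
+
+ optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # assumed value
+ use_amp = torch.cuda.is_available()  # mixed precision only if a GPU is present
+ scaler = torch.cuda.amp.GradScaler(enabled=use_amp)
+ accumulation_steps = 4  # assumed value
+
+ # Toy batch: one second of silence with a dummy transcript, just so the
+ # loop runs end to end; real batches would come from a DataLoader.
+ speech = torch.zeros(16000).numpy()
+ inputs = processor(speech, sampling_rate=16000, return_tensors="pt", padding=True)
+ labels = processor.tokenizer("HELLO WORLD", return_tensors="pt").input_ids
+ batches = [{"input_values": inputs.input_values, "labels": labels}] * accumulation_steps
+
+ for step, batch in enumerate(batches):
+     batch = {k: v.to(device) for k, v in batch.items()}
+     with torch.cuda.amp.autocast(enabled=use_amp):
+         # Wav2Vec2-style CTC models compute the loss when labels are passed
+         loss = model(input_values=batch["input_values"], labels=batch["labels"]).loss
+         loss = loss / accumulation_steps  # rescale so accumulated grads match a full batch
+     scaler.scale(loss).backward()
+     if (step + 1) % accumulation_steps == 0:
+         scaler.step(optimizer)
+         scaler.update()
+         optimizer.zero_grad()
+ ```
+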
+ *[GitHub](https://github.com/LucasTramonte/SpeechRecognition)*
+ *Authors*: Lucas Tramonte, Kiyoshi Araki
+
+ # Usage
+
+ To transcribe audio files, the model can be used as follows:
+
+ ```python
+ from transformers import AutoProcessor, AutoModelForCTC
+ import torch
+ import librosa
+
+ # Load the fine-tuned model and its processor
+ processor = AutoProcessor.from_pretrained("deepl-project/conformer-finetunning")
+ model = AutoModelForCTC.from_pretrained("deepl-project/conformer-finetunning")
+
+ # Load an audio file and resample it to the 16 kHz rate the model expects
+ file_path = "path/to/audio/file.wav"
+ speech, sr = librosa.load(file_path, sr=16000)
+ inputs = processor(speech, sampling_rate=sr, return_tensors="pt", padding=True)
+
+ # Run inference without tracking gradients
+ with torch.no_grad():
+     logits = model(**inputs).logits
+
+ # Greedy CTC decoding: pick the most likely token per frame, then
+ # collapse repeats and remove blanks during batch_decode
+ predicted_ids = torch.argmax(logits, dim=-1)
+ transcription = processor.batch_decode(predicted_ids)
+
+ print("Transcription:", transcription[0])
+ ```
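+
+ To quantify transcription accuracy against a reference transcript, word error rate (WER) is the standard metric. Below is a minimal sketch using the third-party `jiwer` package (an assumption; install it with `pip install jiwer`), reusing `transcription` from the snippet above:
+
+ ```python
+ from jiwer import wer
+
+ reference = "THE EXPECTED TRANSCRIPT"  # placeholder ground truth
+ hypothesis = transcription[0]          # output of the usage snippet above
+
+ # WER = (substitutions + deletions + insertions) / number of reference words
+ print("WER:", wer(reference, hypothesis))
+ ```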
+
+ # References
+
+ - [LibriSpeech Dataset](https://paperswithcode.com/dataset/librispeech)
+ - [Conformer-Based Target-Speaker ASR Paper](https://paperswithcode.com/paper/conformer-based-target-speaker-automatic)
+ - [Whisper Paper (Radford et al., 2022)](https://arxiv.org/abs/2212.04356)