slplab committed on
Commit e259073 · verified · 1 Parent(s): 8927a13

Create README.md

Files changed (1)
  1. README.md +52 -0
README.md CHANGED
@@ -0,0 +1,52 @@
# Wav2Vec2-Large-Robust ETRI Korean-English Pronunciation Model

This repository contains a Wav2Vec2-Large-Robust model fine-tuned for phoneme recognition. The model was trained and evaluated on the ETRI dataset of English pronunciation by Korean learners.

## Data Information
- **Dataset Name**: ETRI English Pronunciation of Korean Learners
- **Train Data**: 14,305 samples
- **Valid Data**: 1,590 samples
- **Test Data**: 3,974 samples
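
The three split sizes above sum to 19,869 samples. As a quick sanity check, the sketch below computes each split's share of the total; it works out to roughly 72/8/20, although the README does not say whether the split was designed with those ratios in mind.

```python
# Split sizes as listed in the Data Information section.
splits = {"train": 14_305, "valid": 1_590, "test": 3_974}

total = sum(splits.values())  # 19,869 samples overall
fractions = {name: count / total for name, count in splits.items()}

# Print each split with its sample count and share of the total.
for name, frac in fractions.items():
    print(f"{name:>5}: {splits[name]:>6,} samples ({frac:.1%})")
```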

## Training Procedure
The model was fine-tuned for phoneme recognition using the Hugging Face `transformers` library. The training steps were:
1. Preprocess the data to align audio with phoneme labels.
2. Fine-tune the Wav2Vec2-Large-Robust model with CTC loss.
3. Evaluate on the validation and test sets.
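
Because the model is trained with CTC loss, its per-frame outputs have to be collapsed into a phoneme sequence before evaluation. The following is a minimal, library-free sketch of standard greedy CTC decoding (merge consecutive repeats, then drop blanks). The blank symbol `<pad>` and the frame labels are hypothetical; the README does not describe the actual vocabulary or decoding code.

```python
from itertools import groupby

# Hypothetical blank symbol; the real vocabulary is not given in this README.
BLANK = "<pad>"


def ctc_greedy_decode(frame_labels, blank=BLANK):
    """Collapse consecutive repeated labels, then remove blank symbols."""
    collapsed = [label for label, _ in groupby(frame_labels)]
    return [label for label in collapsed if label != blank]


# Made-up per-frame argmax labels for the start of an utterance.
frames = ["<pad>", "dh", "dh", "<pad>", "ah", "ah", "ah",
          "<pad>", "<pad>", "w", "ah", "<pad>", "n"]
decoded = ctc_greedy_decode(frames)
print(" ".join(decoded))
```

Note that a blank between two identical labels keeps them distinct, which is how CTC can emit the same phoneme twice in a row.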

### Training Hyperparameters
- **Epochs**: 50
- **Learning Rate**: 0.0001
- **Warmup Ratio**: 0.1
- **Scheduler**: Linear
- **Batch Size**: 8
- **Loss Reduction**: Mean
- **Feature Extractor Freeze**: Enabled
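
To illustrate what a linear scheduler with a 0.1 warmup ratio implies for the learning rate, here is a rough sketch in plain Python. It assumes a single device, no gradient accumulation, and that the last partial batch of each epoch is kept; none of these are stated in the README, so the step counts are illustrative only. The function `linear_schedule_lr` is a hypothetical helper, not part of `transformers`.

```python
import math


def linear_schedule_lr(step, total_steps, peak_lr=1e-4, warmup_ratio=0.1):
    """Learning rate at `step`: linear warmup to peak_lr, then linear decay to zero."""
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Assumptions (not stated in the README): one device, no gradient
# accumulation, and the final partial batch of each epoch is kept.
steps_per_epoch = math.ceil(14_305 / 8)  # train samples / batch size
total_steps = steps_per_epoch * 50       # 50 epochs
warmup_steps = round(total_steps * 0.1)  # warmup ratio 0.1

for step in (0, warmup_steps // 2, warmup_steps, total_steps // 2, total_steps):
    print(f"step {step:>6}: lr = {linear_schedule_lr(step, total_steps):.2e}")
```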

## Training Results
The following metrics were achieved during training:
- **Final Training Loss**: 0.2527
- **Validation Loss**: 0.4532
- **Word Error Rate (WER) on Validation Set**: 0.1617

## Test Results
The model was evaluated on the test dataset with the following performance:
- **Word Error Rate (WER)**: 0.1223

## Phoneme Data Example
Below is an example of a dataset entry for phoneme recognition, shown together with the model's prediction:

**Sample 1:**
- **Provided Sentence**: The one with the ribbon on its head
- **Correct Korean English Phonemes**: dh ah w ah n w ih dh ax r ih b ah n ao n ih t s hh eh dd
- **Predicted Phonemes**: d ah w ah n w ih dh ah r ih b ah n ao n ih ts hh eh dd
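
For a sense of how an error rate can be obtained from a pair like Sample 1, the sketch below computes a token-level edit distance between the reference and predicted phoneme strings and divides by the reference length, which is how WER is usually defined over space-separated tokens. Whether the reported WER values were computed exactly this way is an assumption; the README does not show the evaluation code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete reference token r
                curr[j - 1] + 1,         # insert hypothesis token h
                prev[j - 1] + (r != h),  # match or substitute
            ))
        prev = curr
    return prev[-1]


# Phoneme strings copied from Sample 1 above.
reference = "dh ah w ah n w ih dh ax r ih b ah n ao n ih t s hh eh dd".split()
predicted = "d ah w ah n w ih dh ah r ih b ah n ao n ih ts hh eh dd".split()

errors = edit_distance(reference, predicted)
error_rate = errors / len(reference)
print(f"{errors} edits over {len(reference)} reference tokens -> {error_rate:.4f}")
```

This is the rate for a single utterance; a corpus-level figure would normally sum edits and reference lengths over all utterances before dividing.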

## Training Logs
TensorBoard logs are available for detailed training analysis:
- `events.out.tfevents.1732529747.oem-WS-C621E-SAGE-Series.2265579.0`
- `events.out.tfevents.1732573537.oem-WS-C621E-SAGE-Series.2265579.1`

Use the following command to visualize the logs:
```bash
tensorboard --logdir ./logs/actual_phoneme_recognition_ep50_lr0.0001_warm0.1_type-linear/
```