Create README.md
README.md

# Wav2Vec2-Large-Robust ETRI Korean-English Pronunciation Model

This repository contains a fine-tuned Wav2Vec2-Large-Robust model for phoneme recognition tasks. The model was trained and evaluated on the ETRI English pronunciation dataset for Korean learners.

## Data Information
- **Dataset Name**: ETRI English Pronunciation of Korean Learners
- **Train Data**: 14,305 samples
- **Validation Data**: 1,590 samples
- **Test Data**: 3,974 samples

## Training Procedure
The model was fine-tuned for phoneme recognition using the Hugging Face `transformers` library. Below are the training steps:
1. Data preprocessing to align audio with phoneme labels.
2. Wav2Vec2-Large-Robust model fine-tuning with CTC loss.
3. Evaluation on validation and test sets.

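Step 2 relies on CTC, where the model emits one label per audio frame, including repeated labels and a blank symbol. The sketch below shows greedy CTC decoding on a toy example. The vocabulary, the blank id `0`, and the frame sequence are hypothetical and chosen for illustration only; the model's actual vocabulary and blank id are not stated in this README.

```python
from itertools import groupby

# Hypothetical toy vocabulary; id 0 plays the role of the CTC blank.
BLANK_ID = 0
ID_TO_PHONEME = {1: "dh", 2: "ah", 3: "w", 4: "n"}


def ctc_greedy_decode(frame_ids, blank_id=BLANK_ID):
    """Collapse consecutive repeated labels, then drop blanks."""
    collapsed = [label for label, _ in groupby(frame_ids)]
    return [label for label in collapsed if label != blank_id]


# One (made-up) best label per audio frame.
frames = [1, 1, 0, 2, 2, 2, 0, 0, 3, 2, 2, 0, 4, 4]
phonemes = [ID_TO_PHONEME[i] for i in ctc_greedy_decode(frames)]
print(" ".join(phonemes))  # dh ah w ah n
```

A blank between two identical labels keeps both of them, which is how CTC can output the same phoneme twice in a row.
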
### Training Hyperparameters
- **Epochs**: 50
- **Learning Rate**: 0.0001
- **Warmup Ratio**: 0.1
- **Scheduler**: Linear
- **Batch Size**: 8
- **Loss Reduction**: Mean
- **Feature Extractor Freeze**: Enabled

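To make the schedule concrete, the sketch below derives step counts and the learning rate curve from the values listed here. Several things are assumptions rather than facts from this README: the mapping of the hyperparameters to `transformers` `TrainingArguments` names, training on a single device without gradient accumulation, and the rounding used for the warmup step count. Loss reduction and feature extractor freezing are model-level settings and are left out.

```python
import math

# Listed hyperparameters, keyed by the Hugging Face argument names they
# most plausibly correspond to (this mapping is an assumption).
training_args = {
    "num_train_epochs": 50,
    "learning_rate": 1e-4,
    "warmup_ratio": 0.1,
    "lr_scheduler_type": "linear",
    "per_device_train_batch_size": 8,
}

TRAIN_SAMPLES = 14_305  # training set size reported in this README

# Assumes one device and no gradient accumulation.
steps_per_epoch = math.ceil(
    TRAIN_SAMPLES / training_args["per_device_train_batch_size"]
)
total_steps = steps_per_epoch * training_args["num_train_epochs"]
warmup_steps = round(total_steps * training_args["warmup_ratio"])


def linear_lr(step):
    """Linear warmup to the peak rate, then linear decay to zero."""
    peak = training_args["learning_rate"]
    if step < warmup_steps:
        return peak * step / max(1, warmup_steps)
    remaining = total_steps - step
    return peak * max(0.0, remaining / max(1, total_steps - warmup_steps))


print(steps_per_epoch, total_steps, warmup_steps)  # 1789 89450 8945
```

Under these assumptions the learning rate rises from 0 to 0.0001 over the first 8,945 steps and then decays linearly to 0 at step 89,450.
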
## Training Results
The following metrics were achieved during training:
- **Final Training Loss**: 0.2527
- **Validation Loss**: 0.4532
- **Word Error Rate (WER) on Validation Set**: 0.1617

## Test Results
The model was evaluated on the test dataset with the following performance:
- **Word Error Rate (WER)**: 0.1223

## Phoneme Data Example
Below is an example of how the dataset is structured for phoneme recognition tasks:

**Sample 1:**
- **Provided Sentence**: The one with the ribbon on its head
- **Correct Korean English Phonemes**: dh ah w ah n w ih dh ax r ih b ah n ao n ih t s hh eh dd
- **Predicted Phonemes**: d ah w ah n w ih dh ah r ih b ah n ao n ih ts hh eh dd

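The error rate for this sample can be worked out by hand. The sketch below assumes the reported WER is an edit distance over space-separated phoneme tokens, so each phoneme counts as one "word". This README does not state how the metric was computed, so treat that as an assumption.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion
                    curr[j - 1] + 1,     # insertion
                    prev[j - 1] + cost,  # substitution or match
                )
            )
        prev = curr
    return prev[-1]


reference = "dh ah w ah n w ih dh ax r ih b ah n ao n ih t s hh eh dd".split()
predicted = "d ah w ah n w ih dh ah r ih b ah n ao n ih ts hh eh dd".split()

errors = edit_distance(reference, predicted)
error_rate = errors / len(reference)
print(errors, len(reference), round(error_rate, 4))  # 4 22 0.1818
```

The four edits are `dh` to `d`, `ax` to `ah`, and the pair `t s` predicted as the single token `ts` (one substitution plus one deletion). This sample therefore scores about 0.18, above the 0.1223 reported for the full test set.
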
## Training Logs
TensorBoard logs are available for detailed training analysis:
- `events.out.tfevents.1732529747.oem-WS-C621E-SAGE-Series.2265579.0`
- `events.out.tfevents.1732573537.oem-WS-C621E-SAGE-Series.2265579.1`

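The event file names carry some metadata of their own. The sketch below assumes the usual TensorBoard naming layout, `events.out.tfevents.<unix time>.<host>.<pid>.<n>`; the helper `parse_event_filename` is hypothetical and written only for this illustration.

```python
from datetime import datetime, timezone

LOG_FILES = [
    "events.out.tfevents.1732529747.oem-WS-C621E-SAGE-Series.2265579.0",
    "events.out.tfevents.1732573537.oem-WS-C621E-SAGE-Series.2265579.1",
]


def parse_event_filename(name):
    """Split an event file name into creation time, host, PID and suffix.

    Assumes the layout events.out.tfevents.<unix time>.<host>.<pid>.<n>.
    """
    rest = name[len("events.out.tfevents."):]
    timestamp, remainder = rest.split(".", 1)
    # Split from the right so a host name containing dots stays intact.
    host, pid, suffix = remainder.rsplit(".", 2)
    return {
        "created": datetime.fromtimestamp(int(timestamp), tz=timezone.utc),
        "host": host,
        "pid": int(pid),
        "suffix": int(suffix),
    }


first, second = (parse_event_filename(n) for n in LOG_FILES)
print(first["created"].isoformat())            # 2024-11-25T10:15:47+00:00
print(second["created"] - first["created"])    # 12:09:50
```

Read this way, both files come from the same process (PID 2265579) on the same host, and the second was created roughly 12 hours after the first.
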
Use the following command to visualize logs:
```bash
tensorboard --logdir ./logs/actual_phoneme_recognition_ep50_lr0.0001_warm0.1_type-linear/
```