slplab commited on
Commit
8927a13
·
verified ·
1 Parent(s): caa37fe

Create README.md

# Wav2Vec2-Large-Robust ETRI Korean-English Pronunciation Model

This repository contains a fine-tuned Wav2Vec2-Large-Robust model for phoneme recognition tasks. The model was trained and evaluated on the ETRI English pronunciation dataset for Korean learners.

## Data Information
- **Dataset Name**: ETRI English Pronunciation of Korean Learners
- **Train Data**: 14,305 samples
- **Validation Data**: 1,590 samples
- **Test Data**: 3,974 samples
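
As a quick sanity check on the split sizes listed above, the share of each subset can be computed directly from the sample counts. This is only arithmetic on the numbers in this README; the variable names are illustrative.

```python
# Sample counts as listed in the Data Information section.
counts = {"train": 14305, "valid": 1590, "test": 3974}

total = sum(counts.values())

# Percentage share of each subset, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

print(total)   # total number of samples across all subsets
print(shares)  # percentage of the data in each subset
```

The counts sum to 19,869 samples, with train, validation, and test holding about 72%, 8%, and 20% of the data respectively.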

## Training Procedure
The model was fine-tuned for phoneme recognition using the Hugging Face `transformers` library. Below are the training steps:
1. Data preprocessing to align audio with phoneme labels.
2. Wav2Vec2-Large-Robust model fine-tuning with CTC loss.
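3. Splitting the full dataset into train/validation/test sets at a nominal 70/10/20 ratio (the sample counts listed above correspond to roughly 72/8/20).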
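
To illustrate what CTC training in step 2 implies at inference time: the model emits one label per audio frame, and a phoneme sequence is recovered by collapsing repeated labels and dropping the blank token. The following is a minimal sketch of greedy CTC decoding, assuming a blank id of 0 and a hypothetical toy vocabulary; the model's real vocabulary and blank index are not stated in this README.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_phoneme, blank_id=0):
    """Collapse consecutive duplicate frame labels, then remove blanks."""
    collapsed = [label for label, _ in groupby(frame_ids)]
    return [id_to_phoneme[i] for i in collapsed if i != blank_id]


# Hypothetical toy vocabulary (assumption, not the model's actual vocab).
toy_vocab = {0: "<pad>", 1: "dh", 2: "ah", 3: "w"}

# Hypothetical per-frame argmax ids; a blank between two runs of the
# same id keeps them as separate phonemes.
frames = [0, 1, 1, 0, 2, 2, 0, 0, 3, 2, 2, 0, 2]

decoded = ctc_greedy_decode(frames, toy_vocab)
print(" ".join(decoded))
```

In practice the processor shipped with a Wav2Vec2 CTC checkpoint typically handles this step, so the sketch is only meant to show the idea.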

### Training Hyperparameters
- **Epochs**: 50
- **Learning Rate**: 0.0001
- **Warmup Ratio**: 0.1
- **Scheduler**: Linear
- **Batch Size**: 8
- **Loss Reduction**: Mean
- **Feature Extractor Freeze**: Enabled
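
To make the warmup ratio and linear scheduler concrete, here is a rough sketch of the learning rate over training. It assumes one optimizer step per batch of 8 on a single device with no gradient accumulation, and it rounds the warmup step count; neither detail is confirmed by this README, and libraries may round differently.

```python
import math

# Values from the hyperparameter list above.
BASE_LR = 1e-4
EPOCHS = 50
BATCH_SIZE = 8
WARMUP_RATIO = 0.1
TRAIN_SAMPLES = 14305

# Assumption: one optimizer step per batch, single device.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = round(WARMUP_RATIO * total_steps)


def linear_lr(step):
    """Linear warmup from 0 to BASE_LR, then linear decay back to 0."""
    if step < warmup_steps:
        return BASE_LR * step / warmup_steps
    remaining = max(0, total_steps - step)
    return BASE_LR * remaining / (total_steps - warmup_steps)


for step in (0, warmup_steps // 2, warmup_steps, total_steps):
    print(step, linear_lr(step))
```

Under these assumptions the learning rate peaks at 0.0001 after the first 10% of steps and decays linearly to zero by the end of epoch 50.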

## Training Results
The following metrics were achieved during training:
- **Final Training Loss**: 0.2527
- **Validation Loss**: 0.4532
- **Word Error Rate (WER) on Validation Set**: 0.1617

## Test Results
The model was evaluated on the test dataset with the following performance:
- **Word Error Rate (WER)**: 0.1223

## Phoneme Data Example
Here is an example of how the dataset is structured for phoneme recognition tasks:

**Sample 1:**
- **Provided Sentence**: The one with the ribbon on its head
- **Correct Korean English Phonemes**: dh ah w ah n w ih dh ax r ih b ah n ao n ih t s hh eh dd
- **Predicted Phonemes**: d ah w ah n w ih dh ah r ih b ah n ao n ih ts hh eh dd
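
The sample above can be scored with a token-level edit distance, which is how an error rate over space-separated phoneme labels is commonly computed. This is a minimal sketch that treats each phoneme as one token; whether the reported WER was computed exactly this way is an assumption.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (sub/ins/del cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# Phoneme strings copied from Sample 1.
reference = "dh ah w ah n w ih dh ax r ih b ah n ao n ih t s hh eh dd".split()
predicted = "d ah w ah n w ih dh ah r ih b ah n ao n ih ts hh eh dd".split()

errors = edit_distance(reference, predicted)
error_rate = errors / len(reference)
print(errors, len(reference), round(error_rate, 4))
```

For this sample the differences are `dh` vs `d`, `ax` vs `ah`, and `t s` vs `ts`, giving 4 edits over 22 reference tokens.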

## Training Logs
TensorBoard logs are available for detailed training analysis:
- `events.out.tfevents.1732529747.oem-WS-C621E-SAGE-Series.2265579.0`
- `events.out.tfevents.1732573537.oem-WS-C621E-SAGE-Series.2265579.1`

Use the following command to visualize logs:
```bash
tensorboard --logdir ./logs/actual_phoneme_recognition_ep50_lr0.0001_warm0.1_type-linear/
```
