slplab/wav2vec2-large-robust_ETRI_Korean_english-pronunciation

Automatic Speech Recognition

speech-recognition

english-phoneme-recognition


slplab committed on Dec 7, 2024

Commit e259073 · verified · 1 Parent(s): 8927a13

Create README.md

Files changed (1)

README.md +52 -0


	@@ -0,0 +1,52 @@

+# Wav2Vec2-Large-Robust ETRI Korean-English Pronunciation Model
+This repository contains a Wav2Vec2-Large-Robust model fine-tuned for English phoneme recognition. The model was trained and evaluated on the ETRI English pronunciation dataset of Korean learners.
+## Data Information
+- **Dataset Name**: ETRI English Pronunciation of Korean Learners
+- **Train Data**: 14,305 samples
+- **Valid Data**: 1,590 samples
+- **Test Data**: 3,974 samples
+## Training Procedure
+The model was fine-tuned for phoneme recognition using the Hugging Face `transformers` library. Below are the training steps:
+1. Data preprocessing to align audio with phoneme labels.
+2. Wav2Vec2-Large-Robust model fine-tuning with CTC loss.
+3. Evaluation on validation and test sets.
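Step 2 trains with CTC, which produces one label per audio frame, including a blank symbol. The following is a minimal sketch of how such frame-level outputs are commonly collapsed into a phoneme sequence at inference time. The blank token name `<pad>` and the frame labels are illustrative assumptions, not values taken from this model's vocabulary.

```python
from itertools import groupby


def ctc_greedy_collapse(frame_labels, blank="<pad>"):
    """Collapse per-frame CTC labels: merge consecutive repeats, then drop blanks."""
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != blank]


# Hypothetical frame-level argmax output for the word "the" (dh ax).
frames = ["<pad>", "dh", "dh", "<pad>", "ax", "ax", "ax", "<pad>"]
phonemes = ctc_greedy_collapse(frames)
print(" ".join(phonemes))
```

A blank between two identical labels keeps them separate, which is how CTC can represent a genuinely repeated phoneme.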
+### Training Hyperparameters
+- **Epochs**: 50
+- **Learning Rate**: 0.0001
+- **Warmup Ratio**: 0.1
+- **Scheduler**: Linear
+- **Batch Size**: 8
+- **Loss Reduction**: Mean
+- **Feature Extractor Freeze**: Enabled
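To make the warmup and scheduler settings concrete, here is a rough sketch of the learning-rate curve they imply. It assumes one optimizer step per batch of 8 on a single device, no gradient accumulation, and that the last partial batch of each epoch is kept. None of these details are stated in this card, so the step counts are estimates.

```python
import math

TRAIN_SAMPLES = 14_305  # training set size listed in this card
BATCH_SIZE = 8
EPOCHS = 50
PEAK_LR = 1e-4
WARMUP_RATIO = 0.1

# Assumption: one optimizer step per batch, last partial batch kept.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def linear_lr(step):
    """Learning rate at an optimizer step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return PEAK_LR * max(0.0, remaining)


print(steps_per_epoch, total_steps, warmup_steps)
for step in (0, warmup_steps // 2, warmup_steps, total_steps):
    print(step, linear_lr(step))
```

Under these assumptions the rate climbs to 0.0001 over the first 10% of steps and then falls linearly to zero by the end of epoch 50.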
+## Training Results
+The following metrics were achieved during training:
+- **Final Training Loss**: 0.2527
+- **Validation Loss**: 0.4532
+- **Word Error Rate (WER) on Validation Set**: 0.1617
+## Test Results
+The model was evaluated on the test dataset with the following performance:
+- **Word Error Rate (WER)**: 0.1223
+## Phoneme Data Example
+Below is an example of how the dataset is structured for phoneme recognition tasks:
+**Sample 1:**
+- **Provided Sentence**: The one with the ribbon on its head
+- **Correct Korean English Phonemes**: dh ah w ah n w ih dh ax r ih b ah n ao n ih t s hh eh dd
+- **Predicted Phonemes**: d ah w ah n w ih dh ah r ih b ah n ao n ih ts hh eh dd
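As an illustration of how an error rate can be computed for this sample, the sketch below takes the Levenshtein distance between the two phoneme sequences and divides by the reference length. It assumes every space-separated phoneme counts as one token, which would make a word error rate over these transcripts behave like a phoneme error rate. The card does not say which scoring tool or tokenization produced the reported WER, so treat this as an approximation.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


reference = "dh ah w ah n w ih dh ax r ih b ah n ao n ih t s hh eh dd".split()
predicted = "d ah w ah n w ih dh ah r ih b ah n ao n ih ts hh eh dd".split()

errors = edit_distance(reference, predicted)
error_rate = errors / len(reference)
print(f"{errors} edits over {len(reference)} reference tokens -> {error_rate:.4f}")
# prints "4 edits over 22 reference tokens -> 0.1818"
```

The four edits are `dh` to `d`, `ax` to `ah`, and the pair `t s` predicted as the single token `ts` (one substitution plus one deletion).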
+## Training Logs
+TensorBoard logs are available for detailed training analysis:
+- `events.out.tfevents.1732529747.oem-WS-C621E-SAGE-Series.2265579.0`
+- `events.out.tfevents.1732573537.oem-WS-C621E-SAGE-Series.2265579.1`
+Use the following command to visualize logs:
+```bash
+tensorboard --logdir ./logs/actual_phoneme_recognition_ep50_lr0.0001_warm0.1_type-linear/