metadata
library_name: transformers
license: apache-2.0
base_model: facebook/wav2vec2-large-960h
tags:
  - generated_from_trainer
metrics:
  - wer
model-index:
  - name: wav2vec2-large-960h
    results: []

wav2vec2-large-960h

This model is a fine-tuned version of facebook/wav2vec2-large-960h on the acc_dataset_v2 dataset (commit: a41c520).

It achieves the following results on the evaluation set:

  • Loss: 0.6286

  • Wer: 0.1538
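For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The following is a minimal plain-Python sketch of that definition. It is not the evaluation code used for this model, and the function name `word_error_rate` and the example sentences are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences (not from the dataset): one substitution plus
# one deletion against a six-word reference.
example = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Under this definition, a WER of 0.1538 corresponds to roughly 15 word errors per 100 reference words.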

Model description

This is a speech-to-text transcription model specialized for the acc dataset.
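A minimal inference sketch with the Transformers CTC classes is shown here. It rests on several assumptions: this card does not state the repository id of the fine-tuned checkpoint, so `MODEL_ID` points at the base model as a stand-in. The checkpoint is assumed to ship its processor files, and the input is assumed to be mono audio sampled at 16 kHz, as the base model expects. The function name `transcribe` is illustrative.

```python
# Assumption: stand-in id. Replace with the repo id or local path of
# this fine-tuned checkpoint, which the card does not state.
MODEL_ID = "facebook/wav2vec2-large-960h"
SAMPLE_RATE = 16_000  # the base model expects 16 kHz mono audio


def transcribe(waveform, model_id: str = MODEL_ID) -> str:
    """Transcribe a 1-D float array of 16 kHz mono audio with greedy CTC decoding."""
    # Heavy dependencies are imported lazily so the module loads without them.
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    model.eval()

    inputs = processor(waveform, sampling_rate=SAMPLE_RATE, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits

    # Greedy decoding: take the most likely token per frame, then let the
    # processor collapse repeats and remove CTC blank tokens.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Audio recorded at another sampling rate would need to be resampled to 16 kHz before calling the function.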

Training and evaluation data

The model was trained on the training split of acc_dataset_v2 and evaluated on the validation and test splits of the same dataset.

Training procedure

See the Jupyter notebook Finetuning-notebook-wav2vec2-large-960h-on-acc-data for the full training procedure.

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 64
  • eval_batch_size: 8
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 15
  • num_epochs: 128
  • mixed_precision_training: Native AMP
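To make the learning-rate settings concrete, here is a rough sketch of a linear schedule with warmup, written in plain Python. It mirrors the usual shape of such a schedule (linear ramp from 0 to the peak rate over the warmup steps, then linear decay to 0) and is not copied from the training notebook. The value of 6 steps per epoch is an assumption inferred from the results table, where step 50 corresponds to epoch 8.3333. The function name `linear_lr` is illustrative.

```python
BASE_LR = 1e-4        # learning_rate
WARMUP_STEPS = 15     # lr_scheduler_warmup_steps
NUM_EPOCHS = 128      # num_epochs
STEPS_PER_EPOCH = 6   # assumption: inferred from the results table (50 / 8.3333)
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS


def linear_lr(step: int,
              base_lr: float = BASE_LR,
              warmup_steps: int = WARMUP_STEPS,
              total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: fall linearly from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at each step logged in the results table (every 50 steps).
schedule = {step: linear_lr(step) for step in range(50, 751, 50)}
```

With these settings the warmup covers only the first 15 optimizer steps (about 2.5 epochs under the 6-steps-per-epoch assumption), so almost the entire run is spent in the decay phase.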

Training results

| Training Loss | Epoch    | Step | Validation Loss | Wer    |
|:-------------:|:--------:|:----:|:---------------:|:------:|
| 1.2379        | 8.3333   | 50   | 0.6501          | 0.3056 |
| 0.4865        | 16.6667  | 100  | 0.7069          | 0.2790 |
| 0.3054        | 25.0     | 150  | 0.6598          | 0.2369 |
| 0.2308        | 33.3333  | 200  | 0.6517          | 0.2215 |
| 0.1793        | 41.6667  | 250  | 0.6884          | 0.2103 |
| 0.1379        | 50.0     | 300  | 0.6418          | 0.1949 |
| 0.1253        | 58.3333  | 350  | 0.7004          | 0.1918 |
| 0.0988        | 66.6667  | 400  | 0.6059          | 0.1846 |
| 0.088         | 75.0     | 450  | 0.6507          | 0.1826 |
| 0.0773        | 83.3333  | 500  | 0.5473          | 0.1682 |
| 0.0686        | 91.6667  | 550  | 0.6027          | 0.1682 |
| 0.0643        | 100.0    | 600  | 0.6192          | 0.1713 |
| 0.0595        | 108.3333 | 650  | 0.6119          | 0.1703 |
| 0.0562        | 116.6667 | 700  | 0.5953          | 0.16   |
| 0.0507        | 125.0    | 750  | 0.6286          | 0.1538 |
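The logged results can be inspected programmatically. The sketch below copies the rows into a list of tuples and looks up the evaluation step with the lowest validation loss and the one with the lowest WER. It also derives the number of optimizer steps per epoch from the first row. The variable names are illustrative, and the snippet is only a reading aid for the numbers above, not part of the training procedure.

```python
# (training_loss, epoch, step, validation_loss, wer), copied from the table.
ROWS = [
    (1.2379, 8.3333, 50, 0.6501, 0.3056),
    (0.4865, 16.6667, 100, 0.7069, 0.2790),
    (0.3054, 25.0, 150, 0.6598, 0.2369),
    (0.2308, 33.3333, 200, 0.6517, 0.2215),
    (0.1793, 41.6667, 250, 0.6884, 0.2103),
    (0.1379, 50.0, 300, 0.6418, 0.1949),
    (0.1253, 58.3333, 350, 0.7004, 0.1918),
    (0.0988, 66.6667, 400, 0.6059, 0.1846),
    (0.088, 75.0, 450, 0.6507, 0.1826),
    (0.0773, 83.3333, 500, 0.5473, 0.1682),
    (0.0686, 91.6667, 550, 0.6027, 0.1682),
    (0.0643, 100.0, 600, 0.6192, 0.1713),
    (0.0595, 108.3333, 650, 0.6119, 0.1703),
    (0.0562, 116.6667, 700, 0.5953, 0.16),
    (0.0507, 125.0, 750, 0.6286, 0.1538),
]

# Row with the lowest validation loss and row with the lowest WER.
best_by_loss = min(ROWS, key=lambda row: row[3])
best_by_wer = min(ROWS, key=lambda row: row[4])

# Optimizer steps per epoch, derived from the first logged row.
steps_per_epoch = round(ROWS[0][2] / ROWS[0][1])

print(f"lowest validation loss: {best_by_loss[3]} at step {best_by_loss[2]}")
print(f"lowest WER: {best_by_wer[4]} at step {best_by_wer[2]}")
print(f"steps per epoch: {steps_per_epoch}")
```

The lowest validation loss (0.5473) occurs at step 500, while the lowest WER (0.1538) occurs at the last logged step, 750, which is the checkpoint reported at the top of this card.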

Framework versions

  • Transformers 4.48.2
  • Pytorch 2.6.0+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0