metadata

license: apache-2.0
language:
  - vi
tags:
  - automatic-speech-recognition
  - robust-speech-event
  - common-voice
model-index:
  - name: xls-asr-vi-40h-1B
    results:
      - task:
          name: Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Common Voice 7.0 vi
          type: common_voice
          args: vi
        metrics:
          - name: Test WER
            type: wer
            value: 34.21
          - name: Test CER
            type: cer
            value: 19.94

xls-asr-vi-40h-1B

This model is a fine-tuned version of facebook/wav2vec2-xls-r-1b on the common voice 7.0 vi & private dataset.

Benchmark WER result:

	VIVOS	COMMON VOICE 7.0 VI
without LM	25.92	34.21

Benchmark CER result:

	VIVOS	COMMON VOICE 7.0 VI
without LM	9.24	19.94

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 1500
num_epochs: 50.0
mixed_precision_training: Native AMP
attention_dropout: 0.2
activation_dropout: 0.1
warmup_steps: 1500
mask_time_prob: .15
mask_time_length: 10
mask_feature_prob: 0.25
mask_feature_length: 64

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
4.6222	1.85	1500	5.9479	0.5474
1.1362	3.7	3000	7.9799	0.5094
0.7814	5.56	4500	5.0330	0.4724
0.6281	7.41	6000	2.3484	0.5020
0.5472	9.26	7500	2.2495	0.4793
0.4827	11.11	9000	1.1530	0.4768
0.4327	12.96	10500	1.6160	0.4646
0.3989	14.81	12000	3.2633	0.4703
0.3522	16.67	13500	2.2337	0.4708
0.3201	18.52	15000	3.6879	0.4565
0.2899	20.37	16500	5.4389	0.4599
0.2776	22.22	18000	3.5284	0.4537
0.2574	24.07	19500	2.1759	0.4649
0.2378	25.93	21000	3.3901	0.4448
0.217	27.78	22500	1.1632	0.4565
0.2115	29.63	24000	1.7441	0.4232
0.1959	31.48	25500	3.4992	0.4304
0.187	33.33	27000	3.6163	0.4369
0.1748	35.19	28500	3.6038	0.4467
0.17	37.04	30000	2.9708	0.4362
0.159	38.89	31500	3.2045	0.4279
0.153	40.74	33000	3.2427	0.4287
0.1463	42.59	34500	3.5439	0.4270
0.139	44.44	36000	3.9381	0.4150
0.1352	46.3	37500	4.1744	0.4092
0.1369	48.15	39000	4.2279	0.4154
0.1273	50.0	40500	4.1691	0.4133

Framework versions

Transformers 4.16.0.dev0
Pytorch 1.10.1+cu102
Datasets 1.17.1.dev0
Tokenizers 0.11.0