xls-r-300m-mocho-120

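This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m; the training dataset is not specified. The final evaluation results are not listed; at the last logged evaluation (step 7200), the validation loss was 0.0414 and the character error rate (CER) was 0.0204.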

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 8
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 16
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
num_epochs: 30
mixed_precision_training: Native AMP
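
To make the interaction between these settings concrete, here is a minimal sketch in plain Python of the effective batch size and of a linear schedule with warmup. It is not the training code used for this model. The steps-per-epoch figure is an assumption estimated from the training log (step 400 at epoch 1.5822, so roughly 253 optimizer steps per epoch), and the helper names `effective_batch_size` and `linear_lr` are hypothetical.

```python
# Hyperparameters copied from the list above.
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 2
WARMUP_STEPS = 500
NUM_EPOCHS = 30

# ASSUMPTION: about 253 optimizer steps per epoch, estimated from the
# training log (step 400 at epoch 1.5822). The card does not state this.
STEPS_PER_EPOCH = 253
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS


def effective_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    """Number of samples contributing to one optimizer step."""
    return per_device * accum_steps * num_devices


def linear_lr(
    step: int,
    peak_lr: float = LEARNING_RATE,
    warmup: int = WARMUP_STEPS,
    total: int = TOTAL_STEPS,
) -> float:
    """Linear warmup from 0 to peak_lr, then linear decay back to 0."""
    if step < warmup:
        return peak_lr * step / warmup
    return peak_lr * max(0.0, (total - step) / (total - warmup))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
    for step in (0, 250, 500, 4000, TOTAL_STEPS):
        print(step, linear_lr(step))
```

With a per-device batch of 8 and 2 accumulation steps, one optimizer step sees 16 samples, which matches total_train_batch_size above and suggests a single training device.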

Training Loss	Epoch	Step	Validation Loss	CER
6.3894	1.5822	400	2.9912	0.9993
1.8056	3.1624	800	0.8697	0.2591
1.057	4.7446	1200	0.6070	0.1843
0.8359	6.3248	1600	0.4518	0.1386
0.6848	7.9069	2000	0.3494	0.1075
0.5899	9.4871	2400	0.2883	0.0898
0.5033	11.0673	2800	0.2550	0.0768
0.4508	12.6495	3200	0.2317	0.0714
0.39	14.2297	3600	0.2030	0.0614
0.3583	15.8119	4000	0.1736	0.0577
0.3038	17.3921	4400	0.1573	0.0475
0.2891	18.9743	4800	0.1310	0.0448
0.2488	20.5545	5200	0.1233	0.0387
0.2254	22.1347	5600	0.1062	0.0327
0.1936	23.7168	6000	0.0811	0.0305
0.1638	25.2970	6400	0.0641	0.0254
0.1489	26.8792	6800	0.0499	0.0211
0.1361	28.4594	7200	0.0414	0.0204
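
The last column reports the character error rate. The card does not say how it was computed, so the following is only a sketch of the common definition: the total character-level edit distance between references and predictions, divided by the total number of reference characters. The function names `edit_distance` and `cer` are hypothetical, and metric libraries may apply text normalization that this sketch omits.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion
                    curr[j - 1] + 1,     # insertion
                    prev[j - 1] + cost,  # substitution or match
                )
            )
        prev = curr
    return prev[-1]


def cer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level character error rate: total edits / total reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars


if __name__ == "__main__":
    print(cer(["abcd"], ["abxd"]))
```

Under this definition, the CER of 0.0204 at step 7200 corresponds to roughly two character errors per 100 reference characters.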