speech-to-text-model2

This model is a fine-tuned version of facebook/hubert-large-ls960-ft on the FLEURS dataset. At the last logged evaluation (step 2000, epoch 3.84), it reaches a validation loss of 0.1971 and a word error rate (WER) of 0.1205 on the evaluation set.
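The base checkpoint, facebook/hubert-large-ls960-ft, is a HuBERT model with a CTC head fine-tuned on LibriSpeech 960h. If this fine-tuned model keeps that CTC head, a transcript is read out of its per-frame scores roughly as in the minimal pure-Python sketch of greedy CTC decoding below. The toy vocabulary, the blank index of 0 and the `|` word delimiter are illustrative assumptions, not values taken from this model's tokenizer.

```python
from typing import List, Optional, Sequence


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]],
    vocab: Sequence[str],
    blank_id: int = 0,
    word_delimiter: str = "|",
) -> str:
    """Greedy CTC decoding: best token per frame, merge repeats, drop blanks."""
    # Highest-scoring vocabulary index for every audio frame.
    best_ids = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    tokens: List[str] = []
    previous: Optional[int] = None
    for token_id in best_ids:
        # Emit a token only when it differs from the previous frame's index
        # (merging repeats) and is not the blank symbol.
        if token_id != previous and token_id != blank_id:
            tokens.append(vocab[token_id])
        previous = token_id
    # The word delimiter marks word boundaries; turn it into single spaces.
    text = "".join(tokens).replace(word_delimiter, " ")
    return " ".join(text.split())


# Toy vocabulary (hypothetical): index 0 doubles as the CTC blank.
vocab = ["<pad>", "|", "C", "A", "T"]


def one_hot(index: int) -> List[float]:
    """Fake per-frame scores that peak at the given vocabulary index."""
    return [1.0 if j == index else 0.0 for j in range(len(vocab))]


frames = [one_hot(i) for i in [2, 2, 0, 3, 4, 4, 1, 0]]
print(ctc_greedy_decode(frames, vocab))  # CAT
```

A blank between two identical indices keeps both of them. This is how CTC spells double letters: frames peaking at [3, 0, 3] decode to "AA", while [3, 3] decodes to "A".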

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 3e-05
train_batch_size: 2
eval_batch_size: 2
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 4
optimizer: AdamW (PyTorch implementation, OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
num_epochs: 4
mixed_precision_training: Native AMP
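Two of these settings are easy to misread. First, total_train_batch_size (4) is train_batch_size (2) times gradient_accumulation_steps (2), which implies training on a single device. Second, assuming the Hugging Face Trainer's linear schedule, lr_scheduler_type: linear with 500 warmup steps means the learning rate climbs from 0 to 3e-05 over the first 500 optimizer steps and then decays linearly to 0 by the end of training. The sketch below reproduces that schedule shape in plain Python. It is not the Trainer's own code. TOTAL_STEPS (about 2,080) is a hypothetical estimate derived from the Epoch and Step columns of the results table, not a logged value.

```python
def linear_warmup_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from base_lr at warmup_steps to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


def effective_batch_size(per_device: int, accumulation_steps: int, num_devices: int = 1) -> int:
    """Number of samples contributing to each optimizer update."""
    return per_device * accumulation_steps * num_devices


BASE_LR = 3e-05
WARMUP_STEPS = 500
# Hypothetical: roughly 520 steps per epoch (from the results table) x 4 epochs.
TOTAL_STEPS = 2080

if __name__ == "__main__":
    print(effective_batch_size(2, 2))  # 4
    for step in (0, 250, 500, 1290, TOTAL_STEPS):
        print(step, linear_warmup_lr(step, BASE_LR, WARMUP_STEPS, TOTAL_STEPS))
```

Because the peak learning rate arrives at step 500, the first evaluations in the results table are all taken while the learning rate is still ramping up.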

Training results (evaluated every 100 steps):

Training Loss  Epoch   Step  Validation Loss  WER
0.9469         0.1921  100   0.7808           0.1610
0.7241         0.3842  200   0.5663           0.1597
0.5805         0.5764  300   0.4338           0.1494
0.4717         0.7685  400   0.3221           0.1443
0.3769         0.9606  500   0.2380           0.1488
0.3659         1.1518  600   0.2276           0.1408
0.3316         1.3439  700   0.2139           0.1369
0.2798         1.5360  800   0.2151           0.1308
0.3207         1.7281  900   0.2075           0.1284
0.3199         1.9203  1000  0.2008           0.1265
0.2722         2.1114  1100  0.2009           0.1263
0.2710         2.3036  1200  0.2077           0.1238
0.2510         2.4957  1300  0.2121           0.1237
0.2918         2.6878  1400  0.1939           0.1224
0.2686         2.8799  1500  0.1992           0.1221
0.2668         3.0711  1600  0.1974           0.1226
0.2287         3.2632  1700  0.2060           0.1201
0.2546         3.4553  1800  0.1979           0.1200
0.2705         3.6475  1900  0.1938           0.1220
0.2647         3.8396  2000  0.1971           0.1205
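The WER column is the word error rate on the evaluation set. It is the word-level edit distance (substitutions, deletions and insertions) between the reference and predicted transcripts, divided by the number of reference words. The implementation that produced these figures is not stated. The minimal pure-Python sketch below follows the standard definition and pools errors over all utterances (corpus-level WER). That pooling is an assumption about how the scores were aggregated, and the example sentences are made up.

```python
from typing import Sequence


def word_edit_distance(reference: str, hypothesis: str) -> int:
    """Minimum number of word substitutions, deletions and insertions
    needed to turn the reference transcript into the hypothesis."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j]: distance between the reference words seen so far and hyp[:j].
    # Row 0 (no reference words yet) needs j insertions.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        # Column 0: deleting all i reference words seen so far.
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)  # +0 if words match
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1]


def corpus_wer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Total word errors divided by total reference words over all utterances."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_words = sum(len(r.split()) for r in references)
    if total_words == 0:
        raise ValueError("references contain no words")
    errors = sum(word_edit_distance(r, h) for r, h in zip(references, hypotheses))
    return errors / total_words


refs = ["the cat sat on the mat", "hello world"]
hyps = ["the cat sit on mat", "hello world"]
print(corpus_wer(refs, hyps))  # 0.25
```

Under this definition, the final logged WER of 0.1205 corresponds to roughly 12 word errors per 100 reference words. Pooling errors over the whole set weights long utterances more heavily than averaging per-utterance WER would.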