w2v-bert-malayalam

This model is a fine-tuned version of facebook/w2v-bert-2.0 for Malayalam automatic speech recognition. The fine-tuning dataset is not documented. It achieves the following results on the evaluation set:

  • Loss: 0.1149
  • WER (word error rate): 0.0646

Model description

More information needed

Intended uses & limitations

More information needed
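
Since the card provides no usage details, the following is a minimal inference sketch, not an official example. It assumes the repository ships a compatible processor/tokenizer alongside the checkpoint, and "sample.wav" is a placeholder path to a 16 kHz mono recording.

```python
# Minimal inference sketch (assumptions: the repo contains a processor,
# and "sample.wav" is a placeholder 16 kHz mono audio file).
import torch
import librosa
from transformers import AutoProcessor, Wav2Vec2BertForCTC

model_id = "cdactvm/w2v-bert-malayalam"
processor = AutoProcessor.from_pretrained(model_id)
model = Wav2Vec2BertForCTC.from_pretrained(model_id)
model.eval()

speech, _ = librosa.load("sample.wav", sr=16_000)  # placeholder audio path

inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Greedy CTC decoding: argmax over each frame, then collapse repeats and
# blanks inside batch_decode.
pred_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(pred_ids)[0])
```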

Training and evaluation data

More information needed
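
The actual training corpus is undocumented. Purely as an illustration of how a Malayalam speech dataset could be loaded and resampled for this model, the sketch below uses Common Voice Malayalam ("ml"); this is an assumption for demonstration, not the dataset used here.

```python
# Illustration only: the real training data is not documented. Common Voice
# Malayalam is used as an example corpus (accessing it requires accepting
# the dataset's terms on the Hugging Face Hub).
from datasets import load_dataset, Audio

ds = load_dataset("mozilla-foundation/common_voice_17_0", "ml", split="train")
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))  # resample to 16 kHz
print(ds[0]["audio"]["array"].shape, ds[0]["sentence"])
```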

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the TrainingArguments sketch after this list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 8
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 10
  • mixed_precision_training: Native AMP
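
A sketch of a TrainingArguments configuration mirroring the values above; the output_dir is an assumption, and "Native AMP" is expressed here as fp16=True (bf16 would also be a valid reading).

```python
# Sketch of the listed hyperparameters as TrainingArguments.
# output_dir and the fp16 (vs. bf16) choice are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="w2v-bert-malayalam",  # assumed output path
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    gradient_accumulation_steps=4,    # effective train batch size: 2 * 4 = 8
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=10,
    fp16=True,                        # "Native AMP" mixed-precision training
)
```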

Training results

| Training Loss | Epoch  | Step  | Validation Loss | WER    |
|---------------|--------|-------|-----------------|--------|
| 0.3705        | 0.2758 | 2000  | 0.3227          | 0.3629 |
| 0.291         | 0.5516 | 4000  | 0.2434          | 0.2891 |
| 0.2695        | 0.8274 | 6000  | 0.2445          | 0.2775 |
| 0.2118        | 1.1032 | 8000  | 0.1979          | 0.2567 |
| 0.1923        | 1.3790 | 10000 | 0.1852          | 0.2213 |
| 0.1788        | 1.6548 | 12000 | 0.1691          | 0.2033 |
| 0.167         | 1.9306 | 14000 | 0.1870          | 0.1955 |
| 0.1612        | 2.2063 | 16000 | 0.1571          | 0.1731 |
| 0.1516        | 2.4821 | 18000 | 0.1406          | 0.1685 |
| 0.1597        | 2.7579 | 20000 | 0.1358          | 0.1496 |
| 0.1299        | 3.0336 | 22000 | 0.1332          | 0.1397 |
| 0.1096        | 3.3095 | 24000 | 0.1397          | 0.1384 |
| 0.1291        | 3.5853 | 26000 | 0.1298          | 0.1354 |
| 0.0975        | 3.8611 | 28000 | 0.1220          | 0.1134 |
| 0.0919        | 4.1368 | 30000 | 0.1261          | 0.1081 |
| 0.0806        | 4.4126 | 32000 | 0.1189          | 0.1120 |
| 0.0778        | 4.6884 | 34000 | 0.1159          | 0.1027 |
| 0.0922        | 4.9642 | 36000 | 0.1218          | 0.1027 |
| 0.0907        | 5.2400 | 38000 | 0.1099          | 0.0977 |
| 0.0708        | 5.5158 | 40000 | 0.1043          | 0.0920 |
| 0.0715        | 5.7916 | 42000 | 0.1048          | 0.0928 |
| 0.0646        | 6.0673 | 44000 | 0.1047          | 0.0893 |
| 0.0567        | 6.3431 | 46000 | 0.1294          | 0.0891 |
| 0.0729        | 6.6189 | 48000 | 0.1236          | 0.0873 |
| 0.0607        | 6.8947 | 50000 | 0.1182          | 0.0830 |
| 0.0555        | 7.1705 | 52000 | 0.1222          | 0.0809 |
| 0.0516        | 7.4463 | 54000 | 0.1145          | 0.0798 |
| 0.0429        | 7.7221 | 56000 | 0.0915          | 0.0763 |
| 0.0399        | 7.9979 | 58000 | 0.0987          | 0.0731 |
| 0.0373        | 8.2736 | 60000 | 0.1167          | 0.0714 |
| 0.0371        | 8.5494 | 62000 | 0.1130          | 0.0710 |
| 0.0412        | 8.8252 | 64000 | 0.1194          | 0.0707 |
| 0.0282        | 9.1009 | 66000 | 0.1217          | 0.0683 |
| 0.0284        | 9.3768 | 68000 | 0.1177          | 0.0671 |
| 0.0275        | 9.6526 | 70000 | 0.1117          | 0.0661 |
| 0.0216        | 9.9284 | 72000 | 0.1149          | 0.0646 |
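
The WER column is the standard word error rate. As a reference for how such a metric can be computed, here is a minimal sketch using the Hugging Face evaluate library; the transcripts are hypothetical placeholders, not real model output.

```python
# Sketch of computing WER with the `evaluate` library; the strings below
# are hypothetical placeholders, not actual evaluation data.
import evaluate

wer = evaluate.load("wer")
predictions = ["hello world", "transcribed speech"]       # hypothetical hypotheses
references = ["hello world", "transcribed speech here"]   # hypothetical references
print(wer.compute(predictions=predictions, references=references))
```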

Framework versions

  • Transformers 4.48.0
  • Pytorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0