wav2vec2-large-xlsr-53-sw-tokenizer

This model is a fine-tuned version of facebook/wav2vec2-large-xlsr-53 on the common_voice_17_0 dataset. It achieves the following results on the evaluation set:

Loss: 0.4306
WER: 0.3240
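
WER (word error rate) is the word-level edit distance between a transcript and its reference, divided by the number of reference words: WER = (S + D + I) / N, where S, D and I count substitutions, deletions and insertions and N is the reference length. A WER of 0.3240 therefore corresponds to roughly 32 word errors per 100 reference words. Over a whole evaluation set, errors and reference words are normally summed across utterances rather than averaging per-utterance scores. The sketch below illustrates that definition in plain Python; the function names are hypothetical, and it is not the evaluation code behind this card, which may apply its own text normalization.

```python
def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Minimum number of substitutions + deletions + insertions turning ref into hyp."""
    # prev[j] = distance between the first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: list[tuple[str, str]]) -> float:
    """Total word errors divided by total reference words over (reference, hypothesis) pairs."""
    errors = 0
    ref_words = 0
    for reference, hypothesis in pairs:
        ref = reference.split()
        errors += word_edit_distance(ref, hypothesis.split())
        ref_words += len(ref)
    if ref_words == 0:
        raise ValueError("references contain no words")
    return errors / ref_words


print(corpus_wer([("the cat sat on the mat", "the cat sit on mat")]))
```

The toy pair has one substitution (sat/sit) and one deletion (the) against six reference words, giving 2/6. Because insertions are counted, WER can exceed 1.0.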

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: AdamW, PyTorch fused implementation (adamw_torch_fused), with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 1000
num_epochs: 10
mixed_precision_training: Native AMP
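
With lr_scheduler_type linear and 1000 warmup steps, the learning rate rises linearly from 0 to 0.0001 over the first 1000 steps and then decays linearly to 0 at the final training step. The Epoch column of the training results implies about 5,812 optimizer steps per epoch (step 58000 is logged at epoch 9.9794), so 10 epochs come to roughly 58,120 steps; that total is inferred from the table, not stated in the card. The sketch below reproduces, in plain Python, the shape of the schedule that Transformers' get_linear_schedule_with_warmup produces; the function name and the default total_steps are assumptions for illustration.

```python
def linear_warmup_lr(
    step: int,
    base_lr: float = 1e-4,      # learning_rate from the card
    warmup_steps: int = 1000,   # lr_scheduler_warmup_steps from the card
    total_steps: int = 58120,   # ASSUMPTION: ~5,812 steps/epoch x 10 epochs, inferred from the table
) -> float:
    """Learning rate at `step` under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: fall linearly from base_lr at warmup_steps to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for s in (0, 500, 1000, 29560, 58120):
    print(s, linear_warmup_lr(s))
```

Under these assumptions the rate peaks at step 1000 and is back to half its peak near step 29,560.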

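Training results

"No log" in the Training Loss column means no training loss had been logged yet at that evaluation step; other entries repeat the most recently logged training loss.

Training Loss	Epoch	Step	Validation Loss	WER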
No log	0.1721	1000	0.7966	0.7685
No log	0.3441	2000	0.5178	0.5562
2.0511	0.5162	3000	0.4524	0.5039
2.0511	0.6882	4000	0.4207	0.4615
2.0511	0.8603	5000	0.4031	0.4437
0.2699	1.0323	6000	0.3875	0.4224
0.2699	1.2044	7000	0.3870	0.4141
0.2699	1.3765	8000	0.3811	0.4143
0.1994	1.5485	9000	0.3689	0.4026
0.1994	1.7206	10000	0.3603	0.3915
0.1994	1.8926	11000	0.3561	0.3862
0.1838	2.0647	12000	0.3502	0.3809
0.1838	2.2368	13000	0.3580	0.3763
0.1838	2.4088	14000	0.3445	0.3747
0.1472	2.5809	15000	0.3416	0.3720
0.1472	2.7529	16000	0.3599	0.3709
0.1472	2.9250	17000	0.3503	0.3666
0.1405	3.0970	18000	0.3549	0.3624
0.1405	3.2691	19000	0.3476	0.3582
0.1405	3.4412	20000	0.3359	0.3574
0.116	3.6132	21000	0.3487	0.3600
0.116	3.7853	22000	0.3439	0.3552
0.116	3.9573	23000	0.3502	0.3579
0.1103	4.1294	24000	0.3436	0.3513
0.1103	4.3014	25000	0.3502	0.3502
0.1103	4.4735	26000	0.3381	0.3534
0.0957	4.6456	27000	0.3411	0.3482
0.0957	4.8176	28000	0.3425	0.3456
0.0957	4.9897	29000	0.3331	0.3425
0.0883	5.1617	30000	0.3620	0.3449
0.0883	5.3338	31000	0.3403	0.3430
0.0883	5.5058	32000	0.3590	0.3429
0.0757	5.6779	33000	0.3474	0.3402
0.0757	5.8500	34000	0.3395	0.3378
0.0757	6.0220	35000	0.3565	0.3395
0.0695	6.1941	36000	0.3729	0.3397
0.0695	6.3661	37000	0.3676	0.3368
0.0695	6.5382	38000	0.3748	0.3364
0.0601	6.7103	39000	0.3783	0.3360
0.0601	6.8823	40000	0.3657	0.3363
0.0601	7.0544	41000	0.3808	0.3343
0.0542	7.2264	42000	0.3934	0.3361
0.0542	7.3985	43000	0.3787	0.3369
0.0542	7.5705	44000	0.3920	0.3310
0.0487	7.7426	45000	0.3906	0.3321
0.0487	7.9147	46000	0.3934	0.3323
0.0487	8.0867	47000	0.4060	0.3305
0.0412	8.2588	48000	0.4145	0.3301
0.0412	8.4308	49000	0.4125	0.3282
0.0412	8.6029	50000	0.4111	0.3286
0.0381	8.7749	51000	0.4113	0.3265
0.0381	8.9470	52000	0.4147	0.3268
0.0381	9.1191	53000	0.4221	0.3271
0.0338	9.2911	54000	0.4299	0.3268
0.0338	9.4632	55000	0.4221	0.3250
0.0338	9.6352	56000	0.4314	0.3245
0.0318	9.8073	57000	0.4307	0.3243
0.0318	9.9794	58000	0.4306	0.3240
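
In the table above the two evaluation metrics diverge: validation loss is lowest at step 29000 (0.3331) and climbs to 0.4306 by the end, while WER keeps improving to 0.3240 at step 58000, the row reported at the top of this card. The sketch below, using a few rows copied verbatim from the table, shows that selecting a checkpoint by each metric gives different answers; the helper names are hypothetical.

```python
from typing import NamedTuple

# Rows copied from the training-results table:
# Training Loss, Epoch, Step, Validation Loss, WER (tab-separated).
ROWS = """\
2.0511\t0.8603\t5000\t0.4031\t0.4437
0.1472\t2.5809\t15000\t0.3416\t0.3720
0.0957\t4.9897\t29000\t0.3331\t0.3425
0.0487\t7.7426\t45000\t0.3906\t0.3321
0.0318\t9.9794\t58000\t0.4306\t0.3240
"""


class EvalRow(NamedTuple):
    step: int
    val_loss: float
    wer: float


def parse_rows(text: str) -> list[EvalRow]:
    """Parse tab-separated result rows, keeping step, validation loss and WER."""
    rows = []
    for line in text.strip().splitlines():
        _train_loss, _epoch, step, val_loss, wer = line.split("\t")
        rows.append(EvalRow(int(step), float(val_loss), float(wer)))
    return rows


rows = parse_rows(ROWS)
best_by_loss = min(rows, key=lambda r: r.val_loss)  # lowest validation loss
best_by_wer = min(rows, key=lambda r: r.wer)        # lowest word error rate
print(best_by_loss.step, best_by_wer.step)
```

For a speech recognition model evaluated on WER, selecting by WER is usually the more relevant criterion; the rising validation loss after roughly epoch 5 is still worth noting as a sign of overconfident predictions on the evaluation set.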

Framework versions

Transformers 4.55.4
Pytorch 2.8.0+cu126
Datasets 3.6.0
Tokenizers 0.21.4

Weights format: Safetensors
Model size: 0.3B params
Tensor type: F32
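
F32 stores each parameter in 4 bytes, so the 0.3B parameters listed here take roughly 1.2 GB as weights alone. Fine-tuning with AdamW additionally keeps an F32 gradient and two F32 moment estimates per parameter, about four times the weight memory before activations are counted. The helpers below are a back-of-the-envelope sketch with hypothetical names; 0.3B is itself a rounded figure, so treat the results as estimates.

```python
BYTES_PER_F32 = 4


def weight_gb(num_params: float) -> float:
    """Memory for the F32 weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_F32 / 1e9


def adamw_state_gb(num_params: float) -> float:
    """F32 weights + gradients + AdamW's two moment buffers (4 tensors per parameter).

    Activations, AMP casts and framework overhead are not included.
    """
    return num_params * BYTES_PER_F32 * 4 / 1e9


print(weight_gb(0.3e9), adamw_state_gb(0.3e9))
```

For 0.3e9 parameters these evaluate to about 1.2 GB and 4.8 GB respectively.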

Model tree

Base model: facebook/wav2vec2-large-xlsr-53
This model: dennohpeter/wav2vec2-large-xlsr-53-sw-tokenizer (fine-tuned from the base model)

Evaluation results

WER on common_voice_17_0 (test set, self-reported): 0.324
