na_voice_clon

This model is a fine-tuned version of microsoft/speecht5_tts on an unknown dataset. It reaches a validation loss of 0.5806 on the evaluation set at the final training step (step 2000).
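Since the card gives no usage instructions, here is a minimal inference sketch for a fine-tuned SpeechT5 checkpoint. Several things in it are assumptions, not facts from this card: the repository id `your-username/na_voice_clon` is a hypothetical placeholder, the processor is loaded from the base model, the vocoder is the standard `microsoft/speecht5_hifigan`, and the all-zero speaker embedding is only a stand-in for a real 512-dimensional x-vector of the target voice.

```python
"""Minimal inference sketch for a fine-tuned SpeechT5 TTS checkpoint."""

# Hypothetical repository id: replace with the actual location of the checkpoint.
MODEL_ID = "your-username/na_voice_clon"
BASE_ID = "microsoft/speecht5_tts"         # base model named in this card
VOCODER_ID = "microsoft/speecht5_hifigan"  # assumption: standard SpeechT5 vocoder
SAMPLE_RATE = 16000                        # SpeechT5 generates 16 kHz audio
SPEAKER_EMBEDDING_DIM = 512                # x-vector size SpeechT5 expects


def synthesize(text, speaker_embedding=None, model_id=MODEL_ID):
    """Return a 1-D waveform tensor for `text`."""
    # Imported here so this module loads even without torch/transformers installed.
    import torch
    from transformers import (
        SpeechT5ForTextToSpeech,
        SpeechT5HifiGan,
        SpeechT5Processor,
    )

    # Assumption: the fine-tuned repo may not ship processor files,
    # so the processor is taken from the base model.
    processor = SpeechT5Processor.from_pretrained(BASE_ID)
    model = SpeechT5ForTextToSpeech.from_pretrained(model_id)
    vocoder = SpeechT5HifiGan.from_pretrained(VOCODER_ID)

    if speaker_embedding is None:
        # Placeholder only: a zero vector will not reproduce the trained voice.
        # Pass a real (1, 512) x-vector of the target speaker instead.
        speaker_embedding = torch.zeros((1, SPEAKER_EMBEDDING_DIM))

    inputs = processor(text=text, return_tensors="pt")
    with torch.no_grad():
        return model.generate_speech(
            inputs["input_ids"], speaker_embedding, vocoder=vocoder
        )
```

Calling `synthesize("Hello")` downloads the three checkpoints and returns a waveform tensor. It can be saved with, for example, `soundfile.write("out.wav", waveform.numpy(), samplerate=SAMPLE_RATE)`.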

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-06
train_batch_size: 8
eval_batch_size: 2
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 64
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
training_steps: 2000
mixed_precision_training: Native AMP
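
The hyperparameters above imply two derived quantities: the effective batch size and the learning rate at each step. The sketch below recomputes both. It assumes a single training device (the card does not state the device count) and that the linear scheduler ramps from 0 to the peak rate during warmup and then decays linearly to 0 at the last step, which is how the linear schedule in `transformers` behaves. The function names are illustrative and not taken from the training script.

```python
# Values copied from the hyperparameter list.
LEARNING_RATE = 5e-6
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 8
WARMUP_STEPS = 500
TRAINING_STEPS = 2000
NUM_DEVICES = 1  # assumption: not stated in the card


def effective_batch_size(per_device=TRAIN_BATCH_SIZE,
                         accum=GRAD_ACCUM_STEPS,
                         devices=NUM_DEVICES):
    """Examples consumed per optimizer step."""
    return per_device * accum * devices


def linear_lr(step, peak=LEARNING_RATE, warmup=WARMUP_STEPS, total=TRAINING_STEPS):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    if step < warmup:
        # Warmup: ramp linearly from 0 up to the peak rate.
        return peak * step / max(1, warmup)
    # Decay: fall linearly from the peak to 0 at the final step.
    return peak * max(0.0, (total - step) / max(1, total - warmup))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size())
    for step in (0, 250, 500, 1250, 2000):
        print(f"step {step:>4}: lr = {linear_lr(step):.2e}")
```

Under these assumptions the effective batch size is 8 × 8 = 64, matching total_train_batch_size. The rate peaks at 5e-06 at step 500 and is half that at steps 250 and 1250.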

Training Loss	Epoch	Step	Validation Loss
0.9913	1.1905	50	0.8766
0.9763	2.3810	100	0.8198
0.9234	3.5714	150	0.7816
0.8756	4.7619	200	0.7637
0.8449	5.9524	250	0.7322
0.7719	7.1429	300	0.6737
0.7254	8.3333	350	0.6366
0.6845	9.5238	400	0.6195
0.6737	10.7143	450	0.6131
0.6689	11.9048	500	0.6178
0.6595	13.0952	550	0.6039
0.655	14.2857	600	0.6046
0.6514	15.4762	650	0.5944
0.6478	16.6667	700	0.5940
0.6327	17.8571	750	0.5939
0.6418	19.0476	800	0.5938
0.6329	20.2381	850	0.5887
0.6364	21.4286	900	0.5906
0.6284	22.6190	950	0.5887
0.6238	23.8095	1000	0.5865
0.624	25.0	1050	0.5859
0.6163	26.1905	1100	0.5845
0.6254	27.3810	1150	0.5840
0.6168	28.5714	1200	0.5831
0.6141	29.7619	1250	0.5791
0.614	30.9524	1300	0.5835
0.6121	32.1429	1350	0.5788
0.6227	33.3333	1400	0.5785
0.6198	34.5238	1450	0.5775
0.6142	35.7143	1500	0.5803
0.6183	36.9048	1550	0.5765
0.6161	38.0952	1600	0.5781
0.6061	39.2857	1650	0.5768
0.6167	40.4762	1700	0.5773
0.6063	41.6667	1750	0.5775
0.6107	42.8571	1800	0.5777
0.6084	44.0476	1850	0.5772
0.6074	45.2381	1900	0.5757
0.6023	46.4286	1950	0.5766
0.6077	47.6190	2000	0.5806
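
The table can be summarized with a short script. The sketch below uses the validation losses copied from the table to find the best evaluation step, and uses the epoch column to estimate the number of optimizer steps per epoch. The training-set size it prints is only a rough estimate: it assumes a single device and ignores how the trainer rounds partial accumulation windows.

```python
# Validation losses from the table, one per evaluation (every 50 steps).
VAL_LOSS = [
    0.8766, 0.8198, 0.7816, 0.7637, 0.7322, 0.6737, 0.6366, 0.6195,
    0.6131, 0.6178, 0.6039, 0.6046, 0.5944, 0.5940, 0.5939, 0.5938,
    0.5887, 0.5906, 0.5887, 0.5865, 0.5859, 0.5845, 0.5840, 0.5831,
    0.5791, 0.5835, 0.5788, 0.5785, 0.5775, 0.5803, 0.5765, 0.5781,
    0.5768, 0.5773, 0.5775, 0.5777, 0.5772, 0.5757, 0.5766, 0.5806,
]
EVAL_EVERY = 50            # steps between evaluations
TOTAL_TRAIN_BATCH_SIZE = 64

# Step number for each evaluation row: 50, 100, ..., 2000.
steps = [EVAL_EVERY * (i + 1) for i in range(len(VAL_LOSS))]

# Evaluation with the lowest validation loss.
best_index = min(range(len(VAL_LOSS)), key=VAL_LOSS.__getitem__)
best_step = steps[best_index]
best_loss = VAL_LOSS[best_index]

# The first row reports epoch 1.1905 at step 50, so one epoch is about
# 50 / 1.1905 optimizer steps.
steps_per_epoch = round(50 / 1.1905)

# Rough training-set size under the single-device assumption.
approx_examples = steps_per_epoch * TOTAL_TRAIN_BATCH_SIZE

if __name__ == "__main__":
    print("best step:", best_step, "loss:", best_loss)
    print("final loss:", VAL_LOSS[-1])
    print("steps per epoch:", steps_per_epoch)
    print("approx. training examples:", approx_examples)
```

The lowest validation loss is 0.5757 at step 1900, slightly below the final 0.5806 at step 2000, and the loss changes little after roughly step 1250. If a checkpoint from step 1900 was saved, it may be marginally preferable to the final one.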