mt5-small-synthetic-data2

This model is a fine-tuned version of google/mt5-small on an unspecified dataset (the card records the dataset name as None). It achieves the following results on the evaluation set:

Loss: 0.8493
Rouge1: 0.5864
Rouge2: 0.4472
Rougel: 0.5690
Rougelsum: 0.5675
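To make the Rouge1 figure easier to interpret, here is a minimal sketch of a unigram-overlap F1 score. It assumes the reported values are F-measures on whitespace-split, lowercased tokens; the card does not say which ROUGE implementation, tokenizer, or stemming settings were used, so real scores will differ in detail. The function name `rouge1_f1` is illustrative only.

```python
from collections import Counter


def rouge1_f1(reference: str, prediction: str) -> float:
    """Simplified ROUGE-1: F1 over clipped unigram overlap (assumed tokenization)."""
    ref_tokens = reference.lower().split()
    pred_tokens = prediction.lower().split()
    if not ref_tokens or not pred_tokens:
        return 0.0
    # Counter intersection clips each unigram to its minimum count in both texts.
    overlap = sum((Counter(ref_tokens) & Counter(pred_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Five of six unigrams match, so precision = recall = 5/6.
score = rouge1_f1("the cat sat on the mat", "the cat lay on the mat")
```

Under this reading, a Rouge1 of 0.5864 means that, averaged over the evaluation set, the harmonic mean of unigram precision and recall is a little under 0.59.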

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5.6e-05
train_batch_size: 12
eval_batch_size: 12
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9,0.999) and epsilon=1e-08, no additional optimizer arguments
lr_scheduler_type: linear
num_epochs: 40
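The hyperparameters above imply a concrete learning-rate curve and a rough training-set size. The sketch below works both out in plain Python. It assumes zero warmup steps, a single device, and no gradient accumulation, none of which the card states; the total of 2000 steps (50 per epoch) is read from the training results. The helper names are hypothetical.

```python
import math

BASE_LR = 5.6e-05    # learning_rate from the card
TOTAL_STEPS = 2000   # last step in the training results (40 epochs x 50 steps)
WARMUP_STEPS = 0     # assumption: the card lists no warmup
BATCH_SIZE = 12      # train_batch_size from the card
STEPS_PER_EPOCH = 50


def linear_lr(step: int) -> float:
    """Learning rate under a linear schedule that decays to zero at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


def train_set_size_bounds(steps_per_epoch: int, batch_size: int) -> tuple:
    """Smallest and largest example counts that yield this many batches per epoch,
    assuming the last partial batch is kept."""
    smallest = (steps_per_epoch - 1) * batch_size + 1
    largest = steps_per_epoch * batch_size
    return smallest, largest


bounds = train_set_size_bounds(STEPS_PER_EPOCH, BATCH_SIZE)  # (589, 600)
# Sanity check: any size within the bounds gives 50 batches of 12.
assert math.ceil(bounds[0] / BATCH_SIZE) == STEPS_PER_EPOCH
```

Under these assumptions the learning rate is halved by step 1000 (epoch 20) and reaches zero at step 2000, and the training set holds roughly 589 to 600 examples.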

Training results

Training Loss	Epoch	Step	Validation Loss	Rouge1	Rouge2	Rougel	Rougelsum
18.931	1.0	50	9.0942	0.0	0.0	0.0	0.0
11.3652	2.0	100	5.2268	0.0025	0.0014	0.0025	0.0025
7.6242	3.0	150	3.0583	0.0755	0.0247	0.0712	0.0697
5.0038	4.0	200	2.1559	0.1720	0.0608	0.1415	0.1411
3.4385	5.0	250	1.6094	0.2058	0.0858	0.1798	0.1801
2.7359	6.0	300	1.4043	0.3742	0.2353	0.3549	0.3574
2.2687	7.0	350	1.2929	0.4226	0.2639	0.3944	0.3962
2.0252	8.0	400	1.2258	0.4436	0.2820	0.4129	0.4159
1.8135	9.0	450	1.1667	0.4529	0.2932	0.4160	0.4176
1.7448	10.0	500	1.1103	0.4729	0.3152	0.4391	0.4409
1.5793	11.0	550	1.0840	0.5045	0.3557	0.4774	0.4787
1.5258	12.0	600	1.0532	0.5266	0.3857	0.5053	0.5061
1.4391	13.0	650	1.0176	0.5507	0.4182	0.5381	0.5367
1.3783	14.0	700	1.0015	0.5595	0.4233	0.5387	0.5386
1.318	15.0	750	0.9825	0.5699	0.4260	0.5476	0.5468
1.2871	16.0	800	0.9581	0.5785	0.4334	0.5564	0.5554
1.2305	17.0	850	0.9489	0.5766	0.4343	0.5540	0.5538
1.2609	18.0	900	0.9362	0.5853	0.4432	0.5633	0.5633
1.1928	19.0	950	0.9256	0.5847	0.4438	0.5637	0.5635
1.1165	20.0	1000	0.9186	0.5712	0.4331	0.5535	0.5535
1.1624	21.0	1050	0.9080	0.5763	0.4434	0.5581	0.5586
1.0909	22.0	1100	0.9040	0.5774	0.4417	0.5596	0.5604
1.0885	23.0	1150	0.8969	0.5827	0.4465	0.5642	0.5646
1.1378	24.0	1200	0.8933	0.5855	0.4476	0.5668	0.5663
0.9968	25.0	1250	0.8832	0.5851	0.4467	0.5664	0.5659
1.0871	26.0	1300	0.8776	0.5848	0.4460	0.5661	0.5659
1.0546	27.0	1350	0.8749	0.5825	0.4443	0.5635	0.5630
0.9935	28.0	1400	0.8687	0.5842	0.4467	0.5682	0.5678
1.0042	29.0	1450	0.8661	0.5834	0.4466	0.5669	0.5666
0.9903	30.0	1500	0.8628	0.5843	0.4485	0.5655	0.5653
0.9701	31.0	1550	0.8583	0.5822	0.4436	0.5630	0.5629
0.9585	32.0	1600	0.8552	0.5783	0.4405	0.5610	0.5605
0.9412	33.0	1650	0.8555	0.5897	0.4492	0.5696	0.5687
0.9732	34.0	1700	0.8526	0.5853	0.4477	0.5661	0.5655
0.9248	35.0	1750	0.8535	0.5828	0.4429	0.5646	0.5637
0.9408	36.0	1800	0.8520	0.5868	0.4474	0.5695	0.5680
0.9951	37.0	1850	0.8506	0.5834	0.4456	0.5656	0.5645
0.9316	38.0	1900	0.8500	0.5846	0.4470	0.5667	0.5657
0.9339	39.0	1950	0.8495	0.5864	0.4472	0.5690	0.5675
0.9519	40.0	2000	0.8493	0.5864	0.4472	0.5690	0.5675
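The table above shows validation loss still edging down at epoch 40 while the ROUGE scores have been nearly flat since about epoch 18. A short sketch of how one might pick a checkpoint from these numbers follows; it copies only the last ten rows, and the card does not say which checkpoint was actually kept (the reported evaluation results match epoch 40).

```python
# (epoch, validation_loss, rouge1) copied from epochs 31-40 of the table above
ROWS = [
    (31, 0.8583, 0.5822),
    (32, 0.8552, 0.5783),
    (33, 0.8555, 0.5897),
    (34, 0.8526, 0.5853),
    (35, 0.8535, 0.5828),
    (36, 0.8520, 0.5868),
    (37, 0.8506, 0.5834),
    (38, 0.8500, 0.5846),
    (39, 0.8495, 0.5864),
    (40, 0.8493, 0.5864),
]

# Epoch with the lowest validation loss.
best_loss_epoch = min(ROWS, key=lambda row: row[1])[0]    # 40
# Epoch with the highest Rouge1.
best_rouge1_epoch = max(ROWS, key=lambda row: row[2])[0]  # 33

# Gap between the best Rouge1 and the final-epoch Rouge1.
rouge1_gap = max(row[2] for row in ROWS) - ROWS[-1][2]
```

The two criteria disagree: loss favors epoch 40, Rouge1 favors epoch 33, and the Rouge1 difference between them is about 0.003, which is small compared with the epoch-to-epoch variation in the table.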

Framework versions

Transformers 4.47.1
Pytorch 2.5.1+cu121
Datasets 3.2.0
Tokenizers 0.21.0
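When reproducing the run, it can help to compare the local environment against the versions listed above. The following is a minimal sketch using only the standard library; the PyPI distribution names (`transformers`, `torch`, `datasets`, `tokenizers`) are assumed from the framework names, and `compare_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in the card; keys are the assumed PyPI distribution names.
PINNED = {
    "transformers": "4.47.1",
    "torch": "2.5.1+cu121",
    "datasets": "3.2.0",
    "tokenizers": "0.21.0",
}


def compare_versions(pinned: dict) -> dict:
    """Map each package to (expected, installed); installed is None if missing."""
    report = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        report[name] = (expected, installed)
    return report


report = compare_versions(PINNED)
for name, (expected, installed) in report.items():
    status = "ok" if installed == expected else "differs"
    print(f"{name}: expected {expected}, installed {installed} ({status})")
```

A mismatch does not necessarily break anything; it only flags where results may diverge from those reported here.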

Model repository: ak2603/mt5-small-synthetic-data2