# mt5-small-finetuned
This model is a fine-tuned version of [google/mt5-small](https://huggingface.co/google/mt5-small) on an unspecified dataset. It achieves the following results on the evaluation set:

- Loss: 2.0192
- Rouge1: 0.3780
- Rouge2: 0.1970
- RougeL: 0.3508
- RougeLsum: 0.3527
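
For reference, here is a minimal usage sketch. The repository id `ak2603/mt5-small-finetuned` is taken from this card's page, and the task is assumed to be summarization (the card does not state it, but the ROUGE metrics suggest as much), so treat this as illustrative rather than an official example:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Repository id from this card's page; summarization task is an assumption.
model_id = "ak2603/mt5-small-finetuned"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "Your input document goes here."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
output_ids = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```
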
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5.6e-05
- train_batch_size: 12
- eval_batch_size: 12
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: linear
- num_epochs: 60
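
As a rough guide, the hyperparameters above map onto `Seq2SeqTrainingArguments` as sketched below; `output_dir`, the evaluation strategy, and `predict_with_generate` are assumptions, not values reported by this card:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-small-finetuned",  # assumed, not stated in the card
    learning_rate=5.6e-5,
    per_device_train_batch_size=12,
    per_device_eval_batch_size=12,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=60,
    eval_strategy="epoch",        # assumed; the table reports per-epoch validation
    predict_with_generate=True,   # assumed; needed to compute ROUGE during eval
)
```
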
### Training results
Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | RougeL | RougeLsum |
:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
21.4055 | 1.0 | 12 | 13.7151 | 0.0189 | 0.0084 | 0.0147 | 0.0189 |
17.1792 | 2.0 | 24 | 11.5227 | 0.0189 | 0.0084 | 0.0147 | 0.0189 |
15.0485 | 3.0 | 36 | 9.5193 | 0.0 | 0.0 | 0.0 | 0.0 |
13.0405 | 4.0 | 48 | 6.8529 | 0.0102 | 0.0 | 0.0102 | 0.0102 |
11.7418 | 5.0 | 60 | 5.8151 | 0.0331 | 0.0084 | 0.0303 | 0.0335 |
9.659 | 6.0 | 72 | 5.6024 | 0.0344 | 0.0084 | 0.0344 | 0.0357 |
8.6025 | 7.0 | 84 | 4.7311 | 0.0273 | 0.0036 | 0.0269 | 0.0277 |
7.5747 | 8.0 | 96 | 3.8319 | 0.0510 | 0.0031 | 0.0483 | 0.0456 |
6.916 | 9.0 | 108 | 3.5873 | 0.0578 | 0.0 | 0.0540 | 0.0520 |
6.3394 | 10.0 | 120 | 3.4854 | 0.0788 | 0.0076 | 0.0792 | 0.0794 |
5.5822 | 11.0 | 132 | 3.2956 | 0.0752 | 0.0158 | 0.0694 | 0.0697 |
5.0731 | 12.0 | 144 | 3.0977 | 0.0524 | 0.0115 | 0.0470 | 0.0475 |
4.7234 | 13.0 | 156 | 2.9120 | 0.0331 | 0.0105 | 0.0279 | 0.0285 |
4.3512 | 14.0 | 168 | 2.7709 | 0.0527 | 0.0304 | 0.0375 | 0.0377 |
4.136 | 15.0 | 180 | 2.6770 | 0.0616 | 0.0331 | 0.0494 | 0.0495 |
3.8591 | 16.0 | 192 | 2.5894 | 0.1028 | 0.0473 | 0.0817 | 0.0826 |
3.6558 | 17.0 | 204 | 2.5183 | 0.1814 | 0.0828 | 0.1541 | 0.1532 |
3.4821 | 18.0 | 216 | 2.4590 | 0.1940 | 0.0838 | 0.1618 | 0.1621 |
3.3248 | 19.0 | 228 | 2.3901 | 0.2062 | 0.0856 | 0.1667 | 0.1676 |
3.194 | 20.0 | 240 | 2.3352 | 0.1971 | 0.0918 | 0.1672 | 0.1684 |
3.0883 | 21.0 | 252 | 2.2934 | 0.1971 | 0.0918 | 0.1672 | 0.1684 |
2.9907 | 22.0 | 264 | 2.2471 | 0.2039 | 0.0943 | 0.1660 | 0.1675 |
2.9249 | 23.0 | 276 | 2.2038 | 0.1904 | 0.0843 | 0.1515 | 0.1537 |
2.8418 | 24.0 | 288 | 2.1643 | 0.1995 | 0.0939 | 0.1686 | 0.1705 |
2.7667 | 25.0 | 300 | 2.1296 | 0.2233 | 0.1002 | 0.1882 | 0.1890 |
2.7157 | 26.0 | 312 | 2.1176 | 0.3513 | 0.1825 | 0.3422 | 0.3432 |
2.7058 | 27.0 | 324 | 2.0969 | 0.3525 | 0.1803 | 0.3444 | 0.3457 |
2.5703 | 28.0 | 336 | 2.0761 | 0.3507 | 0.1847 | 0.3395 | 0.3409 |
2.4907 | 29.0 | 348 | 2.0688 | 0.3379 | 0.1741 | 0.3281 | 0.3290 |
2.3974 | 30.0 | 360 | 2.0706 | 0.3520 | 0.1872 | 0.3391 | 0.3402 |
2.4584 | 31.0 | 372 | 2.0635 | 0.3465 | 0.1840 | 0.3332 | 0.3344 |
2.3775 | 32.0 | 384 | 2.0560 | 0.3525 | 0.1826 | 0.3390 | 0.3411 |
2.4014 | 33.0 | 396 | 2.0544 | 0.3585 | 0.1860 | 0.3456 | 0.3469 |
2.3388 | 34.0 | 408 | 2.0583 | 0.3607 | 0.1865 | 0.3483 | 0.3496 |
2.3288 | 35.0 | 420 | 2.0487 | 0.3551 | 0.1835 | 0.3368 | 0.3379 |
2.3233 | 36.0 | 432 | 2.0394 | 0.3569 | 0.1803 | 0.3313 | 0.3326 |
2.2882 | 37.0 | 444 | 2.0361 | 0.3585 | 0.1867 | 0.3422 | 0.3446 |
2.2109 | 38.0 | 456 | 2.0324 | 0.3565 | 0.1858 | 0.3413 | 0.3429 |
2.212 | 39.0 | 468 | 2.0327 | 0.3585 | 0.1867 | 0.3422 | 0.3446 |
2.2059 | 40.0 | 480 | 2.0310 | 0.3612 | 0.1849 | 0.3421 | 0.3436 |
2.1866 | 41.0 | 492 | 2.0352 | 0.3612 | 0.1849 | 0.3421 | 0.3436 |
2.2122 | 42.0 | 504 | 2.0369 | 0.3612 | 0.1849 | 0.3421 | 0.3436 |
2.1305 | 43.0 | 516 | 2.0351 | 0.3604 | 0.1863 | 0.3419 | 0.3443 |
2.1174 | 44.0 | 528 | 2.0358 | 0.3578 | 0.1864 | 0.3397 | 0.3413 |
2.0972 | 45.0 | 540 | 2.0356 | 0.3602 | 0.1881 | 0.3390 | 0.3405 |
2.1051 | 46.0 | 552 | 2.0325 | 0.3606 | 0.1861 | 0.3359 | 0.3376 |
2.0632 | 47.0 | 564 | 2.0329 | 0.3606 | 0.1861 | 0.3359 | 0.3376 |
2.0601 | 48.0 | 576 | 2.0301 | 0.3621 | 0.1857 | 0.3346 | 0.3364 |
2.0487 | 49.0 | 588 | 2.0301 | 0.3621 | 0.1857 | 0.3346 | 0.3364 |
2.0538 | 50.0 | 600 | 2.0314 | 0.3617 | 0.1876 | 0.3380 | 0.3399 |
2.071 | 51.0 | 612 | 2.0308 | 0.3608 | 0.1871 | 0.3368 | 0.3385 |
2.0415 | 52.0 | 624 | 2.0283 | 0.3777 | 0.1993 | 0.3546 | 0.3559 |
2.007 | 53.0 | 636 | 2.0259 | 0.3777 | 0.1993 | 0.3546 | 0.3559 |
2.0238 | 54.0 | 648 | 2.0232 | 0.3777 | 0.1993 | 0.3546 | 0.3559 |
2.074 | 55.0 | 660 | 2.0207 | 0.3780 | 0.1970 | 0.3508 | 0.3527 |
2.0497 | 56.0 | 672 | 2.0202 | 0.3780 | 0.1970 | 0.3508 | 0.3527 |
2.0075 | 57.0 | 684 | 2.0200 | 0.3780 | 0.1970 | 0.3508 | 0.3527 |
2.0837 | 58.0 | 696 | 2.0193 | 0.3780 | 0.1970 | 0.3508 | 0.3527 |
2.0277 | 59.0 | 708 | 2.0194 | 0.3780 | 0.1970 | 0.3508 | 0.3527 |
2.0912 | 60.0 | 720 | 2.0192 | 0.3780 | 0.1970 | 0.3508 | 0.3527 |
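
The ROUGE columns above are on a 0-1 scale. Here is a small sketch of how such scores can be computed with the Hugging Face `evaluate` library (assuming that is the tooling used here):

```python
import evaluate

rouge = evaluate.load("rouge")

# Placeholder predictions/references; a real evaluation would decode the
# model's generations and compare them against the validation targets.
predictions = ["a generated summary"]
references = ["the reference summary"]

scores = rouge.compute(predictions=predictions, references=references)
print(scores)  # keys: rouge1, rouge2, rougeL, rougeLsum, on a 0-1 scale
```
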
### Framework versions
- Transformers 4.47.1
- Pytorch 2.5.1+cu121
- Datasets 3.2.0
- Tokenizers 0.21.0