# cs_mT5-large2_2e-5_50_v0.1
This model is a fine-tuned version of [google/mt5-large](https://huggingface.co/google/mt5-large) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 4.5108
- Bleu: 19.8919
- Gen Len: 17.7619
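
The card does not include a usage example, so the following is a minimal inference sketch, assuming the checkpoint is loaded from the Hub under the repository id `kmok1/cs_mT5-large2_2e-5_50_v0.1`; the input sentence and the generation settings are illustrative assumptions, since the training data and any task prefix are not documented.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Repository id as listed in the model tree; adjust if the weights live elsewhere.
model_id = "kmok1/cs_mT5-large2_2e-5_50_v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Placeholder input -- the source language and any required prefix are not documented.
inputs = tokenizer("Your source sentence here.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```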
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 50
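
For reference, these settings map roughly onto the following `Seq2SeqTrainingArguments`. This is a hedged reconstruction, not the authors' actual training script; `output_dir`, the evaluation strategy, and `predict_with_generate` are assumptions (the latter two are implied by the per-epoch BLEU/Gen Len rows below).

```python
from transformers import Seq2SeqTrainingArguments

# Hypothetical reconstruction of the hyperparameters listed above.
training_args = Seq2SeqTrainingArguments(
    output_dir="cs_mT5-large2_2e-5_50_v0.1",  # assumed output directory
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=50,
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the Transformers defaults
    # (adam_beta1, adam_beta2, adam_epsilon), so they are not set explicitly.
    evaluation_strategy="epoch",   # assumed: matches the per-epoch validation rows
    predict_with_generate=True,    # assumed: needed to report BLEU and Gen Len
)
```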
### Training results
Training Loss | Epoch | Step | Validation Loss | Bleu | Gen Len |
---|---|---|---|---|---|
16.9199 | 1.0 | 6 | 10.5138 | 9.6354 | 19.0 |
9.9396 | 2.0 | 12 | 8.3590 | 8.988 | 19.0 |
19.1783 | 3.0 | 18 | 7.4137 | 8.7723 | 19.0 |
9.8097 | 4.0 | 24 | 7.3182 | 8.8796 | 19.0 |
16.8467 | 5.0 | 30 | 7.2232 | 8.6892 | 19.0 |
9.745 | 6.0 | 36 | 6.9902 | 7.822 | 19.0 |
6.2948 | 7.0 | 42 | 6.8174 | 8.2013 | 19.0 |
6.3194 | 8.0 | 48 | 6.7064 | 7.6678 | 19.0 |
6.927 | 9.0 | 54 | 6.6122 | 9.9162 | 19.0 |
7.198 | 10.0 | 60 | 6.5138 | 13.3863 | 19.0 |
7.6505 | 11.0 | 66 | 6.4263 | 12.4078 | 19.0 |
7.9063 | 12.0 | 72 | 6.3326 | 13.0376 | 19.0 |
9.021 | 13.0 | 78 | 6.2376 | 13.6209 | 19.0 |
9.2462 | 14.0 | 84 | 6.1222 | 13.3871 | 19.0 |
7.7924 | 15.0 | 90 | 5.9968 | 14.1604 | 19.0 |
5.1947 | 16.0 | 96 | 5.8706 | 11.7859 | 19.0 |
9.9564 | 17.0 | 102 | 5.7396 | 13.4904 | 19.0 |
5.2706 | 18.0 | 108 | 5.6295 | 13.5218 | 19.0 |
6.6567 | 19.0 | 114 | 5.5203 | 14.0857 | 19.0 |
5.0918 | 20.0 | 120 | 5.3965 | 15.3213 | 19.0 |
6.2442 | 21.0 | 126 | 5.2742 | 15.6508 | 19.0 |
4.5073 | 22.0 | 132 | 5.1884 | 15.8637 | 19.0 |
3.3254 | 23.0 | 138 | 5.1282 | 14.7385 | 19.0 |
6.9905 | 24.0 | 144 | 5.0841 | 15.5385 | 19.0 |
6.3553 | 25.0 | 150 | 5.0408 | 16.9058 | 19.0 |
4.8396 | 26.0 | 156 | 5.0165 | 16.3831 | 19.0 |
4.7646 | 27.0 | 162 | 4.9914 | 16.2156 | 19.0 |
3.6864 | 28.0 | 168 | 4.9643 | 16.4319 | 19.0 |
4.7526 | 29.0 | 174 | 4.9186 | 17.5044 | 19.0 |
4.5518 | 30.0 | 180 | 4.8727 | 16.7818 | 19.0 |
3.9017 | 31.0 | 186 | 4.8264 | 16.9433 | 19.0 |
4.6864 | 32.0 | 192 | 4.7818 | 16.8868 | 19.0 |
3.0676 | 33.0 | 198 | 4.7505 | 18.2291 | 19.0 |
5.9861 | 34.0 | 204 | 4.7214 | 18.3309 | 19.0 |
5.0304 | 35.0 | 210 | 4.7003 | 18.3309 | 19.0 |
3.9478 | 36.0 | 216 | 4.6791 | 18.1004 | 19.0 |
4.9706 | 37.0 | 222 | 4.6651 | 17.787 | 19.0 |
5.0404 | 38.0 | 228 | 4.6401 | 17.787 | 19.0 |
4.938 | 39.0 | 234 | 4.6045 | 18.6261 | 17.7619 |
5.7176 | 40.0 | 240 | 4.5833 | 17.1931 | 17.7619 |
3.3352 | 41.0 | 246 | 4.5654 | 17.1931 | 17.7619 |
4.8397 | 42.0 | 252 | 4.5517 | 17.6767 | 17.7619 |
4.401 | 43.0 | 258 | 4.5441 | 17.1931 | 17.7619 |
5.4609 | 44.0 | 264 | 4.5370 | 17.5969 | 17.7619 |
4.9223 | 45.0 | 270 | 4.5295 | 19.1503 | 17.7619 |
4.092 | 46.0 | 276 | 4.5215 | 19.1133 | 17.7619 |
3.3364 | 47.0 | 282 | 4.5159 | 19.1133 | 17.7619 |
4.9208 | 48.0 | 288 | 4.5131 | 19.8919 | 17.7619 |
3.5934 | 49.0 | 294 | 4.5115 | 19.8919 | 17.7619 |
4.5551 | 50.0 | 300 | 4.5108 | 19.8919 | 17.7619 |
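
The BLEU values above are on a 0-100 scale, and Gen Len typically denotes the mean token length of the generated outputs. A generic sketch of computing such a BLEU score with the `evaluate` library (sacrebleu backend) is shown below; the placeholder predictions and references are assumptions, and the authors' exact metric code is not documented.

```python
import evaluate

# Illustrative BLEU computation; not necessarily the authors' exact setup.
bleu = evaluate.load("sacrebleu")

predictions = ["the model output sentence"]             # decoded model outputs (placeholder)
references = [["the reference translation sentence"]]   # one list of references per prediction

result = bleu.compute(predictions=predictions, references=references)
print(round(result["score"], 4))  # sacrebleu's BLEU is reported on a 0-100 scale
```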
### Framework versions
- Transformers 4.38.2
- Pytorch 2.1.0+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2