# mT5-Small (Eur-Lex-Sum Maltese)
This model is a fine-tuned version of google/mt5-small on the Maltese configuration of the dennlinger/eur-lex-sum dataset. It achieves the following results on the test set:
- Loss: 1.4531
- ChrF:
  - Score: 51.5481
  - Char Order: 6
  - Word Order: 0
  - Beta: 2
- ROUGE:
  - ROUGE-1: 0.5176
  - ROUGE-2: 0.3497
  - ROUGE-L: 0.4249
  - ROUGE-Lsum: 0.4247
- Gen Len: 254.8511
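The evaluation script itself is not part of this card. As a point of reference, metrics in this exact shape (ChrF with score/char order/word order/beta; ROUGE as F-measures in [0, 1]) can be produced with the Hugging Face `evaluate` library; the snippet below is a minimal sketch with placeholder predictions and references, not necessarily the script used here:

```python
# Sketch of computing the metric dictionaries reported above with the
# Hugging Face `evaluate` library (not necessarily the exact script used here).
import evaluate

chrf = evaluate.load("chrf")
rouge = evaluate.load("rouge")

# Placeholders: real usage would pass decoded model outputs and gold test summaries.
predictions = ["generated summary ..."]
references = ["reference summary ..."]

# ChrF returns `score`, `char_order`, `word_order`, and `beta`,
# matching the ChrF fields listed above (char_order=6, word_order=0, beta=2).
print(chrf.compute(predictions=predictions, references=[[r] for r in references]))

# ROUGE returns `rouge1`, `rouge2`, `rougeL`, and `rougeLsum` as F-measures in [0, 1].
print(rouge.compute(predictions=predictions, references=references))
```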
## Intended uses & limitations
The model is fine-tuned for a specific task (summarisation of legal documents in Maltese) and should only be used for the same or a similar task. Any limitations present in the base model are inherited.
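No usage snippet ships with this card; the following is a minimal inference sketch using the standard Transformers seq2seq API. The input truncation length, beam count, and the 256-token generation cap (chosen to roughly match the reported Gen Len of ~255) are assumptions, not settings confirmed by the training script:

```python
# Minimal inference sketch. Assumptions: input truncation at 1024 tokens,
# beam search with 4 beams, and a 256-token generation cap chosen to roughly
# match the reported Gen Len of ~255.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "MLRS/mt5-small_eurlexsum-mlt"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

document = "..."  # a Maltese legal document to summarise

inputs = tokenizer(document, return_tensors="pt", truncation=True, max_length=1024)
summary_ids = model.generate(**inputs, num_beams=4, max_length=256)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```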
## Training procedure
The model was fine-tuned using a customised script.
### Training hyperparameters
The following hyperparameters were used during training (an equivalent configuration sketch follows the list):
- learning_rate: 0.001
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adafactor (no additional optimizer arguments)
- lr_scheduler_type: linear
- num_epochs: 200.0
- early_stopping_patience: 20
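Since the customised script is not published, the list above can be read as (approximately) the following `Seq2SeqTrainingArguments` configuration, using the Transformers 4.48 API. Anything not in the list, such as the output directory, the per-epoch evaluation/save strategy, and the best-model metric, is an assumption made to be consistent with the per-epoch results table below:

```python
# Sketch of a Trainer configuration matching the hyperparameters listed above.
# The actual customised script may differ in detail.
from transformers import Seq2SeqTrainingArguments, EarlyStoppingCallback

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-small_eurlexsum-mlt",  # assumed name
    learning_rate=1e-3,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    optim="adafactor",
    lr_scheduler_type="linear",
    num_train_epochs=200,
    eval_strategy="epoch",         # the results table reports one row per epoch
    save_strategy="epoch",
    load_best_model_at_end=True,   # needed for early stopping to restore the best checkpoint
    metric_for_best_model="loss",  # assumed
    predict_with_generate=True,
)

# Early stopping with the listed patience of 20 evaluations.
early_stopping = EarlyStoppingCallback(early_stopping_patience=20)
```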
### Training results
Training Loss | Epoch | Step | Validation Loss | ChrF Score | ChrF Char Order | ChrF Word Order | ChrF Beta | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum | Gen Len |
---|---|---|---|---|---|---|---|---|---|---|---|---|
No log | 1.0 | 30 | 2.2500 | 18.4506 | 6 | 0 | 2 | 0.1901 | 0.0868 | 0.1728 | 0.1729 | 255.0 |
No log | 2.0 | 60 | 1.9908 | 40.0789 | 6 | 0 | 2 | 0.3872 | 0.2330 | 0.3379 | 0.3376 | 255.0 |
No log | 3.0 | 90 | 1.7490 | 44.1723 | 6 | 0 | 2 | 0.4406 | 0.2759 | 0.3760 | 0.3758 | 255.0 |
No log | 4.0 | 120 | 1.7205 | 49.4429 | 6 | 0 | 2 | 0.4885 | 0.3313 | 0.4081 | 0.4079 | 255.0 |
No log | 5.0 | 150 | 1.5647 | 46.3055 | 6 | 0 | 2 | 0.4626 | 0.3068 | 0.3886 | 0.3886 | 255.0 |
No log | 6.0 | 180 | 1.5374 | 46.3856 | 6 | 0 | 2 | 0.4756 | 0.3169 | 0.3986 | 0.3989 | 254.4439 |
No log | 7.0 | 210 | 1.5262 | 47.2806 | 6 | 0 | 2 | 0.4706 | 0.3154 | 0.3959 | 0.3962 | 254.7807 |
No log | 8.0 | 240 | 1.5142 | 48.5214 | 6 | 0 | 2 | 0.4916 | 0.3255 | 0.4121 | 0.4119 | 254.8449 |
No log | 9.0 | 270 | 1.5271 | 49.4788 | 6 | 0 | 2 | 0.4982 | 0.3350 | 0.4211 | 0.4210 | 253.9893 |
No log | 10.0 | 300 | 1.4995 | 48.3063 | 6 | 0 | 2 | 0.4832 | 0.3224 | 0.4127 | 0.4126 | 254.6684 |
No log | 11.0 | 330 | 1.4947 | 52.1382 | 6 | 0 | 2 | 0.5213 | 0.3593 | 0.4416 | 0.4418 | 254.7914 |
No log | 12.0 | 360 | 1.4704 | 49.9226 | 6 | 0 | 2 | 0.5004 | 0.3363 | 0.4236 | 0.4235 | 254.6203 |
No log | 13.0 | 390 | 1.4933 | 51.6030 | 6 | 0 | 2 | 0.5199 | 0.3514 | 0.4317 | 0.4318 | 253.6257 |
No log | 14.0 | 420 | 1.4640 | 47.8714 | 6 | 0 | 2 | 0.4840 | 0.3242 | 0.4094 | 0.4091 | 254.6952 |
No log | 15.0 | 450 | 1.4726 | 51.2718 | 6 | 0 | 2 | 0.5188 | 0.3488 | 0.4354 | 0.4356 | 254.7166 |
No log | 16.0 | 480 | 1.4667 | 49.9968 | 6 | 0 | 2 | 0.4989 | 0.3400 | 0.4287 | 0.4281 | 254.6203 |
1.7931 | 17.0 | 510 | 1.4624 | 50.7874 | 6 | 0 | 2 | 0.5123 | 0.3436 | 0.4345 | 0.4345 | 254.5508 |
1.7931 | 18.0 | 540 | 1.4775 | 50.5126 | 6 | 0 | 2 | 0.5121 | 0.3448 | 0.4273 | 0.4274 | 253.4439 |
1.7931 | 19.0 | 570 | 1.4762 | 50.7875 | 6 | 0 | 2 | 0.5194 | 0.3458 | 0.4311 | 0.4315 | 252.6631 |
1.7931 | 20.0 | 600 | 1.5157 | 52.2624 | 6 | 0 | 2 | 0.5187 | 0.3446 | 0.4324 | 0.4323 | 253.8289 |
1.7931 | 21.0 | 630 | 1.4982 | 51.8279 | 6 | 0 | 2 | 0.5161 | 0.3478 | 0.4368 | 0.4369 | 254.3529 |
1.7931 | 22.0 | 660 | 1.5087 | 51.9486 | 6 | 0 | 2 | 0.5174 | 0.3438 | 0.4315 | 0.4310 | 254.7807 |
1.7931 | 23.0 | 690 | 1.5355 | 51.9191 | 6 | 0 | 2 | 0.5224 | 0.3500 | 0.4301 | 0.4298 | 254.4439 |
1.7931 | 24.0 | 720 | 1.5061 | 50.0702 | 6 | 0 | 2 | 0.5002 | 0.3307 | 0.4152 | 0.4153 | 254.1765 |
1.7931 | 25.0 | 750 | 1.5271 | 50.3567 | 6 | 0 | 2 | 0.5046 | 0.3349 | 0.4216 | 0.4222 | 253.3102 |
1.7931 | 26.0 | 780 | 1.5378 | 50.8240 | 6 | 0 | 2 | 0.5089 | 0.3401 | 0.4210 | 0.4202 | 253.6471 |
1.7931 | 27.0 | 810 | 1.5414 | 50.8294 | 6 | 0 | 2 | 0.5118 | 0.3447 | 0.4282 | 0.4280 | 254.1176 |
1.7931 | 28.0 | 840 | 1.5774 | 52.6591 | 6 | 0 | 2 | 0.5283 | 0.3537 | 0.4390 | 0.4387 | 253.6684 |
1.7931 | 29.0 | 870 | 1.5661 | 52.3420 | 6 | 0 | 2 | 0.5292 | 0.3525 | 0.4376 | 0.4376 | 253.3262 |
1.7931 | 30.0 | 900 | 1.6079 | 51.8227 | 6 | 0 | 2 | 0.5212 | 0.3448 | 0.4313 | 0.4315 | 253.9626 |
1.7931 | 31.0 | 930 | 1.5900 | 51.9129 | 6 | 0 | 2 | 0.5245 | 0.3479 | 0.4327 | 0.4327 | 253.7380 |
### Framework versions
- Transformers 4.48.2
- Pytorch 2.4.1+cu121
- Datasets 3.2.0
- Tokenizers 0.21.0
## License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Permissions beyond the scope of this license may be available at https://mlrs.research.um.edu.mt/.
## Citation
This work was first presented in MELABenchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource Maltese NLP. Cite it as follows:

    @inproceedings{micallef-borg-2025-melabenchv1,
        title = "{MELAB}enchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource {M}altese {NLP}",
        author = "Micallef, Kurt  and
          Borg, Claudia",
        editor = "Che, Wanxiang  and
          Nabende, Joyce  and
          Shutova, Ekaterina  and
          Pilehvar, Mohammad Taher",
        booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
        month = jul,
        year = "2025",
        address = "Vienna, Austria",
        publisher = "Association for Computational Linguistics",
        url = "https://aclanthology.org/2025.findings-acl.1053/",
        doi = "10.18653/v1/2025.findings-acl.1053",
        pages = "20505--20527",
        ISBN = "979-8-89176-256-5",
    }
## Evaluation results
MELABench Leaderboard results on dennlinger/eur-lex-sum (Maltese):
- ChrF: 52.140
- ROUGE-L: 0.440