mT5-Small (Eur-Lex-Sum Maltese)

This model is a fine-tuned version of google/mt5-small on the Maltese configuration of the dennlinger/eur-lex-sum dataset. It achieves the following results on the test set (a sketch for reproducing these metrics follows the list):

  • Loss: 1.4531
  • Chrf:
    • Score: 51.5481
    • Char Order: 6
    • Word Order: 0
    • Beta: 2
  • Rouge:
    • Rouge1: 0.5176
    • Rouge2: 0.3497
    • Rougel: 0.4249
    • Rougelsum: 0.4247
  • Gen Len: 254.8511
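
The ChrF configuration above (char order 6, word order 0, beta 2) matches the sacreBLEU defaults. As a rough guide to reproducing these numbers, the snippet below is a minimal sketch using the `evaluate` library; the dataset configuration name (`maltese`) and the column names (`reference`, `summary`) are assumptions taken from the eur-lex-sum dataset card and should be checked before use.

```python
import evaluate
from datasets import load_dataset

# Assumed configuration and column names for the Maltese split of eur-lex-sum.
test = load_dataset("dennlinger/eur-lex-sum", "maltese", split="test")
references = test["summary"]

# Replace with summaries generated by the model, one per test document.
predictions = ["" for _ in references]

chrf = evaluate.load("chrf")
rouge = evaluate.load("rouge")

# ChrF settings matching the reported scores: char order 6, word order 0, beta 2.
print(chrf.compute(predictions=predictions,
                   references=[[r] for r in references],
                   char_order=6, word_order=0, beta=2))
print(rouge.compute(predictions=predictions, references=references))
```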

Intended uses & limitations

The model is fine-tuned on a specific task, so it should only be used for the same or a similar task. Any limitations present in the base model are inherited.
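
The snippet below is a minimal inference sketch using the Transformers library; the input length, beam size, and output length are illustrative assumptions rather than the settings used to produce the reported results.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "MLRS/mt5-small_eurlexsum-mlt"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

document = "..."  # a Maltese legal document to summarise

# max_length, num_beams, and max_new_tokens are illustrative choices.
inputs = tokenizer(document, return_tensors="pt", truncation=True, max_length=1024)
summary_ids = model.generate(**inputs, num_beams=4, max_new_tokens=256)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```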

Training procedure

The model was fine-tuned using a customised script.

Training hyperparameters

The following hyperparameters were used during training (see the trainer-argument sketch after this list):

  • learning_rate: 0.001
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: Adafactor (no additional optimizer arguments)
  • lr_scheduler_type: linear
  • num_epochs: 200.0
  • early_stopping_patience: 20
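
The fine-tuning script itself is not published here; the sketch below only shows how the listed hyperparameters would map onto standard Seq2SeqTrainingArguments and an early-stopping callback, assuming a Trainer-based setup. The output directory and the metric governing early stopping are assumptions.

```python
from transformers import Seq2SeqTrainingArguments, EarlyStoppingCallback

# Hypothetical mapping of the listed hyperparameters onto trainer arguments;
# the actual customised script may differ in structure and defaults.
training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-small_eurlexsum-mlt",   # assumed
    learning_rate=1e-3,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    optim="adafactor",
    lr_scheduler_type="linear",
    num_train_epochs=200,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,            # required for early stopping
    metric_for_best_model="eval_loss",      # assumed early-stopping criterion
    predict_with_generate=True,
)

# Passed to Seq2SeqTrainer via callbacks=[...] to stop training after
# 20 evaluations without improvement.
early_stopping = EarlyStoppingCallback(early_stopping_patience=20)
```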

Training results

| Training Loss | Epoch | Step | Validation Loss | ChrF Score | ChrF Char Order | ChrF Word Order | ChrF Beta | Rouge1 | Rouge2 | RougeL | RougeLsum | Gen Len |
|---------------|-------|------|-----------------|------------|-----------------|-----------------|-----------|--------|--------|--------|-----------|---------|
| No log        | 1.0   | 30   | 2.2500          | 18.4506    | 6               | 0               | 2         | 0.1901 | 0.0868 | 0.1728 | 0.1729    | 255.0   |
| No log        | 2.0   | 60   | 1.9908          | 40.0789    | 6               | 0               | 2         | 0.3872 | 0.2330 | 0.3379 | 0.3376    | 255.0   |
| No log        | 3.0   | 90   | 1.7490          | 44.1723    | 6               | 0               | 2         | 0.4406 | 0.2759 | 0.3760 | 0.3758    | 255.0   |
| No log        | 4.0   | 120  | 1.7205          | 49.4429    | 6               | 0               | 2         | 0.4885 | 0.3313 | 0.4081 | 0.4079    | 255.0   |
| No log        | 5.0   | 150  | 1.5647          | 46.3055    | 6               | 0               | 2         | 0.4626 | 0.3068 | 0.3886 | 0.3886    | 255.0   |
| No log        | 6.0   | 180  | 1.5374          | 46.3856    | 6               | 0               | 2         | 0.4756 | 0.3169 | 0.3986 | 0.3989    | 254.4439 |
| No log        | 7.0   | 210  | 1.5262          | 47.2806    | 6               | 0               | 2         | 0.4706 | 0.3154 | 0.3959 | 0.3962    | 254.7807 |
| No log        | 8.0   | 240  | 1.5142          | 48.5214    | 6               | 0               | 2         | 0.4916 | 0.3255 | 0.4121 | 0.4119    | 254.8449 |
| No log        | 9.0   | 270  | 1.5271          | 49.4788    | 6               | 0               | 2         | 0.4982 | 0.3350 | 0.4211 | 0.4210    | 253.9893 |
| No log        | 10.0  | 300  | 1.4995          | 48.3063    | 6               | 0               | 2         | 0.4832 | 0.3224 | 0.4127 | 0.4126    | 254.6684 |
| No log        | 11.0  | 330  | 1.4947          | 52.1382    | 6               | 0               | 2         | 0.5213 | 0.3593 | 0.4416 | 0.4418    | 254.7914 |
| No log        | 12.0  | 360  | 1.4704          | 49.9226    | 6               | 0               | 2         | 0.5004 | 0.3363 | 0.4236 | 0.4235    | 254.6203 |
| No log        | 13.0  | 390  | 1.4933          | 51.6030    | 6               | 0               | 2         | 0.5199 | 0.3514 | 0.4317 | 0.4318    | 253.6257 |
| No log        | 14.0  | 420  | 1.4640          | 47.8714    | 6               | 0               | 2         | 0.4840 | 0.3242 | 0.4094 | 0.4091    | 254.6952 |
| No log        | 15.0  | 450  | 1.4726          | 51.2718    | 6               | 0               | 2         | 0.5188 | 0.3488 | 0.4354 | 0.4356    | 254.7166 |
| No log        | 16.0  | 480  | 1.4667          | 49.9968    | 6               | 0               | 2         | 0.4989 | 0.3400 | 0.4287 | 0.4281    | 254.6203 |
| 1.7931        | 17.0  | 510  | 1.4624          | 50.7874    | 6               | 0               | 2         | 0.5123 | 0.3436 | 0.4345 | 0.4345    | 254.5508 |
| 1.7931        | 18.0  | 540  | 1.4775          | 50.5126    | 6               | 0               | 2         | 0.5121 | 0.3448 | 0.4273 | 0.4274    | 253.4439 |
| 1.7931        | 19.0  | 570  | 1.4762          | 50.7875    | 6               | 0               | 2         | 0.5194 | 0.3458 | 0.4311 | 0.4315    | 252.6631 |
| 1.7931        | 20.0  | 600  | 1.5157          | 52.2624    | 6               | 0               | 2         | 0.5187 | 0.3446 | 0.4324 | 0.4323    | 253.8289 |
| 1.7931        | 21.0  | 630  | 1.4982          | 51.8279    | 6               | 0               | 2         | 0.5161 | 0.3478 | 0.4368 | 0.4369    | 254.3529 |
| 1.7931        | 22.0  | 660  | 1.5087          | 51.9486    | 6               | 0               | 2         | 0.5174 | 0.3438 | 0.4315 | 0.4310    | 254.7807 |
| 1.7931        | 23.0  | 690  | 1.5355          | 51.9191    | 6               | 0               | 2         | 0.5224 | 0.3500 | 0.4301 | 0.4298    | 254.4439 |
| 1.7931        | 24.0  | 720  | 1.5061          | 50.0702    | 6               | 0               | 2         | 0.5002 | 0.3307 | 0.4152 | 0.4153    | 254.1765 |
| 1.7931        | 25.0  | 750  | 1.5271          | 50.3567    | 6               | 0               | 2         | 0.5046 | 0.3349 | 0.4216 | 0.4222    | 253.3102 |
| 1.7931        | 26.0  | 780  | 1.5378          | 50.8240    | 6               | 0               | 2         | 0.5089 | 0.3401 | 0.4210 | 0.4202    | 253.6471 |
| 1.7931        | 27.0  | 810  | 1.5414          | 50.8294    | 6               | 0               | 2         | 0.5118 | 0.3447 | 0.4282 | 0.4280    | 254.1176 |
| 1.7931        | 28.0  | 840  | 1.5774          | 52.6591    | 6               | 0               | 2         | 0.5283 | 0.3537 | 0.4390 | 0.4387    | 253.6684 |
| 1.7931        | 29.0  | 870  | 1.5661          | 52.3420    | 6               | 0               | 2         | 0.5292 | 0.3525 | 0.4376 | 0.4376    | 253.3262 |
| 1.7931        | 30.0  | 900  | 1.6079          | 51.8227    | 6               | 0               | 2         | 0.5212 | 0.3448 | 0.4313 | 0.4315    | 253.9626 |
| 1.7931        | 31.0  | 930  | 1.5900          | 51.9129    | 6               | 0               | 2         | 0.5245 | 0.3479 | 0.4327 | 0.4327    | 253.7380 |

Framework versions

  • Transformers 4.48.2
  • Pytorch 2.4.1+cu121
  • Datasets 3.2.0
  • Tokenizers 0.21.0

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Permissions beyond the scope of this license may be available at https://mlrs.research.um.edu.mt/.


Citation

This work was first presented in MELABenchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource Maltese NLP. Cite it as follows:

@inproceedings{micallef-borg-2025-melabenchv1,
    title = "{MELAB}enchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource {M}altese {NLP}",
    author = "Micallef, Kurt  and
      Borg, Claudia",
    editor = "Che, Wanxiang  and
      Nabende, Joyce  and
      Shutova, Ekaterina  and
      Pilehvar, Mohammad Taher",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-acl.1053/",
    doi = "10.18653/v1/2025.findings-acl.1053",
    pages = "20505--20527",
    ISBN = "979-8-89176-256-5",
}