mT5-Small (MultiEURLEX Maltese)
This model is a fine-tuned version of google/mt5-small on the Maltese (mt) configuration of the nlpaueb/multi_eurlex dataset. It achieves the following results on the test set:
- Loss: 0.3648
- F1: 0.3125
Intended uses & limitations
The model is fine-tuned for a specific task and should be used for the same or a similar task (see the usage sketch below). Any limitations present in the base model are inherited.
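The following is a minimal inference sketch, assuming the standard text-to-text setup in which the fine-tuned mT5 generates label text for a Maltese input document. The exact target format produced by the fine-tuning script (for example, how multiple EUROVOC-style labels are joined) is not documented here, so the decoding step is illustrative.

```python
# Minimal inference sketch (assumption: the model generates label text for a
# Maltese legal document; the exact target format used during fine-tuning is
# not documented in this card).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "MLRS/mt5-small_multieurlex-mlt"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

document = "..."  # a Maltese legal document, e.g. from the mt configuration of nlpaueb/multi_eurlex

inputs = tokenizer(document, return_tensors="pt", truncation=True)
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```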
Training procedure
The model was fine-tuned using a customised script.
Training hyperparameters
The following hyperparameters were used during training (a sketch of how they might map onto the Hugging Face Trainer API follows the list):
- learning_rate: 0.001
- train_batch_size: 32
- eval_batch_size: 8
- seed: 42
- optimizer: Adafactor (no additional optimizer arguments)
- lr_scheduler_type: linear
- num_epochs: 200.0
- early_stopping_patience: 20
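As a rough guide, the sketch below expresses these settings with the Hugging Face Trainer API. The original customised script is not published in this card, so the output directory name, the per-epoch evaluation and saving strategy (inferred from the per-epoch results table below), and the metric name "f1" are assumptions rather than the exact configuration.

```python
# Hedged sketch of the listed hyperparameters as Seq2SeqTrainingArguments;
# values not listed above (output_dir, eval/save strategy, metric name) are assumptions.
from transformers import EarlyStoppingCallback, Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="mt5-small_multieurlex-mlt",  # assumed name
    learning_rate=1e-3,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adafactor",                 # Adafactor with no additional optimizer arguments
    lr_scheduler_type="linear",
    num_train_epochs=200,
    eval_strategy="epoch",             # evaluation once per epoch, matching the results table
    save_strategy="epoch",
    load_best_model_at_end=True,       # needed so early stopping restores the best checkpoint
    metric_for_best_model="f1",        # assumed; early stopping could also track eval loss
    predict_with_generate=True,        # decode during evaluation so F1 can be computed from text
)

# Early stopping with a patience of 20 evaluation rounds, passed to the trainer's callbacks.
early_stopping = EarlyStoppingCallback(early_stopping_patience=20)
```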
Training results
Training Loss | Epoch | Step | Validation Loss | F1 |
---|---|---|---|---|
1.5559 | 1.0 | 548 | 0.4136 | 0.2994 |
0.424 | 2.0 | 1096 | 0.3933 | 0.2995 |
0.4078 | 3.0 | 1644 | 0.3755 | 0.3007 |
0.3848 | 4.0 | 2192 | 0.3663 | 0.2990 |
0.3714 | 5.0 | 2740 | 0.3571 | 0.2987 |
0.3599 | 6.0 | 3288 | 0.3452 | 0.3010 |
0.3436 | 7.0 | 3836 | 0.3237 | 0.3010 |
0.3358 | 8.0 | 4384 | 0.3232 | 0.3009 |
0.3292 | 9.0 | 4932 | 0.3145 | 0.2989 |
0.3196 | 10.0 | 5480 | 0.3101 | 0.2983 |
0.3045 | 11.0 | 6028 | 0.3111 | 0.2985 |
0.301 | 12.0 | 6576 | 0.3009 | 0.2941 |
0.3017 | 13.0 | 7124 | 0.3081 | 0.2911 |
0.3008 | 14.0 | 7672 | 0.3077 | 0.2952 |
0.2945 | 15.0 | 8220 | 0.3013 | 0.2982 |
0.2933 | 16.0 | 8768 | 0.2941 | 0.2940 |
0.2858 | 17.0 | 9316 | 0.3019 | 0.2918 |
0.2849 | 18.0 | 9864 | 0.2933 | 0.2965 |
0.2804 | 19.0 | 10412 | 0.2937 | 0.2918 |
0.2814 | 20.0 | 10960 | 0.2969 | 0.2960 |
0.2735 | 21.0 | 11508 | 0.2983 | 0.2925 |
0.2735 | 22.0 | 12056 | 0.3021 | 0.2986 |
0.2713 | 23.0 | 12604 | 0.2953 | 0.2956 |
0.2704 | 24.0 | 13152 | 0.3007 | 0.2959 |
0.2634 | 25.0 | 13700 | 0.3044 | 0.2986 |
0.2678 | 26.0 | 14248 | 0.2996 | 0.3005 |
0.2611 | 27.0 | 14796 | 0.2942 | 0.2961 |
Framework versions
- Transformers 4.51.1
- PyTorch 2.7.0+cu126
- Datasets 3.2.0
- Tokenizers 0.21.1
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Permissions beyond the scope of this license may be available at https://mlrs.research.um.edu.mt/.
Citation
This work was first presented in MELABenchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource Maltese NLP. Cite it as follows:
@inproceedings{micallef-borg-2025-melabenchv1,
title = "{MELAB}enchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource {M}altese {NLP}",
author = "Micallef, Kurt and
Borg, Claudia",
editor = "Che, Wanxiang and
Nabende, Joyce and
Shutova, Ekaterina and
Pilehvar, Mohammad Taher",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-acl.1053/",
doi = "10.18653/v1/2025.findings-acl.1053",
pages = "20505--20527",
ISBN = "979-8-89176-256-5",
}
Evaluation results
- Macro-averaged F1 on nlpaueb/multi_eurlex (MELABench Leaderboard): 30.100