BERTu (MultiEURLEX Maltese)

This model is a fine-tuned version of MLRS/BERTu on the Maltese (mt) configuration of the nlpaueb/multi_eurlex dataset. It achieves the following results on the test set (a minimal usage sketch follows the results):
- Loss: 0.2734
- F1: 0.6723
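
The sketch below shows one way the checkpoint could be loaded for inference with the standard Transformers sequence-classification API. The repository id MLRS/BERTu_multieurlex-mlt is taken from the model tree for this card; the multi-label (sigmoid) decoding and the 0.5 threshold are assumptions based on the MultiEURLEX task, so check the checkpoint's config for the actual problem_type and label names.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Repository id taken from the model tree above; adjust if the checkpoint lives elsewhere.
model_id = "MLRS/BERTu_multieurlex-mlt"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

text = "..."  # replace with a Maltese document, e.g. from the multi_eurlex "mt" test split

inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Assumption: MultiEURLEX is multi-label, so decode with a sigmoid and a 0.5 threshold.
probs = torch.sigmoid(logits)[0]
predicted = [model.config.id2label[i] for i, p in enumerate(probs) if p > 0.5]
print(predicted)
```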
Intended uses & limitations
The model is fine-tuned on a specific task, so it should only be used for the same or a similar task. Any limitations present in the base model are inherited.
Training procedure
The model was fine-tuned using a customised script.
Training hyperparameters
The following hyperparameters were used during training; a sketch mapping them onto a Trainer setup follows the list:
- learning_rate: 2e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 3
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: inverse_sqrt
- lr_scheduler_warmup_ratio: 0.005
- num_epochs: 200.0
- early_stopping_patience: 20
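
For reference, here is a sketch of how these hyperparameters could map onto a Hugging Face TrainingArguments/Trainer configuration. The epoch-level evaluation/saving strategy and the selection metric are assumptions consistent with the per-epoch results table below but are not stated in the card; the model, datasets, and compute_metrics function are placeholders rather than the actual customised script.

```python
from transformers import TrainingArguments, Trainer, EarlyStoppingCallback

# Hyperparameters taken from the list above; eval/save strategy and
# metric_for_best_model are assumptions, not documented settings.
training_args = TrainingArguments(
    output_dir="bertu_multieurlex_mlt",
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=3,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="inverse_sqrt",
    warmup_ratio=0.005,
    num_train_epochs=200,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="f1",
)

trainer = Trainer(
    model=model,                      # e.g. AutoModelForSequenceClassification from MLRS/BERTu
    args=training_args,
    train_dataset=train_dataset,      # hypothetical: tokenised multi_eurlex "mt" train split
    eval_dataset=eval_dataset,        # hypothetical: tokenised validation split
    compute_metrics=compute_metrics,  # hypothetical: returns {"f1": ...}
    callbacks=[EarlyStoppingCallback(early_stopping_patience=20)],
)
trainer.train()
```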
Training results
Training Loss | Epoch | Step | Validation Loss | F1 |
---|---|---|---|---|
0.3962 | 1.0 | 548 | 0.2352 | 0.4398 |
0.2143 | 2.0 | 1096 | 0.1898 | 0.5998 |
0.1753 | 3.0 | 1644 | 0.1780 | 0.6361 |
0.1547 | 4.0 | 2192 | 0.1744 | 0.6610 |
0.1401 | 5.0 | 2740 | 0.1725 | 0.6687 |
0.1284 | 6.0 | 3288 | 0.1723 | 0.6814 |
0.1187 | 7.0 | 3836 | 0.1717 | 0.6882 |
0.1119 | 8.0 | 4384 | 0.1725 | 0.6951 |
0.1031 | 9.0 | 4932 | 0.1757 | 0.6997 |
0.0977 | 10.0 | 5480 | 0.1766 | 0.7012 |
0.0861 | 11.0 | 6028 | 0.1767 | 0.7089 |
0.0811 | 12.0 | 6576 | 0.1826 | 0.7060 |
0.0769 | 13.0 | 7124 | 0.1817 | 0.7074 |
0.0733 | 14.0 | 7672 | 0.1865 | 0.7071 |
0.0697 | 15.0 | 8220 | 0.1879 | 0.7090 |
0.0656 | 16.0 | 8768 | 0.1906 | 0.7065 |
0.0633 | 17.0 | 9316 | 0.1921 | 0.7123 |
0.0594 | 18.0 | 9864 | 0.1946 | 0.7152 |
0.0574 | 19.0 | 10412 | 0.1964 | 0.7178 |
0.0545 | 20.0 | 10960 | 0.1988 | 0.7153 |
0.0503 | 21.0 | 11508 | 0.2003 | 0.7149 |
0.0479 | 22.0 | 12056 | 0.2018 | 0.7179 |
0.0459 | 23.0 | 12604 | 0.2041 | 0.7194 |
0.0438 | 24.0 | 13152 | 0.2051 | 0.7197 |
0.0424 | 25.0 | 13700 | 0.2076 | 0.7182 |
0.0404 | 26.0 | 14248 | 0.2089 | 0.7182 |
0.0393 | 27.0 | 14796 | 0.2111 | 0.7167 |
0.0373 | 28.0 | 15344 | 0.2138 | 0.7181 |
0.036 | 29.0 | 15892 | 0.2148 | 0.7228 |
0.0346 | 30.0 | 16440 | 0.2186 | 0.7176 |
0.0334 | 31.0 | 16988 | 0.2190 | 0.7179 |
0.0305 | 32.0 | 17536 | 0.2213 | 0.7191 |
0.0301 | 33.0 | 18084 | 0.2214 | 0.7207 |
0.0281 | 34.0 | 18632 | 0.2242 | 0.7192 |
0.0275 | 35.0 | 19180 | 0.2233 | 0.7214 |
0.0266 | 36.0 | 19728 | 0.2258 | 0.7206 |
0.0255 | 37.0 | 20276 | 0.2290 | 0.7176 |
0.0247 | 38.0 | 20824 | 0.2307 | 0.7204 |
0.0238 | 39.0 | 21372 | 0.2321 | 0.7160 |
0.0231 | 40.0 | 21920 | 0.2350 | 0.7235 |
0.0225 | 41.0 | 22468 | 0.2343 | 0.7170 |
0.0208 | 42.0 | 23016 | 0.2369 | 0.7210 |
0.0199 | 43.0 | 23564 | 0.2390 | 0.7205 |
0.0193 | 44.0 | 24112 | 0.2396 | 0.7225 |
0.0188 | 45.0 | 24660 | 0.2414 | 0.7192 |
0.0184 | 46.0 | 25208 | 0.2441 | 0.7185 |
0.0176 | 47.0 | 25756 | 0.2445 | 0.7224 |
0.0172 | 48.0 | 26304 | 0.2468 | 0.7185 |
0.0167 | 49.0 | 26852 | 0.2476 | 0.7187 |
0.0161 | 50.0 | 27400 | 0.2472 | 0.7212 |
0.0158 | 51.0 | 27948 | 0.2511 | 0.7200 |
0.0151 | 52.0 | 28496 | 0.2507 | 0.7201 |
0.0142 | 53.0 | 29044 | 0.2533 | 0.7173 |
0.0137 | 54.0 | 29592 | 0.2550 | 0.7210 |
0.0133 | 55.0 | 30140 | 0.2553 | 0.7191 |
0.013 | 56.0 | 30688 | 0.2581 | 0.7213 |
0.0127 | 57.0 | 31236 | 0.2597 | 0.7209 |
0.0121 | 58.0 | 31784 | 0.2616 | 0.7175 |
0.012 | 59.0 | 32332 | 0.2605 | 0.7198 |
0.0115 | 60.0 | 32880 | 0.2641 | 0.7207 |
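
The run appears to have stopped early at epoch 60, which is consistent with the patience of 20: the best validation F1 (0.7235) was reached at epoch 40 and no later epoch improved on it. For completeness, here is a small sketch of how a macro-averaged F1 could be computed for a multi-label setup like MultiEURLEX, assuming sigmoid outputs thresholded at 0.5; the exact metric implementation used for this card is not specified.

```python
import numpy as np
from sklearn.metrics import f1_score

def multilabel_macro_f1(logits: np.ndarray, labels: np.ndarray, threshold: float = 0.5) -> float:
    """Macro F1 over label-wise binary decisions (assumed setup, not the card's exact script)."""
    probs = 1.0 / (1.0 + np.exp(-logits))     # sigmoid over raw logits
    preds = (probs >= threshold).astype(int)  # threshold each label independently
    return f1_score(labels, preds, average="macro", zero_division=0)
```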
Framework versions
- Transformers 4.51.1
- Pytorch 2.7.0+cu126
- Datasets 3.2.0
- Tokenizers 0.21.1
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Permissions beyond the scope of this license may be available at https://mlrs.research.um.edu.mt/.
Citation
This work was first presented in MELABenchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource Maltese NLP. Cite it as follows:
@inproceedings{micallef-borg-2025-melabenchv1,
    title = "{MELAB}enchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource {M}altese {NLP}",
    author = "Micallef, Kurt and Borg, Claudia",
    editor = "Che, Wanxiang and Nabende, Joyce and Shutova, Ekaterina and Pilehvar, Mohammad Taher",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-acl.1053/",
    doi = "10.18653/v1/2025.findings-acl.1053",
    pages = "20505--20527",
    ISBN = "979-8-89176-256-5",
}