BERTu (MultiEURLEX Maltese)

This model is a fine-tuned version of MLRS/BERTu on the Maltese (mt) configuration of the nlpaueb/multi_eurlex dataset. It achieves the following results on the test set:

  • Loss: 0.2734
  • F1: 0.6723
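
The card does not state how the F1 score is aggregated. For multi-label classification over MultiEURLEX labels, a micro-averaged F1 computed from sigmoid-thresholded logits is a common choice; the sketch below shows one such computation. The 0.5 threshold and the micro averaging are assumptions, not details confirmed by this card.

```python
import numpy as np
from sklearn.metrics import f1_score

def multilabel_f1(logits: np.ndarray, labels: np.ndarray, threshold: float = 0.5) -> float:
    """Micro-averaged F1 for multi-label predictions (threshold and averaging are assumptions)."""
    probs = 1.0 / (1.0 + np.exp(-logits))     # sigmoid over per-label logits
    preds = (probs >= threshold).astype(int)  # binarise each label independently
    return f1_score(labels, preds, average="micro")
```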

Intended uses & limitations

The model is fine-tuned on a specific task and should only be used for the same or a similar task. Any limitations present in the base model are inherited.
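
The card does not include a usage snippet. The following is a minimal inference sketch, assuming the checkpoint is a multi-label sequence-classification head as suggested by the MultiEURLEX task; the example sentence and the 0.5 decision threshold are illustrative, and because the repository is gated you may need to be logged in to the Hub.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "MLRS/BERTu_multieurlex-mlt"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

text = "Regolament dwar il-protezzjoni tad-data personali."  # illustrative Maltese input
inputs = tokenizer(text, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits

# Multi-label decoding: sigmoid per label, keep labels above an assumed 0.5 threshold.
probs = torch.sigmoid(logits)[0]
predicted = [model.config.id2label[i] for i, p in enumerate(probs) if p >= 0.5]
print(predicted)
```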

Training procedure

The model was fine-tuned using a customised script.

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 3
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: inverse_sqrt
  • lr_scheduler_warmup_ratio: 0.005
  • num_epochs: 200.0
  • early_stopping_patience: 20
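
The customised fine-tuning script is not reproduced here. The block below is a hedged sketch of a Trainer configuration that mirrors the hyperparameters listed above; names such as train_dataset, eval_dataset, compute_metrics, and num_labels are placeholders for parts of the actual script (dataset loading, preprocessing, and metric computation) that the card does not show.

```python
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          EarlyStoppingCallback, Trainer, TrainingArguments)

training_args = TrainingArguments(
    output_dir="BERTu_multieurlex-mlt",
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=3,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="inverse_sqrt",
    warmup_ratio=0.005,
    num_train_epochs=200,
    eval_strategy="epoch",         # the results table reports one evaluation per epoch
    save_strategy="epoch",
    load_best_model_at_end=True,   # lets early stopping restore the best checkpoint
    metric_for_best_model="f1",    # assumes compute_metrics returns an "f1" key
)

tokenizer = AutoTokenizer.from_pretrained("MLRS/BERTu")
model = AutoModelForSequenceClassification.from_pretrained(
    "MLRS/BERTu",
    problem_type="multi_label_classification",  # MultiEURLEX is a multi-label task
    num_labels=num_labels,                      # placeholder: size of the dataset's label space
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,      # placeholder: tokenised multi_eurlex "mt" train split
    eval_dataset=eval_dataset,        # placeholder: tokenised validation split
    compute_metrics=compute_metrics,  # placeholder: returns {"f1": ...}
    callbacks=[EarlyStoppingCallback(early_stopping_patience=20)],
)
trainer.train()
```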

Training results

| Training Loss | Epoch | Step  | Validation Loss | F1     |
|:-------------:|:-----:|:-----:|:---------------:|:------:|
| 0.3962        | 1.0   | 548   | 0.2352          | 0.4398 |
| 0.2143        | 2.0   | 1096  | 0.1898          | 0.5998 |
| 0.1753        | 3.0   | 1644  | 0.1780          | 0.6361 |
| 0.1547        | 4.0   | 2192  | 0.1744          | 0.6610 |
| 0.1401        | 5.0   | 2740  | 0.1725          | 0.6687 |
| 0.1284        | 6.0   | 3288  | 0.1723          | 0.6814 |
| 0.1187        | 7.0   | 3836  | 0.1717          | 0.6882 |
| 0.1119        | 8.0   | 4384  | 0.1725          | 0.6951 |
| 0.1031        | 9.0   | 4932  | 0.1757          | 0.6997 |
| 0.0977        | 10.0  | 5480  | 0.1766          | 0.7012 |
| 0.0861        | 11.0  | 6028  | 0.1767          | 0.7089 |
| 0.0811        | 12.0  | 6576  | 0.1826          | 0.7060 |
| 0.0769        | 13.0  | 7124  | 0.1817          | 0.7074 |
| 0.0733        | 14.0  | 7672  | 0.1865          | 0.7071 |
| 0.0697        | 15.0  | 8220  | 0.1879          | 0.7090 |
| 0.0656        | 16.0  | 8768  | 0.1906          | 0.7065 |
| 0.0633        | 17.0  | 9316  | 0.1921          | 0.7123 |
| 0.0594        | 18.0  | 9864  | 0.1946          | 0.7152 |
| 0.0574        | 19.0  | 10412 | 0.1964          | 0.7178 |
| 0.0545        | 20.0  | 10960 | 0.1988          | 0.7153 |
| 0.0503        | 21.0  | 11508 | 0.2003          | 0.7149 |
| 0.0479        | 22.0  | 12056 | 0.2018          | 0.7179 |
| 0.0459        | 23.0  | 12604 | 0.2041          | 0.7194 |
| 0.0438        | 24.0  | 13152 | 0.2051          | 0.7197 |
| 0.0424        | 25.0  | 13700 | 0.2076          | 0.7182 |
| 0.0404        | 26.0  | 14248 | 0.2089          | 0.7182 |
| 0.0393        | 27.0  | 14796 | 0.2111          | 0.7167 |
| 0.0373        | 28.0  | 15344 | 0.2138          | 0.7181 |
| 0.0360        | 29.0  | 15892 | 0.2148          | 0.7228 |
| 0.0346        | 30.0  | 16440 | 0.2186          | 0.7176 |
| 0.0334        | 31.0  | 16988 | 0.2190          | 0.7179 |
| 0.0305        | 32.0  | 17536 | 0.2213          | 0.7191 |
| 0.0301        | 33.0  | 18084 | 0.2214          | 0.7207 |
| 0.0281        | 34.0  | 18632 | 0.2242          | 0.7192 |
| 0.0275        | 35.0  | 19180 | 0.2233          | 0.7214 |
| 0.0266        | 36.0  | 19728 | 0.2258          | 0.7206 |
| 0.0255        | 37.0  | 20276 | 0.2290          | 0.7176 |
| 0.0247        | 38.0  | 20824 | 0.2307          | 0.7204 |
| 0.0238        | 39.0  | 21372 | 0.2321          | 0.7160 |
| 0.0231        | 40.0  | 21920 | 0.2350          | 0.7235 |
| 0.0225        | 41.0  | 22468 | 0.2343          | 0.7170 |
| 0.0208        | 42.0  | 23016 | 0.2369          | 0.7210 |
| 0.0199        | 43.0  | 23564 | 0.2390          | 0.7205 |
| 0.0193        | 44.0  | 24112 | 0.2396          | 0.7225 |
| 0.0188        | 45.0  | 24660 | 0.2414          | 0.7192 |
| 0.0184        | 46.0  | 25208 | 0.2441          | 0.7185 |
| 0.0176        | 47.0  | 25756 | 0.2445          | 0.7224 |
| 0.0172        | 48.0  | 26304 | 0.2468          | 0.7185 |
| 0.0167        | 49.0  | 26852 | 0.2476          | 0.7187 |
| 0.0161        | 50.0  | 27400 | 0.2472          | 0.7212 |
| 0.0158        | 51.0  | 27948 | 0.2511          | 0.7200 |
| 0.0151        | 52.0  | 28496 | 0.2507          | 0.7201 |
| 0.0142        | 53.0  | 29044 | 0.2533          | 0.7173 |
| 0.0137        | 54.0  | 29592 | 0.2550          | 0.7210 |
| 0.0133        | 55.0  | 30140 | 0.2553          | 0.7191 |
| 0.0130        | 56.0  | 30688 | 0.2581          | 0.7213 |
| 0.0127        | 57.0  | 31236 | 0.2597          | 0.7209 |
| 0.0121        | 58.0  | 31784 | 0.2616          | 0.7175 |
| 0.0120        | 59.0  | 32332 | 0.2605          | 0.7198 |
| 0.0115        | 60.0  | 32880 | 0.2641          | 0.7207 |

Framework versions

  • Transformers 4.51.1
  • Pytorch 2.7.0+cu126
  • Datasets 3.2.0
  • Tokenizers 0.21.1

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Permissions beyond the scope of this license may be available at https://mlrs.research.um.edu.mt/.

Citation

This work was first presented in MELABenchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource Maltese NLP. Cite it as follows:

@inproceedings{micallef-borg-2025-melabenchv1,
    title = "{MELAB}enchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource {M}altese {NLP}",
    author = "Micallef, Kurt  and
      Borg, Claudia",
    editor = "Che, Wanxiang  and
      Nabende, Joyce  and
      Shutova, Ekaterina  and
      Pilehvar, Mohammad Taher",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-acl.1053/",
    doi = "10.18653/v1/2025.findings-acl.1053",
    pages = "20505--20527",
    ISBN = "979-8-89176-256-5",
}