BERTu (MultiEURLEX Maltese)

This model is a fine-tuned version of MLRS/BERTu on the Maltese (mt) configuration of the nlpaueb/multi_eurlex dataset. It achieves the following results on the test set (a minimal usage sketch follows the results):
- Loss: 0.2734
- F1: 0.6723
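
The sketch below shows one way the checkpoint could be loaded for inference with the standard Transformers sequence-classification API. The repository id MLRS/BERTu_multieurlex-mlt is taken from the model tree for this card; the multi-label (sigmoid) decoding and the 0.5 threshold are assumptions based on the MultiEURLEX task, so check the checkpoint's config for the actual problem_type and label names.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Repository id taken from the model tree above; adjust if the checkpoint lives elsewhere.
model_id = "MLRS/BERTu_multieurlex-mlt"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

text = "..."  # replace with a Maltese document, e.g. from the multi_eurlex "mt" test split

inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Assumption: MultiEURLEX is multi-label, so decode with a sigmoid and a 0.5 threshold.
probs = torch.sigmoid(logits)[0]
predicted = [model.config.id2label[i] for i, p in enumerate(probs) if p > 0.5]
print(predicted)
```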
Intended uses & limitations
The model is fine-tuned on a specific task, so it should only be used for the same or a similar task. Any limitations present in the base model are inherited.
Training procedure
The model was fine-tuned using a customised script.
Training hyperparameters
The following hyperparameters were used during training; a sketch mapping them onto a Trainer setup follows the list:
- learning_rate: 2e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 3
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: inverse_sqrt
- lr_scheduler_warmup_ratio: 0.005
- num_epochs: 200.0
- early_stopping_patience: 20
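
For reference, here is a sketch of how these hyperparameters could map onto a Hugging Face TrainingArguments/Trainer configuration. The epoch-level evaluation/saving strategy and the selection metric are assumptions consistent with the per-epoch results table below but are not stated in the card; the model, datasets, and compute_metrics function are placeholders rather than the actual customised script.

```python
from transformers import TrainingArguments, Trainer, EarlyStoppingCallback

# Hyperparameters taken from the list above; eval/save strategy and
# metric_for_best_model are assumptions, not documented settings.
training_args = TrainingArguments(
    output_dir="bertu_multieurlex_mlt",
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=3,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="inverse_sqrt",
    warmup_ratio=0.005,
    num_train_epochs=200,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="f1",
)

trainer = Trainer(
    model=model,                      # e.g. AutoModelForSequenceClassification from MLRS/BERTu
    args=training_args,
    train_dataset=train_dataset,      # hypothetical: tokenised multi_eurlex "mt" train split
    eval_dataset=eval_dataset,        # hypothetical: tokenised validation split
    compute_metrics=compute_metrics,  # hypothetical: returns {"f1": ...}
    callbacks=[EarlyStoppingCallback(early_stopping_patience=20)],
)
trainer.train()
```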
Training results
Training Loss | Epoch | Step | Validation Loss | F1 |
---|---|---|---|---|
0.3962 | 1.0 | 548 | 0.2352 | 0.4398 |
0.2143 | 2.0 | 1096 | 0.1898 | 0.5998 |
0.1753 | 3.0 | 1644 | 0.1780 | 0.6361 |
0.1547 | 4.0 | 2192 | 0.1744 | 0.6610 |
0.1401 | 5.0 | 2740 | 0.1725 | 0.6687 |
0.1284 | 6.0 | 3288 | 0.1723 | 0.6814 |
0.1187 | 7.0 | 3836 | 0.1717 | 0.6882 |
0.1119 | 8.0 | 4384 | 0.1725 | 0.6951 |
0.1031 | 9.0 | 4932 | 0.1757 | 0.6997 |
0.0977 | 10.0 | 5480 | 0.1766 | 0.7012 |
0.0861 | 11.0 | 6028 | 0.1767 | 0.7089 |
0.0811 | 12.0 | 6576 | 0.1826 | 0.7060 |
0.0769 | 13.0 | 7124 | 0.1817 | 0.7074 |
0.0733 | 14.0 | 7672 | 0.1865 | 0.7071 |
0.0697 | 15.0 | 8220 | 0.1879 | 0.7090 |
0.0656 | 16.0 | 8768 | 0.1906 | 0.7065 |
0.0633 | 17.0 | 9316 | 0.1921 | 0.7123 |
0.0594 | 18.0 | 9864 | 0.1946 | 0.7152 |
0.0574 | 19.0 | 10412 | 0.1964 | 0.7178 |
0.0545 | 20.0 | 10960 | 0.1988 | 0.7153 |
0.0503 | 21.0 | 11508 | 0.2003 | 0.7149 |
0.0479 | 22.0 | 12056 | 0.2018 | 0.7179 |
0.0459 | 23.0 | 12604 | 0.2041 | 0.7194 |
0.0438 | 24.0 | 13152 | 0.2051 | 0.7197 |
0.0424 | 25.0 | 13700 | 0.2076 | 0.7182 |
0.0404 | 26.0 | 14248 | 0.2089 | 0.7182 |
0.0393 | 27.0 | 14796 | 0.2111 | 0.7167 |
0.0373 | 28.0 | 15344 | 0.2138 | 0.7181 |
0.036 | 29.0 | 15892 | 0.2148 | 0.7228 |
0.0346 | 30.0 | 16440 | 0.2186 | 0.7176 |
0.0334 | 31.0 | 16988 | 0.2190 | 0.7179 |
0.0305 | 32.0 | 17536 | 0.2213 | 0.7191 |
0.0301 | 33.0 | 18084 | 0.2214 | 0.7207 |
0.0281 | 34.0 | 18632 | 0.2242 | 0.7192 |
0.0275 | 35.0 | 19180 | 0.2233 | 0.7214 |
0.0266 | 36.0 | 19728 | 0.2258 | 0.7206 |
0.0255 | 37.0 | 20276 | 0.2290 | 0.7176 |
0.0247 | 38.0 | 20824 | 0.2307 | 0.7204 |
0.0238 | 39.0 | 21372 | 0.2321 | 0.7160 |
0.0231 | 40.0 | 21920 | 0.2350 | 0.7235 |
0.0225 | 41.0 | 22468 | 0.2343 | 0.7170 |
0.0208 | 42.0 | 23016 | 0.2369 | 0.7210 |
0.0199 | 43.0 | 23564 | 0.2390 | 0.7205 |
0.0193 | 44.0 | 24112 | 0.2396 | 0.7225 |
0.0188 | 45.0 | 24660 | 0.2414 | 0.7192 |
0.0184 | 46.0 | 25208 | 0.2441 | 0.7185 |
0.0176 | 47.0 | 25756 | 0.2445 | 0.7224 |
0.0172 | 48.0 | 26304 | 0.2468 | 0.7185 |
0.0167 | 49.0 | 26852 | 0.2476 | 0.7187 |
0.0161 | 50.0 | 27400 | 0.2472 | 0.7212 |
0.0158 | 51.0 | 27948 | 0.2511 | 0.7200 |
0.0151 | 52.0 | 28496 | 0.2507 | 0.7201 |
0.0142 | 53.0 | 29044 | 0.2533 | 0.7173 |
0.0137 | 54.0 | 29592 | 0.2550 | 0.7210 |
0.0133 | 55.0 | 30140 | 0.2553 | 0.7191 |
0.013 | 56.0 | 30688 | 0.2581 | 0.7213 |
0.0127 | 57.0 | 31236 | 0.2597 | 0.7209 |
0.0121 | 58.0 | 31784 | 0.2616 | 0.7175 |
0.012 | 59.0 | 32332 | 0.2605 | 0.7198 |
0.0115 | 60.0 | 32880 | 0.2641 | 0.7207 |
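
The run appears to have stopped early at epoch 60, which is consistent with the patience of 20: the best validation F1 (0.7235) was reached at epoch 40 and no later epoch improved on it. For completeness, here is a small sketch of how a macro-averaged F1 could be computed for a multi-label setup like MultiEURLEX, assuming sigmoid outputs thresholded at 0.5; the exact metric implementation used for this card is not specified.

```python
import numpy as np
from sklearn.metrics import f1_score

def multilabel_macro_f1(logits: np.ndarray, labels: np.ndarray, threshold: float = 0.5) -> float:
    """Macro F1 over label-wise binary decisions (assumed setup, not the card's exact script)."""
    probs = 1.0 / (1.0 + np.exp(-logits))     # sigmoid over raw logits
    preds = (probs >= threshold).astype(int)  # threshold each label independently
    return f1_score(labels, preds, average="macro", zero_division=0)
```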
Framework versions
- Transformers 4.51.1
- Pytorch 2.7.0+cu126
- Datasets 3.2.0
- Tokenizers 0.21.1
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Permissions beyond the scope of this license may be available at https://mlrs.research.um.edu.mt/.
Citation
This work was first presented in MELABenchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource Maltese NLP. Cite it as follows:
@inproceedings{micallef-borg-2025-melabenchv1,
    title = "{MELAB}enchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource {M}altese {NLP}",
    author = "Micallef, Kurt and Borg, Claudia",
    editor = "Che, Wanxiang and Nabende, Joyce and Shutova, Ekaterina and Pilehvar, Mohammad Taher",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-acl.1053/",
    doi = "10.18653/v1/2025.findings-acl.1053",
    pages = "20505--20527",
    ISBN = "979-8-89176-256-5",
}