# Phi4-5.6B-transformers-ex1
This model is a fine-tuned version of [microsoft/Phi-4-multimodal-instruct](https://huggingface.co/microsoft/Phi-4-multimodal-instruct) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.4529
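
Below is a minimal loading sketch with `transformers`. It assumes the repo hosts full model weights (the card does not say whether it is an adapter), and the prompt follows the base model's `<|user|>...<|end|><|assistant|>` chat convention; the question itself is illustrative.

```python
# Minimal sketch, assuming the repo hosts full model weights (not stated on the card).
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "minhtien2405/Phi4-5.6B-transformers-ex1"

# Phi-4-multimodal ships custom modeling code, so trust_remote_code is required.
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

# Text-only prompt in the base model's chat format; the question is illustrative.
prompt = "<|user|>Summarize the benefits of parameter-efficient fine-tuning.<|end|><|assistant|>"
inputs = processor(text=prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(outputs, skip_special_tokens=True)[0])
```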
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a matching `TrainingArguments` sketch follows the list):
- learning_rate: 0.0002
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 4
- optimizer: paged_adamw_8bit with betas=(0.9, 0.95) and epsilon=1e-07; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 50
- num_epochs: 10
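
For reproduction, this is a sketch of a `TrainingArguments` object mirroring the list above. The output directory and logging cadence are assumptions, and the 20-step eval interval is read off the results table below; the card does not include the actual training script.

```python
# Sketch of TrainingArguments mirroring the reported hyperparameters.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="phi4-finetune",      # hypothetical path, not the author's
    learning_rate=2e-4,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,   # total train batch size: 4
    num_train_epochs=10,
    lr_scheduler_type="linear",
    warmup_steps=50,
    optim="paged_adamw_8bit",        # requires bitsandbytes
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-7,
    seed=42,
    eval_strategy="steps",
    eval_steps=20,                   # matches the 20-step eval interval in the table
    logging_steps=20,
)
```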
### Training results

Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
0.1653 | 0.0799 | 20 | 0.1542 |
0.1324 | 0.1598 | 40 | 0.1429 |
0.2598 | 0.2398 | 60 | 0.3326 |
0.1638 | 0.3197 | 80 | 0.1500 |
0.1499 | 0.3996 | 100 | 0.4031 |
0.15 | 0.4795 | 120 | 0.3213 |
0.1679 | 0.5594 | 140 | 0.1489 |
0.1431 | 0.6394 | 160 | 0.1531 |
0.1462 | 0.7193 | 180 | 0.1488 |
0.1464 | 0.7992 | 200 | 0.1485 |
0.1379 | 0.8791 | 220 | 0.1482 |
0.1414 | 0.9590 | 240 | 0.1567 |
0.1328 | 1.0360 | 260 | 0.1472 |
0.134 | 1.1159 | 280 | 0.1466 |
0.1415 | 1.1958 | 300 | 0.1447 |
0.141 | 1.2757 | 320 | 0.1470 |
0.1378 | 1.3556 | 340 | 0.1685 |
0.1425 | 1.4356 | 360 | 0.1560 |
0.1405 | 1.5155 | 380 | 0.1412 |
0.135 | 1.5954 | 400 | 0.1512 |
0.1359 | 1.6753 | 420 | 0.1410 |
0.1336 | 1.7552 | 440 | 0.1394 |
0.1317 | 1.8352 | 460 | 0.1408 |
0.1323 | 1.9151 | 480 | 0.1497 |
0.1349 | 1.9950 | 500 | 0.1387 |
0.1204 | 2.0719 | 520 | 0.1407 |
0.1286 | 2.1518 | 540 | 0.1399 |
0.1333 | 2.2318 | 560 | 0.1414 |
0.1315 | 2.3117 | 580 | 0.1398 |
0.1313 | 2.3916 | 600 | 0.1455 |
0.1308 | 2.4715 | 620 | 0.1377 |
0.1327 | 2.5514 | 640 | 0.1400 |
0.1324 | 2.6314 | 660 | 0.1370 |
0.1309 | 2.7113 | 680 | 0.1343 |
0.1274 | 2.7912 | 700 | 0.1384 |
0.1287 | 2.8711 | 720 | 0.1353 |
0.1285 | 2.9510 | 740 | 0.1341 |
0.1256 | 3.0280 | 760 | 0.1380 |
0.1256 | 3.1079 | 780 | 0.1340 |
0.1224 | 3.1878 | 800 | 0.1372 |
0.1244 | 3.2677 | 820 | 0.1358 |
0.1256 | 3.3477 | 840 | 0.1337 |
0.1229 | 3.4276 | 860 | 0.1336 |
0.1252 | 3.5075 | 880 | 0.1333 |
0.1234 | 3.5874 | 900 | 0.1360 |
0.1276 | 3.6673 | 920 | 0.1344 |
0.1258 | 3.7473 | 940 | 0.1327 |
0.1249 | 3.8272 | 960 | 0.1357 |
0.1273 | 3.9071 | 980 | 0.1346 |
0.1266 | 3.9870 | 1000 | 0.1356 |
0.1172 | 4.0639 | 1020 | 0.1413 |
0.1236 | 4.1439 | 1040 | 0.1396 |
0.1219 | 4.2238 | 1060 | 0.1368 |
0.1187 | 4.3037 | 1080 | 0.1399 |
0.1225 | 4.3836 | 1100 | 0.1387 |
0.1243 | 4.4635 | 1120 | 0.1370 |
0.1218 | 4.5435 | 1140 | 0.1360 |
0.1189 | 4.6234 | 1160 | 0.1325 |
0.1185 | 4.7033 | 1180 | 0.1373 |
0.1251 | 4.7832 | 1200 | 0.1352 |
0.1214 | 4.8631 | 1220 | 0.1333 |
0.1225 | 4.9431 | 1240 | 0.1339 |
0.1138 | 5.0200 | 1260 | 0.1348 |
0.1205 | 5.0999 | 1280 | 0.1415 |
0.1208 | 5.1798 | 1300 | 0.1434 |
0.1165 | 5.2597 | 1320 | 0.1415 |
0.1154 | 5.3397 | 1340 | 0.1392 |
0.1143 | 5.4196 | 1360 | 0.1442 |
0.1165 | 5.4995 | 1380 | 0.1397 |
0.1162 | 5.5794 | 1400 | 0.1414 |
0.1148 | 5.6593 | 1420 | 0.1389 |
0.1133 | 5.7393 | 1440 | 0.1391 |
0.1145 | 5.8192 | 1460 | 0.1393 |
0.1152 | 5.8991 | 1480 | 0.1397 |
0.113 | 5.9790 | 1500 | 0.1407 |
0.0993 | 6.0559 | 1520 | 0.1625 |
0.0962 | 6.1359 | 1540 | 0.1609 |
0.0995 | 6.2158 | 1560 | 0.1573 |
0.1028 | 6.2957 | 1580 | 0.1582 |
0.0983 | 6.3756 | 1600 | 0.1620 |
0.0989 | 6.4555 | 1620 | 0.1572 |
0.0987 | 6.5355 | 1640 | 0.1602 |
0.0992 | 6.6154 | 1660 | 0.1593 |
0.0997 | 6.6953 | 1680 | 0.1644 |
0.0967 | 6.7752 | 1700 | 0.1630 |
0.0988 | 6.8551 | 1720 | 0.1596 |
0.098 | 6.9351 | 1740 | 0.1605 |
0.0915 | 7.0120 | 1760 | 0.1662 |
0.0666 | 7.0919 | 1780 | 0.2258 |
0.0638 | 7.1718 | 1800 | 0.2135 |
0.0581 | 7.2517 | 1820 | 0.2290 |
0.065 | 7.3317 | 1840 | 0.2115 |
0.0611 | 7.4116 | 1860 | 0.2396 |
0.059 | 7.4915 | 1880 | 0.2205 |
0.0598 | 7.5714 | 1900 | 0.2314 |
0.0608 | 7.6513 | 1920 | 0.2309 |
0.063 | 7.7313 | 1940 | 0.2383 |
0.0621 | 7.8112 | 1960 | 0.2304 |
0.0586 | 7.8911 | 1980 | 0.2433 |
0.0622 | 7.9710 | 2000 | 0.2354 |
0.0369 | 8.0480 | 2020 | 0.3233 |
0.0246 | 8.1279 | 2040 | 0.3437 |
0.022 | 8.2078 | 2060 | 0.3361 |
0.0243 | 8.2877 | 2080 | 0.3413 |
0.0235 | 8.3676 | 2100 | 0.3458 |
0.0229 | 8.4476 | 2120 | 0.3473 |
0.0218 | 8.5275 | 2140 | 0.3523 |
0.0234 | 8.6074 | 2160 | 0.3610 |
0.0228 | 8.6873 | 2180 | 0.3496 |
0.0221 | 8.7672 | 2200 | 0.3519 |
0.0223 | 8.8472 | 2220 | 0.3515 |
0.0224 | 8.9271 | 2240 | 0.3514 |
0.0193 | 9.0040 | 2260 | 0.3542 |
0.0081 | 9.0839 | 2280 | 0.4155 |
0.0071 | 9.1638 | 2300 | 0.4363 |
0.0065 | 9.2438 | 2320 | 0.4446 |
0.0057 | 9.3237 | 2340 | 0.4485 |
0.0064 | 9.4036 | 2360 | 0.4495 |
0.0071 | 9.4835 | 2380 | 0.4502 |
0.0058 | 9.5634 | 2400 | 0.4518 |
0.0066 | 9.6434 | 2420 | 0.4530 |
0.0072 | 9.7233 | 2440 | 0.4535 |
0.0064 | 9.8032 | 2460 | 0.4532 |
0.0076 | 9.8831 | 2480 | 0.4533 |
0.0063 | 9.9630 | 2500 | 0.4529 |
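
Note that validation loss bottoms out at 0.1325 around step 1160 (epoch 4.6) and rises sharply over the last four epochs while training loss keeps falling, so the final checkpoint (0.4529) appears to overfit; an intermediate checkpoint is likely the stronger model. When re-running, one way to keep the best checkpoint automatically is sketched below (values other than the best-model flags are the same assumptions as above; `eval_loss` is the Trainer's default metric name, not confirmed by this card).

```python
# Sketch: retain the checkpoint with the lowest validation loss.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="phi4-finetune",       # hypothetical path
    eval_strategy="steps",
    eval_steps=20,
    save_strategy="steps",
    save_steps=20,                    # must align with eval_steps
    load_best_model_at_end=True,      # reload the lowest-eval-loss checkpoint
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    save_total_limit=2,               # keep disk usage bounded
)
```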
### Framework versions

- Transformers 4.48.2
- PyTorch 2.6.0+cu124
- Datasets 3.4.1
- Tokenizers 0.21.1
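
To reproduce this environment, pinning the listed versions (for example `pip install transformers==4.48.2 datasets==3.4.1 tokenizers==0.21.1`, plus a CUDA 12.4 build of PyTorch 2.6.0) is the safest starting point.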