# collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd0

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.0955
- Num Input Tokens Seen: 41958000
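As a quick sanity check, the checkpoint can be loaded like any other causal LM with the `transformers` library. This is a minimal usage sketch, not part of the original training code; the repo id is assumed from this card's title, and the prompt and generation settings are illustrative only.

```python
# Minimal usage sketch: load the fine-tuned checkpoint as a causal LM.
# The repo id below is assumed from this model card's title.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd0"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Illustrative prompt; decoding settings are arbitrary.
inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```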
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
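For reference, these settings map onto Hugging Face `TrainingArguments` roughly as sketched below. This is a hedged reconstruction rather than the original training script: the `output_dir` is assumed, and the model/dataset wiring is omitted. The effective batch size of 128 follows from the per-device batch size of 8 times 16 gradient-accumulation steps.

```python
# Sketch of TrainingArguments matching the hyperparameters above.
# total_train_batch_size = per_device_train_batch_size (8)
#   * gradient_accumulation_steps (16) = 128 on a single device.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd0",  # assumed
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```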
### Training results
Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
:---|:---|:---|:---|:---|
No log | 0 | 0 | 1.3909 | 0 |
1.6668 | 0.0066 | 5 | 1.3870 | 278312 |
1.6119 | 0.0132 | 10 | 1.3527 | 556616 |
1.6009 | 0.0198 | 15 | 1.2834 | 837608 |
1.3857 | 0.0265 | 20 | 1.2338 | 1118760 |
1.3881 | 0.0331 | 25 | 1.1927 | 1398912 |
1.1478 | 0.0397 | 30 | 1.1899 | 1677832 |
1.0707 | 0.0463 | 35 | 1.1973 | 1955640 |
0.8858 | 0.0529 | 40 | 1.2362 | 2235312 |
0.7331 | 0.0595 | 45 | 1.2417 | 2518032 |
0.6183 | 0.0662 | 50 | 1.2923 | 2791072 |
0.5582 | 0.0728 | 55 | 1.2281 | 3075256 |
0.4271 | 0.0794 | 60 | 1.2524 | 3353088 |
0.4163 | 0.0860 | 65 | 1.2120 | 3634408 |
0.3374 | 0.0926 | 70 | 1.2174 | 3911328 |
0.3455 | 0.0992 | 75 | 1.2004 | 4189448 |
0.4376 | 0.1059 | 80 | 1.2002 | 4467968 |
0.3698 | 0.1125 | 85 | 1.1903 | 4749624 |
0.2481 | 0.1191 | 90 | 1.1825 | 5025016 |
0.283 | 0.1257 | 95 | 1.1792 | 5302440 |
0.3213 | 0.1323 | 100 | 1.1798 | 5581168 |
0.324 | 0.1389 | 105 | 1.1744 | 5859200 |
0.3241 | 0.1456 | 110 | 1.1715 | 6134296 |
0.2837 | 0.1522 | 115 | 1.1696 | 6409928 |
0.3316 | 0.1588 | 120 | 1.1676 | 6685840 |
0.2437 | 0.1654 | 125 | 1.1633 | 6965840 |
0.3139 | 0.1720 | 130 | 1.1606 | 7244344 |
0.3376 | 0.1786 | 135 | 1.1564 | 7515632 |
0.223 | 0.1852 | 140 | 1.1613 | 7789320 |
0.2748 | 0.1919 | 145 | 1.1551 | 8069448 |
0.35 | 0.1985 | 150 | 1.1544 | 8354176 |
0.28 | 0.2051 | 155 | 1.1523 | 8629120 |
0.2608 | 0.2117 | 160 | 1.1533 | 8907728 |
0.3108 | 0.2183 | 165 | 1.1499 | 9184320 |
0.2177 | 0.2249 | 170 | 1.1489 | 9465120 |
0.2277 | 0.2316 | 175 | 1.1517 | 9744536 |
0.2032 | 0.2382 | 180 | 1.1464 | 10022880 |
0.293 | 0.2448 | 185 | 1.1465 | 10303904 |
0.3273 | 0.2514 | 190 | 1.1483 | 10584120 |
0.2394 | 0.2580 | 195 | 1.1408 | 10861656 |
0.2672 | 0.2646 | 200 | 1.1434 | 11141752 |
0.2725 | 0.2713 | 205 | 1.1453 | 11417264 |
0.298 | 0.2779 | 210 | 1.1364 | 11688920 |
0.2932 | 0.2845 | 215 | 1.1388 | 11971416 |
0.2699 | 0.2911 | 220 | 1.1399 | 12250384 |
0.2514 | 0.2977 | 225 | 1.1402 | 12526224 |
0.2254 | 0.3043 | 230 | 1.1347 | 12799952 |
0.2493 | 0.3109 | 235 | 1.1373 | 13081440 |
0.2417 | 0.3176 | 240 | 1.1365 | 13363136 |
0.244 | 0.3242 | 245 | 1.1347 | 13643968 |
0.3142 | 0.3308 | 250 | 1.1335 | 13919528 |
0.1658 | 0.3374 | 255 | 1.1356 | 14206008 |
0.202 | 0.3440 | 260 | 1.1331 | 14486864 |
0.2557 | 0.3506 | 265 | 1.1315 | 14761784 |
0.1722 | 0.3573 | 270 | 1.1333 | 15039296 |
0.2303 | 0.3639 | 275 | 1.1304 | 15314232 |
0.2371 | 0.3705 | 280 | 1.1301 | 15597320 |
0.1902 | 0.3771 | 285 | 1.1291 | 15872736 |
0.2629 | 0.3837 | 290 | 1.1284 | 16150800 |
0.163 | 0.3903 | 295 | 1.1284 | 16430768 |
0.1573 | 0.3970 | 300 | 1.1281 | 16707136 |
0.3249 | 0.4036 | 305 | 1.1248 | 16978832 |
0.2382 | 0.4102 | 310 | 1.1248 | 17252920 |
0.1851 | 0.4168 | 315 | 1.1255 | 17535728 |
0.2008 | 0.4234 | 320 | 1.1246 | 17814952 |
0.2358 | 0.4300 | 325 | 1.1232 | 18091904 |
0.2164 | 0.4367 | 330 | 1.1221 | 18373208 |
0.2086 | 0.4433 | 335 | 1.1222 | 18648224 |
0.3121 | 0.4499 | 340 | 1.1201 | 18924168 |
0.1846 | 0.4565 | 345 | 1.1206 | 19201296 |
0.2533 | 0.4631 | 350 | 1.1206 | 19477976 |
0.1607 | 0.4697 | 355 | 1.1201 | 19754080 |
0.2174 | 0.4763 | 360 | 1.1204 | 20032800 |
0.2254 | 0.4830 | 365 | 1.1214 | 20308040 |
0.1665 | 0.4896 | 370 | 1.1167 | 20592152 |
0.3489 | 0.4962 | 375 | 1.1176 | 20871128 |
0.2461 | 0.5028 | 380 | 1.1183 | 21148272 |
0.2611 | 0.5094 | 385 | 1.1144 | 21431392 |
0.254 | 0.5160 | 390 | 1.1161 | 21709824 |
0.2069 | 0.5227 | 395 | 1.1180 | 21988536 |
0.2252 | 0.5293 | 400 | 1.1162 | 22262296 |
0.2424 | 0.5359 | 405 | 1.1148 | 22536568 |
0.2396 | 0.5425 | 410 | 1.1134 | 22810944 |
0.2056 | 0.5491 | 415 | 1.1130 | 23092168 |
0.2257 | 0.5557 | 420 | 1.1132 | 23371056 |
0.2015 | 0.5624 | 425 | 1.1144 | 23648456 |
0.1694 | 0.5690 | 430 | 1.1130 | 23920280 |
0.2116 | 0.5756 | 435 | 1.1141 | 24200304 |
0.2563 | 0.5822 | 440 | 1.1116 | 24482536 |
0.1843 | 0.5888 | 445 | 1.1105 | 24757696 |
0.29 | 0.5954 | 450 | 1.1128 | 25033776 |
0.1833 | 0.6021 | 455 | 1.1120 | 25310944 |
0.2481 | 0.6087 | 460 | 1.1101 | 25589696 |
0.2427 | 0.6153 | 465 | 1.1094 | 25870336 |
0.1618 | 0.6219 | 470 | 1.1098 | 26143632 |
0.1532 | 0.6285 | 475 | 1.1103 | 26415136 |
0.2417 | 0.6351 | 480 | 1.1087 | 26696640 |
0.2276 | 0.6417 | 485 | 1.1063 | 26971272 |
0.2721 | 0.6484 | 490 | 1.1083 | 27244992 |
0.2445 | 0.6550 | 495 | 1.1086 | 27518832 |
0.2783 | 0.6616 | 500 | 1.1055 | 27796352 |
0.2091 | 0.6682 | 505 | 1.1059 | 28076680 |
0.2149 | 0.6748 | 510 | 1.1058 | 28351832 |
0.155 | 0.6814 | 515 | 1.1041 | 28630944 |
0.2609 | 0.6881 | 520 | 1.1062 | 28906120 |
0.1983 | 0.6947 | 525 | 1.1052 | 29184576 |
0.2037 | 0.7013 | 530 | 1.1047 | 29458648 |
0.2504 | 0.7079 | 535 | 1.1053 | 29735424 |
0.1773 | 0.7145 | 540 | 1.1033 | 30014992 |
0.2006 | 0.7211 | 545 | 1.1023 | 30297040 |
0.2031 | 0.7278 | 550 | 1.1040 | 30575064 |
0.1955 | 0.7344 | 555 | 1.1019 | 30855488 |
0.2115 | 0.7410 | 560 | 1.1009 | 31138112 |
0.1442 | 0.7476 | 565 | 1.1016 | 31413712 |
0.2747 | 0.7542 | 570 | 1.0999 | 31686776 |
0.2817 | 0.7608 | 575 | 1.1000 | 31967224 |
0.2672 | 0.7674 | 580 | 1.1007 | 32246536 |
0.2848 | 0.7741 | 585 | 1.0999 | 32526544 |
0.2295 | 0.7807 | 590 | 1.0986 | 32802544 |
0.2139 | 0.7873 | 595 | 1.1004 | 33086712 |
0.2875 | 0.7939 | 600 | 1.0997 | 33367720 |
0.1953 | 0.8005 | 605 | 1.0973 | 33641952 |
0.2765 | 0.8071 | 610 | 1.0980 | 33917056 |
0.2177 | 0.8138 | 615 | 1.0984 | 34195776 |
0.3152 | 0.8204 | 620 | 1.0993 | 34478976 |
0.238 | 0.8270 | 625 | 1.0974 | 34752864 |
0.2357 | 0.8336 | 630 | 1.0957 | 35030584 |
0.1876 | 0.8402 | 635 | 1.0980 | 35305176 |
0.1897 | 0.8468 | 640 | 1.0974 | 35591096 |
0.186 | 0.8535 | 645 | 1.0962 | 35869440 |
0.1446 | 0.8601 | 650 | 1.0973 | 36141888 |
0.2382 | 0.8667 | 655 | 1.0975 | 36418656 |
0.2281 | 0.8733 | 660 | 1.0967 | 36703600 |
0.2697 | 0.8799 | 665 | 1.0964 | 36980184 |
0.2257 | 0.8865 | 670 | 1.0952 | 37254704 |
0.1766 | 0.8932 | 675 | 1.0968 | 37534656 |
0.2016 | 0.8998 | 680 | 1.0968 | 37814552 |
0.1946 | 0.9064 | 685 | 1.0957 | 38095440 |
0.2247 | 0.9130 | 690 | 1.0964 | 38367536 |
0.2337 | 0.9196 | 695 | 1.0955 | 38641000 |
0.2263 | 0.9262 | 700 | 1.0949 | 38915896 |
0.2195 | 0.9328 | 705 | 1.0962 | 39190984 |
0.2419 | 0.9395 | 710 | 1.0969 | 39467824 |
0.2151 | 0.9461 | 715 | 1.0963 | 39748312 |
0.2461 | 0.9527 | 720 | 1.0931 | 40027512 |
0.1476 | 0.9593 | 725 | 1.0939 | 40303160 |
0.2374 | 0.9659 | 730 | 1.0956 | 40580344 |
0.193 | 0.9725 | 735 | 1.0953 | 40856568 |
0.2546 | 0.9792 | 740 | 1.0938 | 41131664 |
0.1949 | 0.9858 | 745 | 1.0946 | 41402168 |
0.2002 | 0.9924 | 750 | 1.0955 | 41682720 |
0.1669 | 0.9990 | 755 | 1.0955 | 41958000 |
### Framework versions
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1