collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.0955
  • Num Input Tokens Seen: 41958000
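
Since the card does not yet include a usage example, here is a minimal, hedged sketch of loading this checkpoint with transformers. The repository id comes from this card; the prompt, generation settings, and device handling are illustrative assumptions, not part of the original setup.

```python
# Minimal inference sketch; prompt and generation settings are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the published weights are stored in BF16
    device_map="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```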

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an illustrative configuration sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
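
For readers who want to approximate this setup with the transformers Trainer, the sketch below maps the hyperparameters onto TrainingArguments. The output_dir and bf16 flag are assumptions, and transformers' stock optimizer is AdamW rather than plain Adam, so treat this as an approximation rather than the exact training script.

```python
# A hedged sketch of an equivalent TrainingArguments configuration.
# Values mirror the list above; output_dir, bf16, and the optimizer variant
# are assumptions for illustration, not taken from the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd0",  # hypothetical
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,  # 8 x 16 = total train batch size of 128 on one device
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,  # assumption, consistent with the BF16 checkpoint
)
```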

Training results

Training Loss Epoch Step Validation Loss Input Tokens Seen
No log 0 0 1.3909 0
1.6668 0.0066 5 1.3870 278312
1.6119 0.0132 10 1.3527 556616
1.6009 0.0198 15 1.2834 837608
1.3857 0.0265 20 1.2338 1118760
1.3881 0.0331 25 1.1927 1398912
1.1478 0.0397 30 1.1899 1677832
1.0707 0.0463 35 1.1973 1955640
0.8858 0.0529 40 1.2362 2235312
0.7331 0.0595 45 1.2417 2518032
0.6183 0.0662 50 1.2923 2791072
0.5582 0.0728 55 1.2281 3075256
0.4271 0.0794 60 1.2524 3353088
0.4163 0.0860 65 1.2120 3634408
0.3374 0.0926 70 1.2174 3911328
0.3455 0.0992 75 1.2004 4189448
0.4376 0.1059 80 1.2002 4467968
0.3698 0.1125 85 1.1903 4749624
0.2481 0.1191 90 1.1825 5025016
0.283 0.1257 95 1.1792 5302440
0.3213 0.1323 100 1.1798 5581168
0.324 0.1389 105 1.1744 5859200
0.3241 0.1456 110 1.1715 6134296
0.2837 0.1522 115 1.1696 6409928
0.3316 0.1588 120 1.1676 6685840
0.2437 0.1654 125 1.1633 6965840
0.3139 0.1720 130 1.1606 7244344
0.3376 0.1786 135 1.1564 7515632
0.223 0.1852 140 1.1613 7789320
0.2748 0.1919 145 1.1551 8069448
0.35 0.1985 150 1.1544 8354176
0.28 0.2051 155 1.1523 8629120
0.2608 0.2117 160 1.1533 8907728
0.3108 0.2183 165 1.1499 9184320
0.2177 0.2249 170 1.1489 9465120
0.2277 0.2316 175 1.1517 9744536
0.2032 0.2382 180 1.1464 10022880
0.293 0.2448 185 1.1465 10303904
0.3273 0.2514 190 1.1483 10584120
0.2394 0.2580 195 1.1408 10861656
0.2672 0.2646 200 1.1434 11141752
0.2725 0.2713 205 1.1453 11417264
0.298 0.2779 210 1.1364 11688920
0.2932 0.2845 215 1.1388 11971416
0.2699 0.2911 220 1.1399 12250384
0.2514 0.2977 225 1.1402 12526224
0.2254 0.3043 230 1.1347 12799952
0.2493 0.3109 235 1.1373 13081440
0.2417 0.3176 240 1.1365 13363136
0.244 0.3242 245 1.1347 13643968
0.3142 0.3308 250 1.1335 13919528
0.1658 0.3374 255 1.1356 14206008
0.202 0.3440 260 1.1331 14486864
0.2557 0.3506 265 1.1315 14761784
0.1722 0.3573 270 1.1333 15039296
0.2303 0.3639 275 1.1304 15314232
0.2371 0.3705 280 1.1301 15597320
0.1902 0.3771 285 1.1291 15872736
0.2629 0.3837 290 1.1284 16150800
0.163 0.3903 295 1.1284 16430768
0.1573 0.3970 300 1.1281 16707136
0.3249 0.4036 305 1.1248 16978832
0.2382 0.4102 310 1.1248 17252920
0.1851 0.4168 315 1.1255 17535728
0.2008 0.4234 320 1.1246 17814952
0.2358 0.4300 325 1.1232 18091904
0.2164 0.4367 330 1.1221 18373208
0.2086 0.4433 335 1.1222 18648224
0.3121 0.4499 340 1.1201 18924168
0.1846 0.4565 345 1.1206 19201296
0.2533 0.4631 350 1.1206 19477976
0.1607 0.4697 355 1.1201 19754080
0.2174 0.4763 360 1.1204 20032800
0.2254 0.4830 365 1.1214 20308040
0.1665 0.4896 370 1.1167 20592152
0.3489 0.4962 375 1.1176 20871128
0.2461 0.5028 380 1.1183 21148272
0.2611 0.5094 385 1.1144 21431392
0.254 0.5160 390 1.1161 21709824
0.2069 0.5227 395 1.1180 21988536
0.2252 0.5293 400 1.1162 22262296
0.2424 0.5359 405 1.1148 22536568
0.2396 0.5425 410 1.1134 22810944
0.2056 0.5491 415 1.1130 23092168
0.2257 0.5557 420 1.1132 23371056
0.2015 0.5624 425 1.1144 23648456
0.1694 0.5690 430 1.1130 23920280
0.2116 0.5756 435 1.1141 24200304
0.2563 0.5822 440 1.1116 24482536
0.1843 0.5888 445 1.1105 24757696
0.29 0.5954 450 1.1128 25033776
0.1833 0.6021 455 1.1120 25310944
0.2481 0.6087 460 1.1101 25589696
0.2427 0.6153 465 1.1094 25870336
0.1618 0.6219 470 1.1098 26143632
0.1532 0.6285 475 1.1103 26415136
0.2417 0.6351 480 1.1087 26696640
0.2276 0.6417 485 1.1063 26971272
0.2721 0.6484 490 1.1083 27244992
0.2445 0.6550 495 1.1086 27518832
0.2783 0.6616 500 1.1055 27796352
0.2091 0.6682 505 1.1059 28076680
0.2149 0.6748 510 1.1058 28351832
0.155 0.6814 515 1.1041 28630944
0.2609 0.6881 520 1.1062 28906120
0.1983 0.6947 525 1.1052 29184576
0.2037 0.7013 530 1.1047 29458648
0.2504 0.7079 535 1.1053 29735424
0.1773 0.7145 540 1.1033 30014992
0.2006 0.7211 545 1.1023 30297040
0.2031 0.7278 550 1.1040 30575064
0.1955 0.7344 555 1.1019 30855488
0.2115 0.7410 560 1.1009 31138112
0.1442 0.7476 565 1.1016 31413712
0.2747 0.7542 570 1.0999 31686776
0.2817 0.7608 575 1.1000 31967224
0.2672 0.7674 580 1.1007 32246536
0.2848 0.7741 585 1.0999 32526544
0.2295 0.7807 590 1.0986 32802544
0.2139 0.7873 595 1.1004 33086712
0.2875 0.7939 600 1.0997 33367720
0.1953 0.8005 605 1.0973 33641952
0.2765 0.8071 610 1.0980 33917056
0.2177 0.8138 615 1.0984 34195776
0.3152 0.8204 620 1.0993 34478976
0.238 0.8270 625 1.0974 34752864
0.2357 0.8336 630 1.0957 35030584
0.1876 0.8402 635 1.0980 35305176
0.1897 0.8468 640 1.0974 35591096
0.186 0.8535 645 1.0962 35869440
0.1446 0.8601 650 1.0973 36141888
0.2382 0.8667 655 1.0975 36418656
0.2281 0.8733 660 1.0967 36703600
0.2697 0.8799 665 1.0964 36980184
0.2257 0.8865 670 1.0952 37254704
0.1766 0.8932 675 1.0968 37534656
0.2016 0.8998 680 1.0968 37814552
0.1946 0.9064 685 1.0957 38095440
0.2247 0.9130 690 1.0964 38367536
0.2337 0.9196 695 1.0955 38641000
0.2263 0.9262 700 1.0949 38915896
0.2195 0.9328 705 1.0962 39190984
0.2419 0.9395 710 1.0969 39467824
0.2151 0.9461 715 1.0963 39748312
0.2461 0.9527 720 1.0931 40027512
0.1476 0.9593 725 1.0939 40303160
0.2374 0.9659 730 1.0956 40580344
0.193 0.9725 735 1.0953 40856568
0.2546 0.9792 740 1.0938 41131664
0.1949 0.9858 745 1.0946 41402168
0.2002 0.9924 750 1.0955 41682720
0.1669 0.9990 755 1.0955 41958000

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1