collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.0955
  • Num Input Tokens Seen: 41958000
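
Since the card does not yet include a usage example, here is a minimal, hedged sketch of loading this checkpoint with transformers. The repository id comes from this card; the prompt, generation settings, and device handling are illustrative assumptions, not part of the original setup.

```python
# Minimal inference sketch; prompt and generation settings are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the published weights are stored in BF16
    device_map="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```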

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an illustrative configuration sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
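
For readers who want to approximate this setup with the transformers Trainer, the sketch below maps the hyperparameters onto TrainingArguments. The output_dir and bf16 flag are assumptions, and transformers' stock optimizer is AdamW rather than plain Adam, so treat this as an approximation rather than the exact training script.

```python
# A hedged sketch of an equivalent TrainingArguments configuration.
# Values mirror the list above; output_dir, bf16, and the optimizer variant
# are assumptions for illustration, not taken from the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter8_sftsd0",  # hypothetical
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,  # 8 x 16 = total train batch size of 128 on one device
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,  # assumption, consistent with the BF16 checkpoint
)
```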

Training results

Training Loss Epoch Step Validation Loss Input Tokens Seen
No log 0 0 1.3909 0
1.6668 0.0066 5 1.3870 278312
1.6119 0.0132 10 1.3527 556616
1.6009 0.0198 15 1.2834 837608
1.3857 0.0265 20 1.2338 1118760
1.3881 0.0331 25 1.1927 1398912
1.1478 0.0397 30 1.1899 1677832
1.0707 0.0463 35 1.1973 1955640
0.8858 0.0529 40 1.2362 2235312
0.7331 0.0595 45 1.2417 2518032
0.6183 0.0662 50 1.2923 2791072
0.5582 0.0728 55 1.2281 3075256
0.4271 0.0794 60 1.2524 3353088
0.4163 0.0860 65 1.2120 3634408
0.3374 0.0926 70 1.2174 3911328
0.3455 0.0992 75 1.2004 4189448
0.4376 0.1059 80 1.2002 4467968
0.3698 0.1125 85 1.1903 4749624
0.2481 0.1191 90 1.1825 5025016
0.283 0.1257 95 1.1792 5302440
0.3213 0.1323 100 1.1798 5581168
0.324 0.1389 105 1.1744 5859200
0.3241 0.1456 110 1.1715 6134296
0.2837 0.1522 115 1.1696 6409928
0.3316 0.1588 120 1.1676 6685840
0.2437 0.1654 125 1.1633 6965840
0.3139 0.1720 130 1.1606 7244344
0.3376 0.1786 135 1.1564 7515632
0.223 0.1852 140 1.1613 7789320
0.2748 0.1919 145 1.1551 8069448
0.35 0.1985 150 1.1544 8354176
0.28 0.2051 155 1.1523 8629120
0.2608 0.2117 160 1.1533 8907728
0.3108 0.2183 165 1.1499 9184320
0.2177 0.2249 170 1.1489 9465120
0.2277 0.2316 175 1.1517 9744536
0.2032 0.2382 180 1.1464 10022880
0.293 0.2448 185 1.1465 10303904
0.3273 0.2514 190 1.1483 10584120
0.2394 0.2580 195 1.1408 10861656
0.2672 0.2646 200 1.1434 11141752
0.2725 0.2713 205 1.1453 11417264
0.298 0.2779 210 1.1364 11688920
0.2932 0.2845 215 1.1388 11971416
0.2699 0.2911 220 1.1399 12250384
0.2514 0.2977 225 1.1402 12526224
0.2254 0.3043 230 1.1347 12799952
0.2493 0.3109 235 1.1373 13081440
0.2417 0.3176 240 1.1365 13363136
0.244 0.3242 245 1.1347 13643968
0.3142 0.3308 250 1.1335 13919528
0.1658 0.3374 255 1.1356 14206008
0.202 0.3440 260 1.1331 14486864
0.2557 0.3506 265 1.1315 14761784
0.1722 0.3573 270 1.1333 15039296
0.2303 0.3639 275 1.1304 15314232
0.2371 0.3705 280 1.1301 15597320
0.1902 0.3771 285 1.1291 15872736
0.2629 0.3837 290 1.1284 16150800
0.163 0.3903 295 1.1284 16430768
0.1573 0.3970 300 1.1281 16707136
0.3249 0.4036 305 1.1248 16978832
0.2382 0.4102 310 1.1248 17252920
0.1851 0.4168 315 1.1255 17535728
0.2008 0.4234 320 1.1246 17814952
0.2358 0.4300 325 1.1232 18091904
0.2164 0.4367 330 1.1221 18373208
0.2086 0.4433 335 1.1222 18648224
0.3121 0.4499 340 1.1201 18924168
0.1846 0.4565 345 1.1206 19201296
0.2533 0.4631 350 1.1206 19477976
0.1607 0.4697 355 1.1201 19754080
0.2174 0.4763 360 1.1204 20032800
0.2254 0.4830 365 1.1214 20308040
0.1665 0.4896 370 1.1167 20592152
0.3489 0.4962 375 1.1176 20871128
0.2461 0.5028 380 1.1183 21148272
0.2611 0.5094 385 1.1144 21431392
0.254 0.5160 390 1.1161 21709824
0.2069 0.5227 395 1.1180 21988536
0.2252 0.5293 400 1.1162 22262296
0.2424 0.5359 405 1.1148 22536568
0.2396 0.5425 410 1.1134 22810944
0.2056 0.5491 415 1.1130 23092168
0.2257 0.5557 420 1.1132 23371056
0.2015 0.5624 425 1.1144 23648456
0.1694 0.5690 430 1.1130 23920280
0.2116 0.5756 435 1.1141 24200304
0.2563 0.5822 440 1.1116 24482536
0.1843 0.5888 445 1.1105 24757696
0.29 0.5954 450 1.1128 25033776
0.1833 0.6021 455 1.1120 25310944
0.2481 0.6087 460 1.1101 25589696
0.2427 0.6153 465 1.1094 25870336
0.1618 0.6219 470 1.1098 26143632
0.1532 0.6285 475 1.1103 26415136
0.2417 0.6351 480 1.1087 26696640
0.2276 0.6417 485 1.1063 26971272
0.2721 0.6484 490 1.1083 27244992
0.2445 0.6550 495 1.1086 27518832
0.2783 0.6616 500 1.1055 27796352
0.2091 0.6682 505 1.1059 28076680
0.2149 0.6748 510 1.1058 28351832
0.155 0.6814 515 1.1041 28630944
0.2609 0.6881 520 1.1062 28906120
0.1983 0.6947 525 1.1052 29184576
0.2037 0.7013 530 1.1047 29458648
0.2504 0.7079 535 1.1053 29735424
0.1773 0.7145 540 1.1033 30014992
0.2006 0.7211 545 1.1023 30297040
0.2031 0.7278 550 1.1040 30575064
0.1955 0.7344 555 1.1019 30855488
0.2115 0.7410 560 1.1009 31138112
0.1442 0.7476 565 1.1016 31413712
0.2747 0.7542 570 1.0999 31686776
0.2817 0.7608 575 1.1000 31967224
0.2672 0.7674 580 1.1007 32246536
0.2848 0.7741 585 1.0999 32526544
0.2295 0.7807 590 1.0986 32802544
0.2139 0.7873 595 1.1004 33086712
0.2875 0.7939 600 1.0997 33367720
0.1953 0.8005 605 1.0973 33641952
0.2765 0.8071 610 1.0980 33917056
0.2177 0.8138 615 1.0984 34195776
0.3152 0.8204 620 1.0993 34478976
0.238 0.8270 625 1.0974 34752864
0.2357 0.8336 630 1.0957 35030584
0.1876 0.8402 635 1.0980 35305176
0.1897 0.8468 640 1.0974 35591096
0.186 0.8535 645 1.0962 35869440
0.1446 0.8601 650 1.0973 36141888
0.2382 0.8667 655 1.0975 36418656
0.2281 0.8733 660 1.0967 36703600
0.2697 0.8799 665 1.0964 36980184
0.2257 0.8865 670 1.0952 37254704
0.1766 0.8932 675 1.0968 37534656
0.2016 0.8998 680 1.0968 37814552
0.1946 0.9064 685 1.0957 38095440
0.2247 0.9130 690 1.0964 38367536
0.2337 0.9196 695 1.0955 38641000
0.2263 0.9262 700 1.0949 38915896
0.2195 0.9328 705 1.0962 39190984
0.2419 0.9395 710 1.0969 39467824
0.2151 0.9461 715 1.0963 39748312
0.2461 0.9527 720 1.0931 40027512
0.1476 0.9593 725 1.0939 40303160
0.2374 0.9659 730 1.0956 40580344
0.193 0.9725 735 1.0953 40856568
0.2546 0.9792 740 1.0938 41131664
0.1949 0.9858 745 1.0946 41402168
0.2002 0.9924 750 1.0955 41682720
0.1669 0.9990 755 1.0955 41958000

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1