sigmoid_lr2e-05_b0.1

This model is a fine-tuned version of meta-llama/Llama-2-7b-chat-hf on an unspecified dataset. It achieves the following results on the evaluation set (how these reward and log-probability metrics are typically derived is sketched after the list):

  • Loss: 0.1587
  • Rewards/chosen: 1.0566
  • Rewards/rejected: -3.7222
  • Rewards/accuracies: 0.9348
  • Rewards/margins: 4.7788
  • Logps/rejected: -100.3634
  • Logps/chosen: -68.2028
  • Logits/rejected: -1.2118
  • Logits/chosen: -1.1884
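
The model name ("sigmoid", "b0.1") and the reward/log-probability metrics suggest DPO-style preference training with the sigmoid loss and beta = 0.1, although the card does not state this explicitly. Under that assumption, the reported rewards correspond to the implicit DPO rewards:

```latex
% Hedged sketch, assuming standard (sigmoid-loss) DPO with beta = 0.1.
% r_theta is the implicit reward of completion y for prompt x.
\[
r_\theta(x, y) \;=\; \beta \,\log\frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
\]
% Sigmoid DPO loss over preference pairs (chosen y_w, rejected y_l):
\[
\mathcal{L}_{\mathrm{DPO}}(\theta)
  \;=\; -\,\mathbb{E}_{(x,\,y_w,\,y_l)}
    \bigl[\log \sigma\bigl(r_\theta(x, y_w) - r_\theta(x, y_l)\bigr)\bigr]
\]
```

In this reading, Rewards/chosen and Rewards/rejected are the implicit reward evaluated on the chosen and rejected completions, Rewards/margins is their difference, Rewards/accuracies is the fraction of pairs where the chosen reward exceeds the rejected one, and Logps/* are the policy log-probabilities of the two completions.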

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hypothetical configuration sketch follows the list):

  • learning_rate: 2e-05
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
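
These values map naturally onto a trl DPOTrainer setup. The sketch below is a hypothetical reconstruction, not the published training script: the dataset, LoRA settings, trl version, and evaluation schedule are assumptions; only the hyperparameters listed above and the beta/loss type implied by the model name are taken from the card.

```python
# Hypothetical reconstruction of the training setup; the actual script,
# dataset, and LoRA configuration are not published in this card.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig
from trl import DPOTrainer

model_name = "meta-llama/Llama-2-7b-chat-hf"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Tiny placeholder preference dataset; the real training data is not stated in the card.
train_dataset = Dataset.from_dict({
    "prompt": ["What is the capital of France?"],
    "chosen": ["The capital of France is Paris."],
    "rejected": ["I don't know."],
})

args = TrainingArguments(
    output_dir="sigmoid_lr2e-05_b0.1",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,   # effective train batch size 16
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    remove_unused_columns=False,     # required by the DPO data collator
)

# LoRA settings are assumed; the card only confirms a PEFT adapter was trained.
peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32, lora_dropout=0.05)

trainer = DPOTrainer(
    model=model,
    ref_model=None,           # with a PEFT adapter, trl reuses the base weights as reference
    args=args,
    beta=0.1,                 # "b0.1" in the model name
    loss_type="sigmoid",      # "sigmoid" in the model name
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```

Evaluation and logging arguments (the card reports metrics every 0.1 epoch / 341 steps) are omitted from the sketch for brevity.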

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---------------|-------|------|-----------------|----------------|------------------|--------------------|-----------------|----------------|--------------|-----------------|---------------|
| 0.1582        | 0.1   | 341  | 0.3412          | 1.4613         | -0.6760          | 0.8442             | 2.1373          | -69.9019       | -64.1564     | -0.9916         | -0.9274       |
| 0.2165        | 0.2   | 682  | 0.2655          | 1.8031         | -1.3141          | 0.8714             | 3.1172          | -76.2827       | -60.7382     | -1.0115         | -0.9525       |
| 0.0864        | 0.3   | 1023 | 0.2379          | 0.6173         | -3.1475          | 0.8877             | 3.7648          | -94.6172       | -72.5967     | -1.0623         | -1.0198       |
| 0.3192        | 0.4   | 1364 | 0.2003          | 1.3681         | -2.3819          | 0.9185             | 3.7500          | -86.9604       | -65.0880     | -1.1691         | -1.1334       |
| 0.5707        | 0.5   | 1705 | 0.1831          | 1.2028         | -3.2640          | 0.9293             | 4.4667          | -95.7812       | -66.7415     | -1.2287         | -1.1992       |
| 0.0427        | 0.6   | 2046 | 0.1718          | 1.3838         | -3.1327          | 0.9312             | 4.5166          | -94.4690       | -64.9309     | -1.1900         | -1.1566       |
| 0.1956        | 0.7   | 2387 | 0.1608          | 1.0344         | -3.7242          | 0.9366             | 4.7586          | -100.3841      | -68.4254     | -1.2044         | -1.1795       |
| 0.0319        | 0.8   | 2728 | 0.1595          | 1.0398         | -3.7445          | 0.9348             | 4.7843          | -100.5868      | -68.3711     | -1.2077         | -1.1849       |
| 0.0173        | 0.9   | 3069 | 0.1587          | 1.0566         | -3.7222          | 0.9348             | 4.7788          | -100.3634      | -68.2028     | -1.2118         | -1.1884       |

Framework versions

  • PEFT 0.7.1
  • Transformers 4.36.2
  • Pytorch 2.1.2
  • Datasets 2.15.0
  • Tokenizers 0.15.0
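
Since this repository is a PEFT adapter for meta-llama/Llama-2-7b-chat-hf, it can presumably be loaded on top of the base model roughly as follows. The repository id mazzaqq/DPO_davide is taken from the model page; the sketch is untested.

```python
# Untested sketch: load the DPO adapter on top of the base chat model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

# Repository id taken from the model page.
model = PeftModel.from_pretrained(base, "mazzaqq/DPO_davide")
model.eval()

prompt = "[INST] Give me one tip for writing clear documentation. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```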