# Llama-3.1-8B-Instruct-KTO-1000
This model is a fine-tuned version of [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) on the bct_non_cot_kto_1000 dataset. It achieves the following results on the evaluation set (a loading sketch follows the metrics):
- Loss: 0.2016
- Rewards/chosen: -0.2178
- Logps/chosen: -18.1049
- Logits/chosen: -3360374.5684
- Rewards/rejected: -7.9685
- Logps/rejected: -99.2175
- Logits/rejected: -5993503.6952
- Rewards/margins: 7.7507
- Kl: 0.0
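Since the framework versions below list PEFT, this repository presumably hosts an adapter rather than full weights. The following is a minimal sketch of loading it for chat inference, assuming the adapter lives at `chchen/Llama-3.1-8B-Instruct-KTO-1000` (the repo id from this card) and that you have access to the gated base model; it is illustrative, not an official usage snippet:

```python
# Sketch: load the PEFT adapter on top of the base instruct model.
# Assumes adapter weights are hosted at chchen/Llama-3.1-8B-Instruct-KTO-1000.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
adapter_id = "chchen/Llama-3.1-8B-Instruct-KTO-1000"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the adapter

# Build a chat prompt with the model's own template and generate a reply.
messages = [{"role": "user", "content": "Summarize KTO fine-tuning in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```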
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (an equivalent configuration sketch follows the list):
- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 16
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10.0
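The card does not state which training framework produced these logs. As a hedged illustration only, the same hyperparameters expressed as a TRL `KTOConfig` would look like the sketch below (the use of TRL and the `output_dir` name are assumptions; KTO-specific settings such as `beta` are not recorded on the card):

```python
# Sketch: the card's hyperparameters as a TRL KTOConfig. Illustrative only.
from trl import KTOConfig

config = KTOConfig(
    output_dir="Llama-3.1-8B-Instruct-KTO-1000",  # assumed; matches the model name
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,   # 2 per device x 8 steps = total train batch of 16
    num_train_epochs=10.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",             # betas=(0.9, 0.999), epsilon=1e-08 are AdamW defaults
    seed=42,
)
```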
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Logps/chosen | Logits/chosen | Rewards/rejected | Logps/rejected | Logits/rejected | Rewards/margins | Kl |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:------------:|:-------------:|:----------------:|:--------------:|:---------------:|:---------------:|:------:|
| 0.4996 | 0.4444 | 50 | 0.4996 | 0.0049 | -15.8775 | -5421844.2105 | 0.0013 | -19.5191 | -7437597.2571 | 0.0036 | 5.5406 |
| 0.4926 | 0.8889 | 100 | 0.4927 | 0.0680 | -15.2462 | -5315816.4211 | 0.0091 | -19.4418 | -7413740.4952 | 0.0590 | 4.7451 |
| 0.3935 | 1.3333 | 150 | 0.3965 | 0.2993 | -12.9332 | -4366973.9789 | -0.5726 | -25.2580 | -6947738.2095 | 0.8719 | 0.6684 |
| 0.288 | 1.7778 | 200 | 0.2868 | 0.4599 | -11.3269 | -3715966.9895 | -1.8666 | -38.1983 | -6637966.6286 | 2.3265 | 0.0 |
| 0.2304 | 2.2222 | 250 | 0.2456 | 0.2811 | -13.1157 | -3821936.5053 | -4.1256 | -60.7884 | -6486972.9524 | 4.4067 | 0.0 |
| 0.2265 | 2.6667 | 300 | 0.2277 | 0.2055 | -13.8714 | -3639481.9368 | -5.1055 | -70.5871 | -6323365.7905 | 5.3110 | 0.0 |
| 0.1787 | 3.1111 | 350 | 0.2252 | 0.0093 | -15.8332 | -3060385.6842 | -6.2024 | -81.5565 | -5682171.1238 | 6.2117 | 0.0 |
| 0.1818 | 3.5556 | 400 | 0.2285 | 0.0137 | -15.7897 | -2924462.4842 | -6.4299 | -83.8315 | -5623589.1810 | 6.4436 | 0.0 |
| 0.1921 | 4.0 | 450 | 0.2127 | -0.0080 | -16.0069 | -3297428.2105 | -6.8889 | -88.4215 | -5958031.8476 | 6.8809 | 0.0 |
| 0.1945 | 4.4444 | 500 | 0.2114 | -0.0668 | -16.5945 | -3297794.6947 | -7.2699 | -92.2313 | -5972243.5048 | 7.2031 | 0.0 |
| 0.2105 | 4.8889 | 550 | 0.2067 | -0.0350 | -16.2766 | -3147055.1579 | -7.0596 | -90.1288 | -5862926.6286 | 7.0246 | 0.0 |
| 0.1921 | 5.3333 | 600 | 0.2064 | -0.0570 | -16.4969 | -3241836.4632 | -7.2054 | -91.5865 | -5997722.8190 | 7.1484 | 0.0 |
| 0.1614 | 5.7778 | 650 | 0.2070 | -0.1499 | -17.4258 | -3228708.7158 | -7.6014 | -95.5464 | -5918022.0952 | 7.4515 | 0.0 |
| 0.1896 | 6.2222 | 700 | 0.2123 | -0.2047 | -17.9736 | -3418014.3158 | -7.6625 | -96.1570 | -6086026.9714 | 7.4577 | 0.0 |
| 0.1631 | 6.6667 | 750 | 0.2076 | -0.1804 | -17.7305 | -3385464.2526 | -7.6603 | -96.1349 | -6043348.1143 | 7.4798 | 0.0 |
| 0.1704 | 7.1111 | 800 | 0.2064 | -0.1567 | -17.4936 | -3383563.1158 | -7.6349 | -95.8816 | -6061806.3238 | 7.4782 | 0.0 |
| 0.1902 | 7.5556 | 850 | 0.2029 | -0.2018 | -17.9440 | -3373625.6 | -7.8793 | -98.3253 | -6032148.7238 | 7.6775 | 0.0 |
| 0.174 | 8.0 | 900 | 0.2016 | -0.2178 | -18.1049 | -3360374.5684 | -7.9685 | -99.2175 | -5993503.6952 | 7.7507 | 0.0 |
| 0.2268 | 8.4444 | 950 | 0.2036 | -0.2365 | -18.2911 | -3331174.4 | -8.0276 | -99.8082 | -5953203.8095 | 7.7911 | 0.0 |
| 0.1646 | 8.8889 | 1000 | 0.2038 | -0.2586 | -18.5126 | -3326715.9579 | -8.0877 | -100.4094 | -5970805.6381 | 7.8291 | 0.0 |
| 0.1964 | 9.3333 | 1050 | 0.2038 | -0.2629 | -18.5557 | -3347635.5368 | -8.0931 | -100.4632 | -5967138.1333 | 7.8302 | 0.0 |
| 0.1483 | 9.7778 | 1100 | 0.2076 | -0.2689 | -18.6153 | -3328483.0316 | -8.0719 | -100.2517 | -5965142.5524 | 7.8031 | 0.0 |
### Framework versions
- PEFT 0.12.0
- Transformers 4.46.1
- Pytorch 2.5.1+cu124
- Datasets 3.1.0
- Tokenizers 0.20.3
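Before reproducing training or inference, it can help to compare your environment against the versions pinned above. A small sanity-check sketch (package names as published on PyPI):

```python
# Sketch: compare installed package versions against the versions on this card.
from importlib.metadata import version

card_versions = {
    "peft": "0.12.0",
    "transformers": "4.46.1",
    "torch": "2.5.1+cu124",
    "datasets": "3.1.0",
    "tokenizers": "0.20.3",
}
for package, pinned in card_versions.items():
    installed = version(package)
    marker = "OK" if installed == pinned else "differs"
    print(f"{package}: installed {installed}, card {pinned} ({marker})")
```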