Llama-3.1-8B-Instruct-KTO-600

This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B-Instruct on the bct_non_cot_kto_600 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2017
  • Rewards/chosen: 0.0412
  • Logps/chosen: -18.3761
  • Logits/chosen: -2496719.4921
  • Rewards/rejected: -6.6216
  • Logps/rejected: -86.0225
  • Logits/rejected: -7772195.3684
  • Rewards/margins: 6.6628
  • Kl: 0.0
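
Since PEFT appears among the framework versions below, the released weights are presumably a lightweight adapter on top of the base model rather than full fine-tuned weights. The following is a minimal loading sketch, assuming the adapter is published at chchen/Llama-3.1-8B-Instruct-KTO-600 (the repository this card belongs to) and using standard transformers + peft APIs; it is not an official snippet from the model authors.

```python
# Minimal loading sketch, not an official snippet from the model authors.
# Assumes the adapter repo id "chchen/Llama-3.1-8B-Instruct-KTO-600" and that
# you have access to the gated base model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
adapter_id = "chchen/Llama-3.1-8B-Instruct-KTO-600"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)

messages = [{"role": "user", "content": "Give one sentence about KTO fine-tuning."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```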

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged training-config sketch follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 16
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
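
The card does not state which framework produced these numbers. As a rough illustration only, the hyperparameters above could be mapped onto TRL's KTOConfig/KTOTrainer as sketched below; the dataset path and LoRA settings are placeholders, not values taken from the card.

```python
# Hypothetical reconstruction of the training setup using TRL's KTO trainer.
# The card does not name the training framework; the dataset path and LoRA
# settings below are placeholders, only the KTOConfig values come from the card.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

base_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Placeholder path; the card only names the dataset "bct_non_cot_kto_600".
train_dataset = load_dataset("json", data_files="bct_non_cot_kto_600.json")["train"]

args = KTOConfig(
    output_dir="Llama-3.1-8B-Instruct-KTO-600",
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,  # 2 x 8 = 16 effective batch size on one device
    num_train_epochs=10.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",            # betas=(0.9, 0.999), epsilon=1e-08 are AdamW defaults
    seed=42,
)

peft_config = LoraConfig(task_type="CAUSAL_LM")  # adapter hyperparameters are not given in the card

trainer = KTOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # called `tokenizer=` in older TRL releases
    peft_config=peft_config,
)
trainer.train()
```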

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Logps/chosen | Logits/chosen | Rewards/rejected | Logps/rejected | Logits/rejected | Rewards/margins | Kl |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.4993 | 0.7407 | 50 | 0.4998 | 0.0053 | -18.7353 | -4714148.5714 | 0.0043 | -19.7637 | -7797900.9123 | 0.0010 | 2.5227 |
| 0.4763 | 1.4815 | 100 | 0.4762 | 0.1369 | -17.4186 | -4475767.3651 | -0.0490 | -20.2970 | -7752298.6667 | 0.1860 | 5.6644 |
| 0.3669 | 2.2222 | 150 | 0.3865 | 0.1420 | -17.3676 | -3437302.8571 | -0.9359 | -29.1656 | -7374456.1404 | 1.0779 | 0.0 |
| 0.2687 | 2.9630 | 200 | 0.2844 | 0.3564 | -15.2243 | -3008007.1111 | -2.3051 | -42.8578 | -7507831.0175 | 2.6615 | 0.1954 |
| 0.2398 | 3.7037 | 250 | 0.2238 | 0.4618 | -14.1696 | -2773128.1270 | -4.0572 | -60.3789 | -7716537.2632 | 4.5191 | 0.0 |
| 0.2508 | 4.4444 | 300 | 0.2089 | 0.3865 | -14.9233 | -2774151.1111 | -5.0725 | -70.5321 | -7890091.7895 | 5.4590 | 0.0 |
| 0.1947 | 5.1852 | 350 | 0.2057 | 0.2042 | -16.7464 | -2611237.0794 | -5.9252 | -79.0592 | -7821654.4561 | 6.1294 | 0.0 |
| 0.1666 | 5.9259 | 400 | 0.2027 | 0.1387 | -17.4006 | -2482929.2698 | -6.1703 | -81.5101 | -7752611.9298 | 6.3091 | 0.0 |
| 0.1956 | 6.6667 | 450 | 0.2023 | 0.1210 | -17.5785 | -2528993.0159 | -6.2460 | -82.2664 | -7765871.1579 | 6.3669 | 0.0 |
| 0.1888 | 7.4074 | 500 | 0.2026 | 0.0571 | -18.2172 | -2538207.2381 | -6.5054 | -84.8605 | -7796628.2105 | 6.5625 | 0.0 |
| 0.2411 | 8.1481 | 550 | 0.2024 | 0.0368 | -18.4202 | -2527997.9683 | -6.6091 | -85.8983 | -7806604.3509 | 6.6459 | 0.0 |
| 0.2231 | 8.8889 | 600 | 0.2018 | 0.0382 | -18.4056 | -2503114.1587 | -6.5431 | -85.2377 | -7783503.1579 | 6.5813 | 0.0 |
| 0.1966 | 9.6296 | 650 | 0.2017 | 0.0412 | -18.3761 | -2496719.4921 | -6.6216 | -86.0225 | -7772195.3684 | 6.6628 | 0.0 |
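
As is typical for KTO-style trainers, Rewards/margins appears to be Rewards/chosen minus Rewards/rejected; for the final checkpoint, 0.0412 - (-6.6216) = 6.6628, matching the evaluation summary above.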

Framework versions

  • PEFT 0.12.0
  • Transformers 4.46.1
  • Pytorch 2.5.1+cu124
  • Datasets 3.1.0
  • Tokenizers 0.20.3