# Llama-3.1-8B-Instruct-KTO-1000
This model is a fine-tuned version of [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) on the bct_non_cot_kto_1000 dataset. It achieves the following results on the evaluation set (a loading sketch follows the metrics):
- Loss: 0.2016
- Rewards/chosen: -0.2178
- Logps/chosen: -18.1049
- Logits/chosen: -3360374.5684
- Rewards/rejected: -7.9685
- Logps/rejected: -99.2175
- Logits/rejected: -5993503.6952
- Rewards/margins: 7.7507
- Kl: 0.0
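Since the framework versions below list PEFT, this repository presumably hosts an adapter rather than full weights. The following is a minimal sketch of loading it for chat inference, assuming the adapter lives at `chchen/Llama-3.1-8B-Instruct-KTO-1000` (the repo id from this card) and that you have access to the gated base model; it is illustrative, not an official usage snippet:

```python
# Sketch: load the PEFT adapter on top of the base instruct model.
# Assumes adapter weights are hosted at chchen/Llama-3.1-8B-Instruct-KTO-1000.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
adapter_id = "chchen/Llama-3.1-8B-Instruct-KTO-1000"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the adapter

# Build a chat prompt with the model's own template and generate a reply.
messages = [{"role": "user", "content": "Summarize KTO fine-tuning in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```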
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (an equivalent configuration sketch follows the list):
- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 16
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10.0
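The card does not state which training framework produced these logs. As a hedged illustration only, the same hyperparameters expressed as a TRL `KTOConfig` would look like the sketch below (the use of TRL and the `output_dir` name are assumptions; KTO-specific settings such as `beta` are not recorded on the card):

```python
# Sketch: the card's hyperparameters as a TRL KTOConfig. Illustrative only.
from trl import KTOConfig

config = KTOConfig(
    output_dir="Llama-3.1-8B-Instruct-KTO-1000",  # assumed; matches the model name
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,   # 2 per device x 8 steps = total train batch of 16
    num_train_epochs=10.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",             # betas=(0.9, 0.999), epsilon=1e-08 are AdamW defaults
    seed=42,
)
```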
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Logps/chosen | Logits/chosen | Rewards/rejected | Logps/rejected | Logits/rejected | Rewards/margins | Kl |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:------------:|:-------------:|:----------------:|:--------------:|:---------------:|:---------------:|:------:|
| 0.4996 | 0.4444 | 50 | 0.4996 | 0.0049 | -15.8775 | -5421844.2105 | 0.0013 | -19.5191 | -7437597.2571 | 0.0036 | 5.5406 |
| 0.4926 | 0.8889 | 100 | 0.4927 | 0.0680 | -15.2462 | -5315816.4211 | 0.0091 | -19.4418 | -7413740.4952 | 0.0590 | 4.7451 |
| 0.3935 | 1.3333 | 150 | 0.3965 | 0.2993 | -12.9332 | -4366973.9789 | -0.5726 | -25.2580 | -6947738.2095 | 0.8719 | 0.6684 |
| 0.288 | 1.7778 | 200 | 0.2868 | 0.4599 | -11.3269 | -3715966.9895 | -1.8666 | -38.1983 | -6637966.6286 | 2.3265 | 0.0 |
| 0.2304 | 2.2222 | 250 | 0.2456 | 0.2811 | -13.1157 | -3821936.5053 | -4.1256 | -60.7884 | -6486972.9524 | 4.4067 | 0.0 |
| 0.2265 | 2.6667 | 300 | 0.2277 | 0.2055 | -13.8714 | -3639481.9368 | -5.1055 | -70.5871 | -6323365.7905 | 5.3110 | 0.0 |
| 0.1787 | 3.1111 | 350 | 0.2252 | 0.0093 | -15.8332 | -3060385.6842 | -6.2024 | -81.5565 | -5682171.1238 | 6.2117 | 0.0 |
| 0.1818 | 3.5556 | 400 | 0.2285 | 0.0137 | -15.7897 | -2924462.4842 | -6.4299 | -83.8315 | -5623589.1810 | 6.4436 | 0.0 |
| 0.1921 | 4.0 | 450 | 0.2127 | -0.0080 | -16.0069 | -3297428.2105 | -6.8889 | -88.4215 | -5958031.8476 | 6.8809 | 0.0 |
| 0.1945 | 4.4444 | 500 | 0.2114 | -0.0668 | -16.5945 | -3297794.6947 | -7.2699 | -92.2313 | -5972243.5048 | 7.2031 | 0.0 |
| 0.2105 | 4.8889 | 550 | 0.2067 | -0.0350 | -16.2766 | -3147055.1579 | -7.0596 | -90.1288 | -5862926.6286 | 7.0246 | 0.0 |
| 0.1921 | 5.3333 | 600 | 0.2064 | -0.0570 | -16.4969 | -3241836.4632 | -7.2054 | -91.5865 | -5997722.8190 | 7.1484 | 0.0 |
| 0.1614 | 5.7778 | 650 | 0.2070 | -0.1499 | -17.4258 | -3228708.7158 | -7.6014 | -95.5464 | -5918022.0952 | 7.4515 | 0.0 |
| 0.1896 | 6.2222 | 700 | 0.2123 | -0.2047 | -17.9736 | -3418014.3158 | -7.6625 | -96.1570 | -6086026.9714 | 7.4577 | 0.0 |
| 0.1631 | 6.6667 | 750 | 0.2076 | -0.1804 | -17.7305 | -3385464.2526 | -7.6603 | -96.1349 | -6043348.1143 | 7.4798 | 0.0 |
| 0.1704 | 7.1111 | 800 | 0.2064 | -0.1567 | -17.4936 | -3383563.1158 | -7.6349 | -95.8816 | -6061806.3238 | 7.4782 | 0.0 |
| 0.1902 | 7.5556 | 850 | 0.2029 | -0.2018 | -17.9440 | -3373625.6 | -7.8793 | -98.3253 | -6032148.7238 | 7.6775 | 0.0 |
| 0.174 | 8.0 | 900 | 0.2016 | -0.2178 | -18.1049 | -3360374.5684 | -7.9685 | -99.2175 | -5993503.6952 | 7.7507 | 0.0 |
| 0.2268 | 8.4444 | 950 | 0.2036 | -0.2365 | -18.2911 | -3331174.4 | -8.0276 | -99.8082 | -5953203.8095 | 7.7911 | 0.0 |
| 0.1646 | 8.8889 | 1000 | 0.2038 | -0.2586 | -18.5126 | -3326715.9579 | -8.0877 | -100.4094 | -5970805.6381 | 7.8291 | 0.0 |
| 0.1964 | 9.3333 | 1050 | 0.2038 | -0.2629 | -18.5557 | -3347635.5368 | -8.0931 | -100.4632 | -5967138.1333 | 7.8302 | 0.0 |
| 0.1483 | 9.7778 | 1100 | 0.2076 | -0.2689 | -18.6153 | -3328483.0316 | -8.0719 | -100.2517 | -5965142.5524 | 7.8031 | 0.0 |
### Framework versions
- PEFT 0.12.0
- Transformers 4.46.1
- Pytorch 2.5.1+cu124
- Datasets 3.1.0
- Tokenizers 0.20.3
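Before reproducing training or inference, it can help to compare your environment against the versions pinned above. A small sanity-check sketch (package names as published on PyPI):

```python
# Sketch: compare installed package versions against the versions on this card.
from importlib.metadata import version

card_versions = {
    "peft": "0.12.0",
    "transformers": "4.46.1",
    "torch": "2.5.1+cu124",
    "datasets": "3.1.0",
    "tokenizers": "0.20.3",
}
for package, pinned in card_versions.items():
    installed = version(package)
    marker = "OK" if installed == pinned else "differs"
    print(f"{package}: installed {installed}, card {pinned} ({marker})")
```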