Llama-3.1-8B-Instruct-KTO-600
This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B-Instruct on the bct_non_cot_kto_600 dataset. It achieves the following results on the evaluation set:
- Loss: 0.2017
- Rewards/chosen: 0.0412
- Logps/chosen: -18.3761
- Logits/chosen: -2496719.4921
- Rewards/rejected: -6.6216
- Logps/rejected: -86.0225
- Logits/rejected: -7772195.3684
- Rewards/margins: 6.6628
- Kl: 0.0
Model description
More information needed
Intended uses & limitations
More information needed
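Purely as an illustration (usage is not documented in this card): the checkpoint ships as a PEFT adapter (see the framework versions below), so it would typically be loaded on top of the base instruct model. The repo id, dtype, and generation settings in the sketch below are assumptions inferred from this card, not a confirmed recipe.

```python
# Minimal loading sketch (assumptions: adapter published at chchen/Llama-3.1-8B-Instruct-KTO-600,
# bfloat16 precision, chat-template inference).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
adapter_id = "chchen/Llama-3.1-8B-Instruct-KTO-600"  # assumed Hub id, taken from the card title

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)  # attach the KTO-tuned LoRA adapter

messages = [{"role": "user", "content": "Summarize the key idea of KTO in one sentence."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```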
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training (a hedged trainer-config sketch follows the list):
- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 16
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10.0
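The card does not state which training framework produced these numbers. As one possible mapping only, the listed values correspond to TRL's KTOConfig roughly as sketched below; output_dir and bf16 are assumptions, not taken from the card.

```python
# Sketch only: how the hyperparameters above would map onto TRL's KTOConfig.
from trl import KTOConfig

config = KTOConfig(
    output_dir="llama-3.1-8b-instruct-kto-600",  # placeholder path, not from the card
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,   # 2 * 8 = 16 total train batch size (single device)
    num_train_epochs=10.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    bf16=True,                       # assumption; precision is not stated in the card
)
```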
Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Logps/chosen | Logits/chosen | Rewards/rejected | Logps/rejected | Logits/rejected | Rewards/margins | Kl |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.4993 | 0.7407 | 50 | 0.4998 | 0.0053 | -18.7353 | -4714148.5714 | 0.0043 | -19.7637 | -7797900.9123 | 0.0010 | 2.5227 |
| 0.4763 | 1.4815 | 100 | 0.4762 | 0.1369 | -17.4186 | -4475767.3651 | -0.0490 | -20.2970 | -7752298.6667 | 0.1860 | 5.6644 |
| 0.3669 | 2.2222 | 150 | 0.3865 | 0.1420 | -17.3676 | -3437302.8571 | -0.9359 | -29.1656 | -7374456.1404 | 1.0779 | 0.0 |
| 0.2687 | 2.9630 | 200 | 0.2844 | 0.3564 | -15.2243 | -3008007.1111 | -2.3051 | -42.8578 | -7507831.0175 | 2.6615 | 0.1954 |
| 0.2398 | 3.7037 | 250 | 0.2238 | 0.4618 | -14.1696 | -2773128.1270 | -4.0572 | -60.3789 | -7716537.2632 | 4.5191 | 0.0 |
| 0.2508 | 4.4444 | 300 | 0.2089 | 0.3865 | -14.9233 | -2774151.1111 | -5.0725 | -70.5321 | -7890091.7895 | 5.4590 | 0.0 |
| 0.1947 | 5.1852 | 350 | 0.2057 | 0.2042 | -16.7464 | -2611237.0794 | -5.9252 | -79.0592 | -7821654.4561 | 6.1294 | 0.0 |
| 0.1666 | 5.9259 | 400 | 0.2027 | 0.1387 | -17.4006 | -2482929.2698 | -6.1703 | -81.5101 | -7752611.9298 | 6.3091 | 0.0 |
| 0.1956 | 6.6667 | 450 | 0.2023 | 0.1210 | -17.5785 | -2528993.0159 | -6.2460 | -82.2664 | -7765871.1579 | 6.3669 | 0.0 |
| 0.1888 | 7.4074 | 500 | 0.2026 | 0.0571 | -18.2172 | -2538207.2381 | -6.5054 | -84.8605 | -7796628.2105 | 6.5625 | 0.0 |
| 0.2411 | 8.1481 | 550 | 0.2024 | 0.0368 | -18.4202 | -2527997.9683 | -6.6091 | -85.8983 | -7806604.3509 | 6.6459 | 0.0 |
| 0.2231 | 8.8889 | 600 | 0.2018 | 0.0382 | -18.4056 | -2503114.1587 | -6.5431 | -85.2377 | -7783503.1579 | 6.5813 | 0.0 |
| 0.1966 | 9.6296 | 650 | 0.2017 | 0.0412 | -18.3761 | -2496719.4921 | -6.6216 | -86.0225 | -7772195.3684 | 6.6628 | 0.0 |
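As a reading aid (not stated in the card), the Rewards/margins column is consistent with the difference between the chosen and rejected rewards at each evaluation step, e.g. for the final row:

```python
# Sanity check on the final evaluation row: margin = chosen reward - rejected reward
rewards_chosen = 0.0412
rewards_rejected = -6.6216
print(rewards_chosen - rewards_rejected)  # 6.6628, matching the Rewards/margins column
```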
Framework versions
- PEFT 0.12.0
- Transformers 4.46.1
- Pytorch 2.5.1+cu124
- Datasets 3.1.0
- Tokenizers 0.20.3