Llama-3.1-8B-Instruct-KTO-600

This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B-Instruct on the bct_non_cot_kto_600 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2017
  • Rewards/chosen: 0.0412
  • Logps/chosen: -18.3761
  • Logits/chosen: -2496719.4921
  • Rewards/rejected: -6.6216
  • Logps/rejected: -86.0225
  • Logits/rejected: -7772195.3684
  • Rewards/margins: 6.6628
  • Kl: 0.0
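
Since PEFT appears among the framework versions below, the released weights are presumably a lightweight adapter on top of the base model rather than full fine-tuned weights. The following is a minimal loading sketch, assuming the adapter is published at chchen/Llama-3.1-8B-Instruct-KTO-600 (the repository this card belongs to) and using standard transformers + peft APIs; it is not an official snippet from the model authors.

```python
# Minimal loading sketch, not an official snippet from the model authors.
# Assumes the adapter repo id "chchen/Llama-3.1-8B-Instruct-KTO-600" and that
# you have access to the gated base model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
adapter_id = "chchen/Llama-3.1-8B-Instruct-KTO-600"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)

messages = [{"role": "user", "content": "Give one sentence about KTO fine-tuning."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```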

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged training-config sketch follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 16
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
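
The card does not state which framework produced these numbers. As a rough illustration only, the hyperparameters above could be mapped onto TRL's KTOConfig/KTOTrainer as sketched below; the dataset path and LoRA settings are placeholders, not values taken from the card.

```python
# Hypothetical reconstruction of the training setup using TRL's KTO trainer.
# The card does not name the training framework; the dataset path and LoRA
# settings below are placeholders, only the KTOConfig values come from the card.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

base_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Placeholder path; the card only names the dataset "bct_non_cot_kto_600".
train_dataset = load_dataset("json", data_files="bct_non_cot_kto_600.json")["train"]

args = KTOConfig(
    output_dir="Llama-3.1-8B-Instruct-KTO-600",
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,  # 2 x 8 = 16 effective batch size on one device
    num_train_epochs=10.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",            # betas=(0.9, 0.999), epsilon=1e-08 are AdamW defaults
    seed=42,
)

peft_config = LoraConfig(task_type="CAUSAL_LM")  # adapter hyperparameters are not given in the card

trainer = KTOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # called `tokenizer=` in older TRL releases
    peft_config=peft_config,
)
trainer.train()
```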

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Logps/chosen | Logits/chosen | Rewards/rejected | Logps/rejected | Logits/rejected | Rewards/margins | Kl |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.4993 | 0.7407 | 50 | 0.4998 | 0.0053 | -18.7353 | -4714148.5714 | 0.0043 | -19.7637 | -7797900.9123 | 0.0010 | 2.5227 |
| 0.4763 | 1.4815 | 100 | 0.4762 | 0.1369 | -17.4186 | -4475767.3651 | -0.0490 | -20.2970 | -7752298.6667 | 0.1860 | 5.6644 |
| 0.3669 | 2.2222 | 150 | 0.3865 | 0.1420 | -17.3676 | -3437302.8571 | -0.9359 | -29.1656 | -7374456.1404 | 1.0779 | 0.0 |
| 0.2687 | 2.9630 | 200 | 0.2844 | 0.3564 | -15.2243 | -3008007.1111 | -2.3051 | -42.8578 | -7507831.0175 | 2.6615 | 0.1954 |
| 0.2398 | 3.7037 | 250 | 0.2238 | 0.4618 | -14.1696 | -2773128.1270 | -4.0572 | -60.3789 | -7716537.2632 | 4.5191 | 0.0 |
| 0.2508 | 4.4444 | 300 | 0.2089 | 0.3865 | -14.9233 | -2774151.1111 | -5.0725 | -70.5321 | -7890091.7895 | 5.4590 | 0.0 |
| 0.1947 | 5.1852 | 350 | 0.2057 | 0.2042 | -16.7464 | -2611237.0794 | -5.9252 | -79.0592 | -7821654.4561 | 6.1294 | 0.0 |
| 0.1666 | 5.9259 | 400 | 0.2027 | 0.1387 | -17.4006 | -2482929.2698 | -6.1703 | -81.5101 | -7752611.9298 | 6.3091 | 0.0 |
| 0.1956 | 6.6667 | 450 | 0.2023 | 0.1210 | -17.5785 | -2528993.0159 | -6.2460 | -82.2664 | -7765871.1579 | 6.3669 | 0.0 |
| 0.1888 | 7.4074 | 500 | 0.2026 | 0.0571 | -18.2172 | -2538207.2381 | -6.5054 | -84.8605 | -7796628.2105 | 6.5625 | 0.0 |
| 0.2411 | 8.1481 | 550 | 0.2024 | 0.0368 | -18.4202 | -2527997.9683 | -6.6091 | -85.8983 | -7806604.3509 | 6.6459 | 0.0 |
| 0.2231 | 8.8889 | 600 | 0.2018 | 0.0382 | -18.4056 | -2503114.1587 | -6.5431 | -85.2377 | -7783503.1579 | 6.5813 | 0.0 |
| 0.1966 | 9.6296 | 650 | 0.2017 | 0.0412 | -18.3761 | -2496719.4921 | -6.6216 | -86.0225 | -7772195.3684 | 6.6628 | 0.0 |
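
As is typical for KTO-style trainers, Rewards/margins appears to be Rewards/chosen minus Rewards/rejected; for the final checkpoint, 0.0412 - (-6.6216) = 6.6628, matching the evaluation summary above.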

Framework versions

  • PEFT 0.12.0
  • Transformers 4.46.1
  • Pytorch 2.5.1+cu124
  • Datasets 3.1.0
  • Tokenizers 0.20.3