kto_trained_1

This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct, trained with KTO (Kahneman-Tversky Optimization) on the lightblue_kto_data dataset. It achieves the following results on the evaluation set (a quick consistency check follows the list):

  • Loss: 0.3031
  • Rewards/chosen: 1.5421
  • Logps/chosen: -343.9051
  • Logits/chosen: -69679219.2
  • Rewards/rejected: -7.3046
  • Logps/rejected: -233.7684
  • Logits/rejected: -34451756.1379
  • Rewards/margins: 8.8467
  • KL: 1080.3173
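
The reported margin is simply the gap between the chosen and rejected rewards, which the numbers above bear out. A minimal check:

```python
# Consistency check on the evaluation metrics above:
# Rewards/margins = Rewards/chosen - Rewards/rejected.
rewards_chosen = 1.5421
rewards_rejected = -7.3046
print(round(rewards_chosen - rewards_rejected, 4))  # 8.8467, matching Rewards/margins
```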

Model description

More information needed

Intended uses & limitations

More information needed
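
In lieu of guidance from the authors, here is a minimal, assumed inference sketch using the standard transformers chat-template flow; the prompt and generation settings are placeholders, not recommendations from the model card.

```python
# Minimal inference sketch (assumed usage; not documented by the model authors).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lightblue/qwen2.5-7B-instruct-kto"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="bfloat16",  # the checkpoint is stored in BF16
    device_map="auto",
)

messages = [{"role": "user", "content": "Give me a one-sentence summary of KTO."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```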

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reconstruction sketch follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • total_eval_batch_size: 8
  • optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08); no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.01
  • num_epochs: 1.0
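
The card does not say which training framework produced these settings. As a point of reference, the following is a hedged sketch of how the same configuration might look with TRL's KTOTrainer; the trainer choice and the dataset id are assumptions, while the numeric values come from the list above.

```python
# Hypothetical reconstruction with TRL's KTOTrainer -- the trainer choice and
# dataset id are assumptions; the numbers mirror the hyperparameter list above.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

model_id = "Qwen/Qwen2.5-7B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

args = KTOConfig(
    output_dir="kto_trained_1",
    learning_rate=5e-6,
    per_device_train_batch_size=1,   # 1 per device x 8 GPUs x 16 accumulation steps = 128
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=16,
    num_train_epochs=1.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.01,
    seed=42,
    bf16=True,  # the released weights are BF16
)

dataset = load_dataset("lightblue/kto_data")  # hypothetical id for lightblue_kto_data
trainer = KTOTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    processing_class=tokenizer,  # `tokenizer=` in older TRL releases
)
trainer.train()
```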

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Logps/chosen | Logits/chosen | Rewards/rejected | Logps/rejected | Logits/rejected | Rewards/margins | KL |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.2623 | 0.0997 | 36 | 0.3340 | 1.3847 | -345.4796 | -55713169.0667 | -3.6384 | -197.1070 | -40055004.6897 | 5.0231 | 890.2159 |
| 0.3222 | 0.1995 | 72 | 0.3273 | 1.5219 | -344.1068 | -61469499.7333 | -4.9277 | -209.9999 | -32503238.6207 | 6.4496 | 1189.5447 |
| 0.3798 | 0.2992 | 108 | 0.3185 | 1.5573 | -343.7531 | -63003302.4 | -5.7081 | -217.8038 | -31597484.1379 | 7.2654 | 955.4995 |
| 0.3755 | 0.3990 | 144 | 0.3016 | 0.8908 | -350.4181 | -63924428.8 | -6.8986 | -229.7092 | -27711788.1379 | 7.7895 | 705.8951 |
| 0.3454 | 0.4987 | 180 | 0.3053 | 1.4481 | -344.8449 | -67193476.2667 | -6.5311 | -226.0336 | -37107747.3103 | 7.9792 | 836.6326 |
| 0.2633 | 0.5984 | 216 | 0.3085 | 1.5864 | -343.4627 | -68801646.9333 | -6.4654 | -225.3766 | -37986458.4828 | 8.0517 | 974.3778 |
| 0.2519 | 0.6982 | 252 | 0.3109 | 1.5635 | -343.6908 | -69407142.4 | -6.4303 | -225.0262 | -34758311.7241 | 7.9939 | 1106.7635 |
| 0.2959 | 0.7979 | 288 | 0.3033 | 1.6631 | -342.6956 | -69444923.7333 | -7.0061 | -230.7837 | -36029797.5172 | 8.6691 | 1082.5067 |
| 0.2921 | 0.8977 | 324 | 0.3022 | 1.4322 | -345.0042 | -69711099.7333 | -7.5841 | -236.5635 | -35742644.9655 | 9.0163 | 1047.6223 |
| 0.3122 | 0.9974 | 360 | 0.3031 | 1.5421 | -343.9051 | -69679219.2 | -7.3046 | -233.7684 | -34451756.1379 | 8.8467 | 1080.3173 |
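
As a rough back-of-the-envelope check on the table above, the run ends at step 360 and epoch ~0.9974 with an effective batch size of 128, which puts the training set at roughly 46k examples. This is an inference from the logged values, not a figure documented by the authors:

```python
# Dataset-size estimate derived from the training log above (approximate).
steps, epoch, effective_batch_size = 360, 0.9974, 128
examples_seen = steps * effective_batch_size        # 46,080
print(examples_seen, round(examples_seen / epoch))  # ~46,200 examples per epoch
```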

Framework versions

  • Transformers 4.46.1
  • Pytorch 2.4.0+cu121
  • Datasets 3.1.0
  • Tokenizers 0.20.3