v1_2000_STEPS_5e6_rate_01_beta_DPO

This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.1 on an unknown dataset. It achieves the following results on the evaluation set (a sketch of how these DPO reward metrics are typically computed follows the list):

  • Loss: 1.0579
  • Rewards/chosen: -6.2250
  • Rewards/rejected: -6.0633
  • Rewards/accuracies: 0.4000
  • Rewards/margins: -0.1616
  • Logps/rejected: -77.5130
  • Logps/chosen: -77.5026
  • Logits/rejected: -4.8045
  • Logits/chosen: -4.8043
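
These are DPO's implicit rewards: each response is scored as β times the gap between the policy and reference-model log-probabilities, and the margin is the chosen reward minus the rejected reward. Below is a minimal PyTorch sketch of how such metrics are typically computed; β = 0.1 is inferred from the "01_beta" part of the model name, and the inputs are assumed to be per-sequence summed log-probabilities.

```python
import torch
import torch.nn.functional as F

def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Compute the DPO loss and reward bookkeeping from per-sequence log-probs."""
    # Implicit rewards: beta * (policy log-prob - reference log-prob).
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = chosen_rewards - rejected_rewards

    # DPO loss: negative log-sigmoid of the reward margin, averaged over pairs.
    loss = -F.logsigmoid(margins).mean()

    return {
        "loss": loss,
        "rewards/chosen": chosen_rewards.mean(),
        "rewards/rejected": rejected_rewards.mean(),
        "rewards/margins": margins.mean(),
        "rewards/accuracies": (chosen_rewards > rejected_rewards).float().mean(),
    }
```

Under this convention, the negative margin and 0.40 accuracy reported above mean that, on the held-out set, rejected responses end up with slightly higher implicit reward than chosen ones.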

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an equivalent DPO training setup is sketched after the list):

  • learning_rate: 5e-06
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 2000
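
If the run used trl's DPOTrainer (the metric names above suggest this, but the card does not say), the hyperparameters map onto a setup roughly like the following sketch. β = 0.1 is inferred from the model name, and the preference dataset is not documented in this card, so it is left as a placeholder.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer  # assumes trl's DPO implementation was used

base = "mistralai/Mistral-7B-Instruct-v0.1"
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(base)

# The preference data is not documented in this card; trl expects
# "prompt", "chosen" and "rejected" columns.
train_dataset = eval_dataset = ...  # placeholder

# Hyperparameters copied from the list above; the default AdamW optimizer
# already uses betas=(0.9, 0.999) and epsilon=1e-08.
args = TrainingArguments(
    output_dir="v1_2000_STEPS_5e6_rate_01_beta_DPO",
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,   # effective batch size 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=2000,
    seed=42,
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=args,
    beta=0.1,                        # assumed from the "01_beta" model name
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```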

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---------------|-------|------|-----------------|----------------|------------------|--------------------|-----------------|----------------|--------------|-----------------|---------------|
| 1.2927 | 0.1  | 100  | 1.2512 | -5.1921 | -5.1652 | 0.4418 | -0.0269 | -68.5312 | -67.1737 | -2.1461 | -2.1461 |
| 1.7128 | 0.2  | 200  | 1.4813 | -7.0324 | -6.9155 | 0.4308 | -0.1169 | -86.0349 | -85.5773 | -3.9413 | -3.9413 |
| 1.5461 | 0.29 | 300  | 1.2731 | -5.4304 | -5.3353 | 0.4330 | -0.0951 | -70.2329 | -69.5573 | -3.1316 | -3.1316 |
| 1.9939 | 0.39 | 400  | 1.2127 | -5.0685 | -4.9672 | 0.4505 | -0.1013 | -66.5519 | -65.9385 | -3.9101 | -3.9101 |
| 1.5849 | 0.49 | 500  | 1.2395 | -5.1346 | -5.0482 | 0.4396 | -0.0864 | -67.3612 | -66.5990 | -3.6011 | -3.6011 |
| 1.0981 | 0.59 | 600  | 1.2043 | -4.9745 | -4.8822 | 0.4440 | -0.0923 | -65.7019 | -64.9985 | -3.7103 | -3.7103 |
| 1.9697 | 0.68 | 700  | 1.2507 | -5.1232 | -5.1033 | 0.4681 | -0.0199 | -67.9127 | -66.4848 | -3.6280 | -3.6281 |
| 0.8747 | 0.78 | 800  | 1.1611 | -4.8863 | -4.7797 | 0.4505 | -0.1065 | -64.6770 | -64.1157 | -4.1453 | -4.1453 |
| 1.004  | 0.88 | 900  | 1.1843 | -5.6291 | -5.5168 | 0.4527 | -0.1123 | -72.0471 | -71.5438 | -4.7506 | -4.7510 |
| 1.1444 | 0.98 | 1000 | 1.2340 | -6.0675 | -6.1117 | 0.4571 | 0.0442  | -77.9961 | -75.9278 | -4.5307 | -4.5307 |
| 0.9495 | 1.07 | 1100 | 1.1048 | -6.4880 | -6.3179 | 0.3956 | -0.1701 | -80.0584 | -80.1334 | -4.5289 | -4.5289 |
| 0.8455 | 1.17 | 1200 | 1.2109 | -8.1849 | -7.8918 | 0.3890 | -0.2931 | -95.7973 | -97.1021 | -5.1408 | -5.1408 |
| 0.7447 | 1.27 | 1300 | 1.2187 | -7.3352 | -7.0426 | 0.3890 | -0.2926 | -87.3051 | -88.6049 | -4.4742 | -4.4743 |
| 1.1554 | 1.37 | 1400 | 1.0728 | -6.1506 | -5.9622 | 0.3956 | -0.1884 | -76.5017 | -76.7589 | -5.0027 | -5.0028 |
| 0.7376 | 1.47 | 1500 | 1.0798 | -6.2916 | -6.1208 | 0.4066 | -0.1707 | -78.0880 | -78.1689 | -4.9008 | -4.9008 |
| 0.7962 | 1.56 | 1600 | 1.0666 | -6.4071 | -6.2400 | 0.4022 | -0.1671 | -79.2797 | -79.3242 | -4.8474 | -4.8473 |
| 0.89   | 1.66 | 1700 | 1.0615 | -6.3087 | -6.1398 | 0.3912 | -0.1690 | -78.2774 | -78.3405 | -4.8210 | -4.8209 |
| 0.902  | 1.76 | 1800 | 1.0582 | -6.2322 | -6.0708 | 0.4022 | -0.1614 | -77.5875 | -77.5753 | -4.7973 | -4.7972 |
| 0.7122 | 1.86 | 1900 | 1.0578 | -6.2254 | -6.0638 | 0.4000 | -0.1616 | -77.5177 | -77.5069 | -4.8049 | -4.8048 |
| 0.9455 | 1.95 | 2000 | 1.0579 | -6.2250 | -6.0633 | 0.4000 | -0.1616 | -77.5130 | -77.5026 | -4.8045 | -4.8043 |

Framework versions

  • Transformers 4.39.1
  • Pytorch 2.0.0+cu117
  • Datasets 2.18.0
  • Tokenizers 0.15.2
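
For inference, the checkpoint loads like any other Mistral-7B-Instruct model. A minimal generation sketch using standard transformers APIs (the prompt is only an example):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "tsavage68/v1_2000_STEPS_5e6_rate_01_beta_DPO"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, torch_dtype=torch.float16, device_map="auto"
)

# Build a Mistral-style chat prompt and generate a response.
messages = [{"role": "user", "content": "Explain DPO fine-tuning in one sentence."}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```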