mistralit2_1000_STEPS_rate_1e5_03_Beta_DPO

This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.2 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.8137
  • Rewards/chosen: -13.6944
  • Rewards/rejected: -12.2044
  • Rewards/accuracies: 0.3495
  • Rewards/margins: -1.4900
  • Logps/rejected: -69.2538
  • Logps/chosen: -69.0338
  • Logits/rejected: -5.2668
  • Logits/chosen: -5.2668
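For context, these reward values follow the standard DPO convention (a brief sketch, assuming TRL's usual definitions, which are not restated in the card): the implicit reward of a response is the DPO temperature β times its policy-to-reference log-probability ratio, and Rewards/margins is the mean difference between the chosen and rejected rewards, which reproduces the figure above. A negative margin, together with an accuracy below 0.5, means that on this evaluation set the rejected responses tend to receive a higher implicit reward than the chosen ones.

```latex
r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right)

\text{margins} = \text{rewards}_{\text{chosen}} - \text{rewards}_{\text{rejected}}
              = -13.6944 - (-12.2044) = -1.4900
```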

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reconstruction sketch using these values follows the list):

  • learning_rate: 1e-05
  • train_batch_size: 4
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
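
The training script is not included in the card, so the following is a minimal reconstruction sketch assuming a TRL `DPOTrainer` setup (0.7.x-style API). The preference dataset is a placeholder, and beta = 0.3 is inferred only from the "03_Beta" suffix in the model name; neither is documented.

```python
# Hypothetical reconstruction of the training run from the hyperparameters above.
# Assumptions: TRL DPOTrainer, a placeholder preference dataset, beta=0.3 from the model name.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "mistralai/Mistral-7B-Instruct-v0.2"
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(base)

# Placeholder: the card does not say which preference dataset was used.
dataset = load_dataset("some/preference-dataset")  # needs prompt / chosen / rejected columns

args = TrainingArguments(
    output_dir="mistralit2_1000_STEPS_rate_1e5_03_Beta_DPO",
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,   # effective train batch size of 8
    max_steps=1000,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=args,
    beta=0.3,  # assumed from the "03_Beta" suffix; not stated in the card
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
)
trainer.train()
```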

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---------------|-------|------|-----------------|----------------|------------------|--------------------|-----------------|----------------|--------------|-----------------|---------------|
| 4.2261        | 0.1   | 50   | 2.5021          | -2.9115        | -1.7672          | 0.3516             | -1.1443         | -34.4631       | -33.0910     | -3.0003         | -3.0001       |
| 4.0353        | 0.2   | 100  | 4.3015          | -16.2009       | -14.9302         | 0.3912             | -1.2707         | -78.3397       | -77.3888     | -1.4166         | -1.4166       |
| 4.1344        | 0.29  | 150  | 3.8834          | -13.4989       | -12.1622         | 0.3846             | -1.3367         | -69.1129       | -68.3820     | -3.1652         | -3.1652       |
| 6.0597        | 0.39  | 200  | 3.8687          | -13.7714       | -12.6321         | 0.3956             | -1.1392         | -70.6795       | -69.2904     | -3.3126         | -3.3126       |
| 3.4133        | 0.49  | 250  | 3.7600          | -12.9111       | -11.6593         | 0.3736             | -1.2517         | -67.4368       | -66.4227     | -3.5276         | -3.5276       |
| 3.8331        | 0.59  | 300  | 3.7138          | -12.6732       | -11.3367         | 0.3582             | -1.3365         | -66.3615       | -65.6299     | -4.3713         | -4.3713       |
| 4.4899        | 0.68  | 350  | 3.6843          | -12.5529       | -11.2259         | 0.3736             | -1.3270         | -65.9920       | -65.2288     | -4.2730         | -4.2730       |
| 3.2404        | 0.78  | 400  | 3.6913          | -12.6760       | -11.3481         | 0.3692             | -1.3279         | -66.3993       | -65.6391     | -4.4066         | -4.4066       |
| 3.4317        | 0.88  | 450  | 3.7402          | -12.8394       | -11.5008         | 0.3714             | -1.3386         | -66.9084       | -66.1840     | -4.7568         | -4.7568       |
| 5.1385        | 0.98  | 500  | 3.7270          | -12.8543       | -11.4815         | 0.3582             | -1.3728         | -66.8442       | -66.2336     | -4.8716         | -4.8716       |
| 3.1946        | 1.07  | 550  | 3.7911          | -13.4302       | -11.9891         | 0.3626             | -1.4411         | -68.5361       | -68.1532     | -5.0836         | -5.0836       |
| 3.0812        | 1.17  | 600  | 3.9012          | -14.2400       | -12.6930         | 0.3538             | -1.5470         | -70.8825       | -70.8524     | -5.8038         | -5.8038       |
| 3.1908        | 1.27  | 650  | 3.8805          | -14.0486       | -12.5350         | 0.3429             | -1.5136         | -70.3556       | -70.2144     | -5.0640         | -5.0640       |
| 3.5745        | 1.37  | 700  | 3.8088          | -13.5700       | -12.0845         | 0.3429             | -1.4855         | -68.8541       | -68.6191     | -5.0789         | -5.0789       |
| 3.3361        | 1.46  | 750  | 3.7803          | -13.3782       | -11.9205         | 0.3604             | -1.4577         | -68.3074       | -67.9799     | -5.1590         | -5.1590       |
| 3.0339        | 1.56  | 800  | 3.7887          | -13.4369       | -11.9712         | 0.3538             | -1.4657         | -68.4765       | -68.1755     | -5.1745         | -5.1745       |
| 3.5519        | 1.66  | 850  | 3.8024          | -13.5450       | -12.0641         | 0.3473             | -1.4809         | -68.7860       | -68.5357     | -5.1629         | -5.1629       |
| 3.2271        | 1.76  | 900  | 3.8138          | -13.6946       | -12.2043         | 0.3495             | -1.4903         | -69.2534       | -69.0344     | -5.2650         | -5.2650       |
| 3.2287        | 1.86  | 950  | 3.8140          | -13.6951       | -12.2047         | 0.3495             | -1.4904         | -69.2548       | -69.0363     | -5.2679         | -5.2679       |
| 4.9599        | 1.95  | 1000 | 3.8137          | -13.6944       | -12.2044         | 0.3495             | -1.4900         | -69.2538       | -69.0338     | -5.2668         | -5.2668       |

Framework versions

  • Transformers 4.38.2
  • Pytorch 2.0.0+cu117
  • Datasets 2.18.0
  • Tokenizers 0.15.2
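
A minimal loading sketch (not part of the original card): the repository stores the weights in float16, so half precision is a natural default, and prompts should go through the Mistral-Instruct chat template. The example prompt is only illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "tsavage68/mistralit2_1000_STEPS_rate_1e5_03_Beta_DPO"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.float16, device_map="auto")

# Mistral-Instruct models expect the chat template applied to the prompt.
messages = [{"role": "user", "content": "Summarize what DPO fine-tuning does."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```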