v1_2000_STEPS_5e6_rate_03_beta_DPO

This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.1 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.0059
  • Rewards/chosen: -24.7548
  • Rewards/rejected: -24.1061
  • Rewards/accuracies: 0.3582
  • Rewards/margins: -0.6487
  • Logps/rejected: -97.2333
  • Logps/chosen: -97.7691
  • Logits/rejected: -5.2528
  • Logits/chosen: -5.2528
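For quick reference, a minimal loading and inference sketch with Transformers is shown below. It assumes the checkpoint is published on the Hub as tsavage68/v1_2000_STEPS_5e6_rate_03_beta_DPO and loads like any other Mistral-7B-Instruct-style causal LM; `device_map="auto"` additionally requires the accelerate package.

```python
# Minimal inference sketch (assumption: the checkpoint loads like a standard
# Mistral-7B-Instruct causal LM; adjust dtype/device placement to your hardware).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/v1_2000_STEPS_5e6_rate_03_beta_DPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # checkpoint weights are stored in FP16
    device_map="auto",          # requires accelerate
)

messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```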

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 2000
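A hedged reconstruction of a matching TRL setup is sketched below. The card does not state the training code, so the use of trl's DPOTrainer, the beta value of 0.3 (read off the "03_beta" in the model name), and the toy preference dataset are all assumptions; only the hyperparameters listed above come from the card.

```python
# Sketch of a DPO training setup matching the hyperparameters above.
# Assumptions: TRL's DPOTrainer API (trl ~0.7/0.8), beta=0.3 inferred from the
# model name, and a toy preference dataset standing in for the real (unknown) one.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "mistralai/Mistral-7B-Instruct-v0.1"
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(base)

# Toy stand-in for the real preference data (prompt / chosen / rejected columns).
pairs = Dataset.from_dict({
    "prompt": ["Example prompt"],
    "chosen": ["Preferred response"],
    "rejected": ["Dispreferred response"],
})

args = TrainingArguments(
    output_dir="v1_2000_STEPS_5e6_rate_03_beta_DPO",
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,   # total train batch size 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=2000,
    seed=42,
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=args,
    beta=0.3,            # assumption from the model name
    train_dataset=pairs,
    eval_dataset=pairs,
    tokenizer=tokenizer,
)
trainer.train()
```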

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.9868 | 0.05 | 50 | 1.1711 | -2.9428 | -2.9268 | 0.4330 | -0.0159 | -26.6357 | -25.0622 | -3.6433 | -3.6433 |
| 2.3478 | 0.1 | 100 | 2.0400 | -6.8764 | -6.5428 | 0.3846 | -0.3336 | -38.6889 | -38.1744 | -3.1687 | -3.1689 |
| 2.611 | 0.15 | 150 | 1.9184 | -5.6378 | -5.4005 | 0.4044 | -0.2373 | -34.8813 | -34.0459 | -2.7506 | -2.7505 |
| 4.1618 | 0.2 | 200 | 3.2078 | -17.2610 | -17.1338 | 0.4549 | -0.1272 | -73.9923 | -72.7897 | -1.8104 | -1.8104 |
| 2.7627 | 0.24 | 250 | 2.9158 | -15.0561 | -14.8894 | 0.4440 | -0.1667 | -66.5108 | -65.4402 | -3.3141 | -3.3141 |
| 3.6661 | 0.29 | 300 | 2.9462 | -17.0688 | -16.7324 | 0.4286 | -0.3364 | -72.6541 | -72.1490 | -3.4818 | -3.4818 |
| 2.9918 | 0.34 | 350 | 2.8967 | -14.0241 | -13.8259 | 0.4527 | -0.1982 | -62.9660 | -62.0001 | -3.5357 | -3.5357 |
| 5.0079 | 0.39 | 400 | 2.6045 | -13.8849 | -13.6344 | 0.4264 | -0.2504 | -62.3277 | -61.5359 | -3.3164 | -3.3164 |
| 5.0356 | 0.44 | 450 | 2.8214 | -15.0823 | -14.8094 | 0.4484 | -0.2729 | -66.2441 | -65.5273 | -4.8720 | -4.8719 |
| 3.858 | 0.49 | 500 | 2.8497 | -14.8747 | -14.6263 | 0.4462 | -0.2484 | -65.6339 | -64.8354 | -4.3757 | -4.3757 |
| 4.3217 | 0.54 | 550 | 2.6753 | -14.4812 | -14.1726 | 0.4374 | -0.3085 | -64.1217 | -63.5237 | -4.6084 | -4.6084 |
| 2.2709 | 0.59 | 600 | 2.7610 | -17.0678 | -16.8582 | 0.4374 | -0.2096 | -73.0735 | -72.1458 | -3.4647 | -3.4646 |
| 4.1629 | 0.64 | 650 | 2.5745 | -15.9106 | -15.5448 | 0.4242 | -0.3658 | -68.6954 | -68.2883 | -4.6729 | -4.6729 |
| 3.8448 | 0.68 | 700 | 2.5174 | -15.9576 | -15.6284 | 0.4549 | -0.3292 | -68.9742 | -68.4451 | -4.4193 | -4.4193 |
| 2.2076 | 0.73 | 750 | 2.5577 | -15.9437 | -15.5036 | 0.4352 | -0.4401 | -68.5581 | -68.3986 | -5.2628 | -5.2628 |
| 1.7122 | 0.78 | 800 | 2.4622 | -16.9908 | -16.5388 | 0.4330 | -0.4520 | -72.0088 | -71.8890 | -4.6677 | -4.6677 |
| 4.2836 | 0.83 | 850 | 2.4392 | -21.5360 | -21.1181 | 0.4242 | -0.4179 | -87.2732 | -87.0397 | -4.6942 | -4.6942 |
| 2.0891 | 0.88 | 900 | 2.5920 | -22.6793 | -22.2203 | 0.4571 | -0.4590 | -90.9473 | -90.8508 | -4.8027 | -4.8027 |
| 3.1818 | 0.93 | 950 | 2.3526 | -23.8680 | -23.5454 | 0.4527 | -0.3226 | -95.3641 | -94.8129 | -4.7170 | -4.7169 |
| 2.9536 | 0.98 | 1000 | 2.3082 | -23.0470 | -22.5591 | 0.4220 | -0.4879 | -92.0765 | -92.0763 | -4.4404 | -4.4404 |
| 1.7844 | 1.03 | 1050 | 2.1483 | -21.5286 | -20.9353 | 0.4088 | -0.5933 | -86.6637 | -87.0149 | -4.7032 | -4.7031 |
| 1.7756 | 1.07 | 1100 | 2.2115 | -23.2036 | -22.5728 | 0.4000 | -0.6308 | -92.1223 | -92.5985 | -5.4507 | -5.4507 |
| 1.5056 | 1.12 | 1150 | 2.2646 | -19.3579 | -18.6576 | 0.3846 | -0.7004 | -79.0715 | -79.7795 | -5.2285 | -5.2285 |
| 1.3908 | 1.17 | 1200 | 2.2503 | -22.9644 | -22.1188 | 0.3824 | -0.8456 | -90.6089 | -91.8011 | -5.1449 | -5.1449 |
| 1.9094 | 1.22 | 1250 | 2.2255 | -24.9046 | -24.0560 | 0.3890 | -0.8486 | -97.0663 | -98.2684 | -5.0663 | -5.0663 |
| 1.6242 | 1.27 | 1300 | 2.3035 | -22.9644 | -22.2812 | 0.4022 | -0.6832 | -91.1502 | -91.8012 | -4.7409 | -4.7408 |
| 1.7631 | 1.32 | 1350 | 2.2782 | -24.2942 | -23.4381 | 0.3846 | -0.8560 | -95.0067 | -96.2336 | -4.8726 | -4.8725 |
| 1.821 | 1.37 | 1400 | 2.1303 | -23.8856 | -23.1654 | 0.3912 | -0.7202 | -94.0977 | -94.8717 | -5.1322 | -5.1321 |
| 1.5613 | 1.42 | 1450 | 2.1094 | -25.0650 | -24.4124 | 0.3824 | -0.6526 | -98.2543 | -98.8031 | -5.2516 | -5.2516 |
| 1.3106 | 1.47 | 1500 | 2.0269 | -24.0518 | -23.4855 | 0.3802 | -0.5663 | -95.1646 | -95.4258 | -5.2393 | -5.2393 |
| 1.1946 | 1.51 | 1550 | 2.0830 | -25.1070 | -24.4242 | 0.3560 | -0.6828 | -98.2934 | -98.9430 | -5.2559 | -5.2559 |
| 1.7872 | 1.56 | 1600 | 2.0496 | -24.8926 | -24.1890 | 0.3692 | -0.7035 | -97.5097 | -98.2283 | -5.2683 | -5.2683 |
| 1.8887 | 1.61 | 1650 | 2.0065 | -24.1169 | -23.5004 | 0.3626 | -0.6165 | -95.2141 | -95.6428 | -5.2470 | -5.2469 |
| 1.8434 | 1.66 | 1700 | 2.0105 | -24.5153 | -23.8551 | 0.3626 | -0.6602 | -96.3966 | -96.9706 | -5.2365 | -5.2364 |
| 1.3652 | 1.71 | 1750 | 2.0138 | -24.6797 | -24.0077 | 0.3648 | -0.6720 | -96.9052 | -97.5188 | -5.2445 | -5.2444 |
| 1.5787 | 1.76 | 1800 | 2.0064 | -24.7465 | -24.0922 | 0.3582 | -0.6543 | -97.1869 | -97.7414 | -5.2543 | -5.2543 |
| 1.8425 | 1.81 | 1850 | 2.0064 | -24.7549 | -24.1066 | 0.3604 | -0.6483 | -97.2348 | -97.7693 | -5.2532 | -5.2531 |
| 1.3414 | 1.86 | 1900 | 2.0058 | -24.7571 | -24.1089 | 0.3582 | -0.6482 | -97.2425 | -97.7766 | -5.2532 | -5.2532 |
| 1.7149 | 1.91 | 1950 | 2.0055 | -24.7535 | -24.1060 | 0.3582 | -0.6475 | -97.2328 | -97.7645 | -5.2528 | -5.2527 |
| 2.2753 | 1.95 | 2000 | 2.0059 | -24.7548 | -24.1061 | 0.3582 | -0.6487 | -97.2333 | -97.7691 | -5.2528 | -5.2528 |
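In the table above, Rewards/margins is Rewards/chosen minus Rewards/rejected (e.g. -24.7548 - (-24.1061) = -0.6487 at step 2000), and in the standard DPO formulation each reward is the implicit reward beta * (policy log-prob - reference log-prob) of the chosen or rejected completion. A minimal sketch of that bookkeeping, assuming the standard DPO loss and beta = 0.3 from the model name:

```python
# Sketch of the standard DPO loss and the reward metrics reported above.
# Assumptions: beta = 0.3 (from the model name) and per-sequence summed log-probs,
# as in the usual DPO formulation; this is not the exact training code.
import torch
import torch.nn.functional as F

def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps, beta=0.3):
    # Implicit rewards: beta * (policy log-prob minus reference log-prob).
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)        # "Rewards/chosen"
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)  # "Rewards/rejected"
    margins = chosen_rewards - rejected_rewards                             # "Rewards/margins"
    accuracy = (margins > 0).float().mean()                                 # "Rewards/accuracies"
    # DPO loss: negative log-sigmoid of the reward margin.
    loss = -F.logsigmoid(margins).mean()
    return loss, chosen_rewards.mean(), rejected_rewards.mean(), margins.mean(), accuracy

# Toy log-probs just to exercise the function; real values come from the models.
loss, rc, rr, margin, acc = dpo_metrics(
    torch.tensor([-10.0]), torch.tensor([-12.0]),
    torch.tensor([-9.5]), torch.tensor([-11.0]),
)
```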

Framework versions

  • Transformers 4.39.1
  • Pytorch 2.0.0+cu117
  • Datasets 2.18.0
  • Tokenizers 0.15.2