# v1_2000_STEPS_5e6_rate_03_beta_DPO
This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.1 on an unknown dataset.
It achieves the following results on the evaluation set (see the note after the list for how the reward metrics are defined):
- Loss: 2.0059
- Rewards/chosen: -24.7548
- Rewards/rejected: -24.1061
- Rewards/accuracies: 0.3582
- Rewards/margins: -0.6487
- Logps/rejected: -97.2333
- Logps/chosen: -97.7691
- Logits/rejected: -5.2528
- Logits/chosen: -5.2528
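The reward columns above follow TRL's DPO logging conventions: each "reward" is the β-scaled log-probability ratio between the fine-tuned policy and the frozen reference model, and the accuracy is the fraction of evaluation pairs whose chosen reward exceeds the rejected reward. A sketch of the definitions, assuming β = 0.3 (inferred from the model name, not stated elsewhere in the card):

```latex
% DPO quantities behind the logged metrics (TRL conventions assumed; beta inferred from the model name).
r_\theta(x, y) = \beta \bigl( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \bigr)

\text{Rewards/chosen} = r_\theta(x, y_w), \qquad \text{Rewards/rejected} = r_\theta(x, y_l)

\text{Rewards/margins} = r_\theta(x, y_w) - r_\theta(x, y_l)

\mathcal{L}_{\mathrm{DPO}}(x, y_w, y_l) = -\log \sigma\bigl( r_\theta(x, y_w) - r_\theta(x, y_l) \bigr)
```

Negative margins together with accuracies below 0.5, as in the final evaluation, mean the policy assigns higher reward to the rejected responses than to the chosen ones on this evaluation set.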
## Model description
More information needed
## Intended uses & limitations
More information needed
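Pending an official usage section, here is a minimal inference sketch with Transformers. It assumes the repository id tsavage68/v1_2000_STEPS_5e6_rate_03_beta_DPO and the Mistral-Instruct chat template that ships with the tokenizer; adjust the dtype and device settings to your hardware.

```python
# Minimal inference sketch; the repo id and generation settings are illustrative, not documented in the card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/v1_2000_STEPS_5e6_rate_03_beta_DPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # use torch.float32 if running on CPU
    device_map="auto",          # requires `accelerate`; remove to load on a single device
)

# Mistral-Instruct models expect the chat template stored in the tokenizer.
messages = [{"role": "user", "content": "Summarize the idea behind DPO in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```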
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training; a reproduction sketch with TRL follows the list:
- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 2000
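The training script and dataset are not documented here, but a run with these hyperparameters could be reproduced roughly as follows with TRL's DPOTrainer (older TRL API, in which beta is passed to the trainer; newer releases move it into DPOConfig). The dataset name is a placeholder, and β = 0.3 is inferred from the model name.

```python
# Rough reproduction sketch; the dataset, TRL version, and beta value are assumptions, not documented in this card.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_id = "mistralai/Mistral-7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)
ref_model = AutoModelForCausalLM.from_pretrained(base_id)  # frozen reference policy

# Placeholder: a preference dataset with "prompt", "chosen", and "rejected" columns.
train_dataset = load_dataset("your/preference-dataset", split="train")

args = TrainingArguments(
    output_dir="v1_2000_STEPS_5e6_rate_03_beta_DPO",
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # total train batch size 4
    learning_rate=5e-6,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=2000,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=args,
    beta=0.3,                       # "03_beta" in the model name
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```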
### Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
0.9868 | 0.05 | 50 | 1.1711 | -2.9428 | -2.9268 | 0.4330 | -0.0159 | -26.6357 | -25.0622 | -3.6433 | -3.6433 |
2.3478 | 0.1 | 100 | 2.0400 | -6.8764 | -6.5428 | 0.3846 | -0.3336 | -38.6889 | -38.1744 | -3.1687 | -3.1689 |
2.611 | 0.15 | 150 | 1.9184 | -5.6378 | -5.4005 | 0.4044 | -0.2373 | -34.8813 | -34.0459 | -2.7506 | -2.7505 |
4.1618 | 0.2 | 200 | 3.2078 | -17.2610 | -17.1338 | 0.4549 | -0.1272 | -73.9923 | -72.7897 | -1.8104 | -1.8104 |
2.7627 | 0.24 | 250 | 2.9158 | -15.0561 | -14.8894 | 0.4440 | -0.1667 | -66.5108 | -65.4402 | -3.3141 | -3.3141 |
3.6661 | 0.29 | 300 | 2.9462 | -17.0688 | -16.7324 | 0.4286 | -0.3364 | -72.6541 | -72.1490 | -3.4818 | -3.4818 |
2.9918 | 0.34 | 350 | 2.8967 | -14.0241 | -13.8259 | 0.4527 | -0.1982 | -62.9660 | -62.0001 | -3.5357 | -3.5357 |
5.0079 | 0.39 | 400 | 2.6045 | -13.8849 | -13.6344 | 0.4264 | -0.2504 | -62.3277 | -61.5359 | -3.3164 | -3.3164 |
5.0356 | 0.44 | 450 | 2.8214 | -15.0823 | -14.8094 | 0.4484 | -0.2729 | -66.2441 | -65.5273 | -4.8720 | -4.8719 |
3.858 | 0.49 | 500 | 2.8497 | -14.8747 | -14.6263 | 0.4462 | -0.2484 | -65.6339 | -64.8354 | -4.3757 | -4.3757 |
4.3217 | 0.54 | 550 | 2.6753 | -14.4812 | -14.1726 | 0.4374 | -0.3085 | -64.1217 | -63.5237 | -4.6084 | -4.6084 |
2.2709 | 0.59 | 600 | 2.7610 | -17.0678 | -16.8582 | 0.4374 | -0.2096 | -73.0735 | -72.1458 | -3.4647 | -3.4646 |
4.1629 | 0.64 | 650 | 2.5745 | -15.9106 | -15.5448 | 0.4242 | -0.3658 | -68.6954 | -68.2883 | -4.6729 | -4.6729 |
3.8448 | 0.68 | 700 | 2.5174 | -15.9576 | -15.6284 | 0.4549 | -0.3292 | -68.9742 | -68.4451 | -4.4193 | -4.4193 |
2.2076 | 0.73 | 750 | 2.5577 | -15.9437 | -15.5036 | 0.4352 | -0.4401 | -68.5581 | -68.3986 | -5.2628 | -5.2628 |
1.7122 | 0.78 | 800 | 2.4622 | -16.9908 | -16.5388 | 0.4330 | -0.4520 | -72.0088 | -71.8890 | -4.6677 | -4.6677 |
4.2836 | 0.83 | 850 | 2.4392 | -21.5360 | -21.1181 | 0.4242 | -0.4179 | -87.2732 | -87.0397 | -4.6942 | -4.6942 |
2.0891 | 0.88 | 900 | 2.5920 | -22.6793 | -22.2203 | 0.4571 | -0.4590 | -90.9473 | -90.8508 | -4.8027 | -4.8027 |
3.1818 | 0.93 | 950 | 2.3526 | -23.8680 | -23.5454 | 0.4527 | -0.3226 | -95.3641 | -94.8129 | -4.7170 | -4.7169 |
2.9536 | 0.98 | 1000 | 2.3082 | -23.0470 | -22.5591 | 0.4220 | -0.4879 | -92.0765 | -92.0763 | -4.4404 | -4.4404 |
1.7844 | 1.03 | 1050 | 2.1483 | -21.5286 | -20.9353 | 0.4088 | -0.5933 | -86.6637 | -87.0149 | -4.7032 | -4.7031 |
1.7756 | 1.07 | 1100 | 2.2115 | -23.2036 | -22.5728 | 0.4000 | -0.6308 | -92.1223 | -92.5985 | -5.4507 | -5.4507 |
1.5056 | 1.12 | 1150 | 2.2646 | -19.3579 | -18.6576 | 0.3846 | -0.7004 | -79.0715 | -79.7795 | -5.2285 | -5.2285 |
1.3908 | 1.17 | 1200 | 2.2503 | -22.9644 | -22.1188 | 0.3824 | -0.8456 | -90.6089 | -91.8011 | -5.1449 | -5.1449 |
1.9094 | 1.22 | 1250 | 2.2255 | -24.9046 | -24.0560 | 0.3890 | -0.8486 | -97.0663 | -98.2684 | -5.0663 | -5.0663 |
1.6242 | 1.27 | 1300 | 2.3035 | -22.9644 | -22.2812 | 0.4022 | -0.6832 | -91.1502 | -91.8012 | -4.7409 | -4.7408 |
1.7631 | 1.32 | 1350 | 2.2782 | -24.2942 | -23.4381 | 0.3846 | -0.8560 | -95.0067 | -96.2336 | -4.8726 | -4.8725 |
1.821 | 1.37 | 1400 | 2.1303 | -23.8856 | -23.1654 | 0.3912 | -0.7202 | -94.0977 | -94.8717 | -5.1322 | -5.1321 |
1.5613 | 1.42 | 1450 | 2.1094 | -25.0650 | -24.4124 | 0.3824 | -0.6526 | -98.2543 | -98.8031 | -5.2516 | -5.2516 |
1.3106 | 1.47 | 1500 | 2.0269 | -24.0518 | -23.4855 | 0.3802 | -0.5663 | -95.1646 | -95.4258 | -5.2393 | -5.2393 |
1.1946 | 1.51 | 1550 | 2.0830 | -25.1070 | -24.4242 | 0.3560 | -0.6828 | -98.2934 | -98.9430 | -5.2559 | -5.2559 |
1.7872 | 1.56 | 1600 | 2.0496 | -24.8926 | -24.1890 | 0.3692 | -0.7035 | -97.5097 | -98.2283 | -5.2683 | -5.2683 |
1.8887 | 1.61 | 1650 | 2.0065 | -24.1169 | -23.5004 | 0.3626 | -0.6165 | -95.2141 | -95.6428 | -5.2470 | -5.2469 |
1.8434 | 1.66 | 1700 | 2.0105 | -24.5153 | -23.8551 | 0.3626 | -0.6602 | -96.3966 | -96.9706 | -5.2365 | -5.2364 |
1.3652 | 1.71 | 1750 | 2.0138 | -24.6797 | -24.0077 | 0.3648 | -0.6720 | -96.9052 | -97.5188 | -5.2445 | -5.2444 |
1.5787 | 1.76 | 1800 | 2.0064 | -24.7465 | -24.0922 | 0.3582 | -0.6543 | -97.1869 | -97.7414 | -5.2543 | -5.2543 |
1.8425 | 1.81 | 1850 | 2.0064 | -24.7549 | -24.1066 | 0.3604 | -0.6483 | -97.2348 | -97.7693 | -5.2532 | -5.2531 |
1.3414 | 1.86 | 1900 | 2.0058 | -24.7571 | -24.1089 | 0.3582 | -0.6482 | -97.2425 | -97.7766 | -5.2532 | -5.2532 |
1.7149 | 1.91 | 1950 | 2.0055 | -24.7535 | -24.1060 | 0.3582 | -0.6475 | -97.2328 | -97.7645 | -5.2528 | -5.2527 |
2.2753 | 1.95 | 2000 | 2.0059 | -24.7548 | -24.1061 | 0.3582 | -0.6487 | -97.2333 | -97.7691 | -5.2528 | -5.2528 |
### Framework versions
- Transformers 4.39.1
- Pytorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2