mistralit2_1000_STEPS_rate_1e5_03_Beta_DPO
This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.2 on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 3.8137
- Rewards/chosen: -13.6944
- Rewards/rejected: -12.2044
- Rewards/accuracies: 0.3495
- Rewards/margins: -1.4900
- Logps/rejected: -69.2538
- Logps/chosen: -69.0338
- Logits/rejected: -5.2668
- Logits/chosen: -5.2668
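For context, the Rewards/* metrics above follow the standard DPO definitions: each reward is the beta-scaled difference between the policy's and the reference model's log-probability of a completion, the margin is the chosen reward minus the rejected reward, and the accuracy is the fraction of pairs with a positive margin. A minimal sketch of how these quantities relate (beta = 0.3 is an assumption inferred from the "03_Beta" in the model name; the log-probability arguments are illustrative placeholders):

```python
# Illustrative only: how DPO-style reward metrics are derived from
# policy vs. reference log-probabilities. beta is assumed to be 0.3.
beta = 0.3

def dpo_rewards(logp_policy_chosen, logp_ref_chosen,
                logp_policy_rejected, logp_ref_rejected):
    reward_chosen = beta * (logp_policy_chosen - logp_ref_chosen)
    reward_rejected = beta * (logp_policy_rejected - logp_ref_rejected)
    margin = reward_chosen - reward_rejected
    # "Rewards/accuracies" is the fraction of pairs where margin > 0.
    return reward_chosen, reward_rejected, margin
```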
Model description
More information needed
Intended uses & limitations
More information needed
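Since the base model is an instruction-tuned chat model, the most likely use is chat-style generation. A minimal inference sketch (the repository ID, device placement, and generation settings below are assumptions, not documented settings):

```python
# Minimal inference sketch; repo ID and generation settings are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/mistralit2_1000_STEPS_rate_1e5_03_Beta_DPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Mistral-Instruct models ship a chat template with the tokenizer.
messages = [{"role": "user", "content": "Summarize what DPO fine-tuning does."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```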
Training and evaluation data
More information needed
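The training and evaluation data are not documented. For reference, DPO training consumes preference pairs; a typical record layout (illustrative only, not the actual dataset) looks like:

```python
# Illustrative DPO preference record; the actual dataset is not documented.
example = {
    "prompt": "Explain the difference between supervised fine-tuning and DPO.",
    "chosen": "A helpful, preferred completion...",
    "rejected": "A less helpful, dispreferred completion...",
}
```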
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
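These settings correspond to an effective batch size of 8 (4 per device × 2 accumulation steps) over 1000 optimizer steps. A minimal sketch of an equivalent setup with TRL's DPOTrainer, assuming a TRL version contemporary with Transformers 4.38 (where beta and tokenizer are passed to the trainer directly; newer TRL releases move these into DPOConfig). The dataset name and beta = 0.3 are assumptions, the latter inferred from the model name:

```python
# Sketch only: a DPO setup matching the hyperparameters listed above.
# beta=0.3 is inferred from the model name; the dataset is a placeholder.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_model = "mistralai/Mistral-7B-Instruct-v0.2"
model = AutoModelForCausalLM.from_pretrained(base_model)
ref_model = AutoModelForCausalLM.from_pretrained(base_model)  # frozen reference
tokenizer = AutoTokenizer.from_pretrained(base_model)

train_dataset = load_dataset("your_preference_dataset", split="train")  # placeholder

args = TrainingArguments(
    output_dir="mistralit2_dpo",
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=args,
    beta=0.3,  # assumed from "03_Beta" in the model name
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```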
Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
4.2261 | 0.1 | 50 | 2.5021 | -2.9115 | -1.7672 | 0.3516 | -1.1443 | -34.4631 | -33.0910 | -3.0003 | -3.0001 |
4.0353 | 0.2 | 100 | 4.3015 | -16.2009 | -14.9302 | 0.3912 | -1.2707 | -78.3397 | -77.3888 | -1.4166 | -1.4166 |
4.1344 | 0.29 | 150 | 3.8834 | -13.4989 | -12.1622 | 0.3846 | -1.3367 | -69.1129 | -68.3820 | -3.1652 | -3.1652 |
6.0597 | 0.39 | 200 | 3.8687 | -13.7714 | -12.6321 | 0.3956 | -1.1392 | -70.6795 | -69.2904 | -3.3126 | -3.3126 |
3.4133 | 0.49 | 250 | 3.7600 | -12.9111 | -11.6593 | 0.3736 | -1.2517 | -67.4368 | -66.4227 | -3.5276 | -3.5276 |
3.8331 | 0.59 | 300 | 3.7138 | -12.6732 | -11.3367 | 0.3582 | -1.3365 | -66.3615 | -65.6299 | -4.3713 | -4.3713 |
4.4899 | 0.68 | 350 | 3.6843 | -12.5529 | -11.2259 | 0.3736 | -1.3270 | -65.9920 | -65.2288 | -4.2730 | -4.2730 |
3.2404 | 0.78 | 400 | 3.6913 | -12.6760 | -11.3481 | 0.3692 | -1.3279 | -66.3993 | -65.6391 | -4.4066 | -4.4066 |
3.4317 | 0.88 | 450 | 3.7402 | -12.8394 | -11.5008 | 0.3714 | -1.3386 | -66.9084 | -66.1840 | -4.7568 | -4.7568 |
5.1385 | 0.98 | 500 | 3.7270 | -12.8543 | -11.4815 | 0.3582 | -1.3728 | -66.8442 | -66.2336 | -4.8716 | -4.8716 |
3.1946 | 1.07 | 550 | 3.7911 | -13.4302 | -11.9891 | 0.3626 | -1.4411 | -68.5361 | -68.1532 | -5.0836 | -5.0836 |
3.0812 | 1.17 | 600 | 3.9012 | -14.2400 | -12.6930 | 0.3538 | -1.5470 | -70.8825 | -70.8524 | -5.8038 | -5.8038 |
3.1908 | 1.27 | 650 | 3.8805 | -14.0486 | -12.5350 | 0.3429 | -1.5136 | -70.3556 | -70.2144 | -5.0640 | -5.0640 |
3.5745 | 1.37 | 700 | 3.8088 | -13.5700 | -12.0845 | 0.3429 | -1.4855 | -68.8541 | -68.6191 | -5.0789 | -5.0789 |
3.3361 | 1.46 | 750 | 3.7803 | -13.3782 | -11.9205 | 0.3604 | -1.4577 | -68.3074 | -67.9799 | -5.1590 | -5.1590 |
3.0339 | 1.56 | 800 | 3.7887 | -13.4369 | -11.9712 | 0.3538 | -1.4657 | -68.4765 | -68.1755 | -5.1745 | -5.1745 |
3.5519 | 1.66 | 850 | 3.8024 | -13.5450 | -12.0641 | 0.3473 | -1.4809 | -68.7860 | -68.5357 | -5.1629 | -5.1629 |
3.2271 | 1.76 | 900 | 3.8138 | -13.6946 | -12.2043 | 0.3495 | -1.4903 | -69.2534 | -69.0344 | -5.2650 | -5.2650 |
3.2287 | 1.86 | 950 | 3.8140 | -13.6951 | -12.2047 | 0.3495 | -1.4904 | -69.2548 | -69.0363 | -5.2679 | -5.2679 |
4.9599 | 1.95 | 1000 | 3.8137 | -13.6944 | -12.2044 | 0.3495 | -1.4900 | -69.2538 | -69.0338 | -5.2668 | -5.2668 |
Framework versions
- Transformers 4.38.2
- Pytorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2