v1_1000_STEPS_1e6_rate_01_beta_DPO
This model is a DPO fine-tuned version of mistralai/Mistral-7B-Instruct-v0.1 on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.6759
- Rewards/chosen: -1.1871
- Rewards/rejected: -1.7068
- Rewards/accuracies: 0.5934
- Rewards/margins: 0.5197
- Logps/rejected: -33.9475
- Logps/chosen: -27.1237
- Logits/rejected: -3.3244
- Logits/chosen: -3.3248
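The card does not include a usage section, so the snippet below is only a sketch of how the checkpoint could be loaded for inference, assuming it is published on the Hub as `tsavage68/v1_1000_STEPS_1e6_rate_01_beta_DPO` and follows the base model's Mistral-Instruct chat format; the generation settings are illustrative, not recommendations from the authors.

```python
# Minimal inference sketch (assumed usage; the card does not document an official example).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/v1_1000_STEPS_1e6_rate_01_beta_DPO"  # assumed Hub repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed precision; use float16/float32 as hardware allows
    device_map="auto",
)

# Mistral-Instruct checkpoints expect the chat template, so build the prompt through the tokenizer.
messages = [{"role": "user", "content": "Summarize what DPO fine-tuning changes about a model."}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```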
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
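These values map naturally onto a trl `DPOTrainer` run. The sketch below is a reconstruction under that assumption (the card does not name the training library); the preference dataset id, the sequence-length limits, and β = 0.1 (implied by "01_beta" in the model name) are placeholders rather than documented settings. The listed Adam betas and epsilon are the optimizer defaults, so they are not set explicitly.

```python
# Reconstruction sketch: how the listed hyperparameters would look in a trl DPOTrainer run.
# Assumptions: trl-style DPO training, beta=0.1 (from the model name), and a placeholder
# preference dataset with "prompt"/"chosen"/"rejected" columns.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "mistralai/Mistral-7B-Instruct-v0.1"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

dataset = load_dataset("your-org/your-preference-dataset")  # hypothetical dataset id

args = TrainingArguments(
    output_dir="v1_1000_STEPS_1e6_rate_01_beta_DPO",
    learning_rate=1e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,   # effective train batch size of 4
    max_steps=1000,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    seed=42,
    evaluation_strategy="steps",
    eval_steps=50,                   # matches the 50-step eval cadence in the results table
    logging_steps=50,
)

trainer = DPOTrainer(
    model,
    ref_model=None,                  # trl clones the model as the frozen reference when omitted
    args=args,
    beta=0.1,                        # "01_beta" in the model name
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
    max_length=1024,                 # assumed sequence limits; not stated on the card
    max_prompt_length=512,
)
trainer.train()
```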
Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6937 | 0.05 | 50 | 0.6813 | -0.0431 | -0.0702 | 0.5385 | 0.0271 | -17.5820 | -15.6845 | -3.3850 | -3.3851 |
| 0.6379 | 0.1 | 100 | 0.6783 | -0.8278 | -0.9877 | 0.5297 | 0.1600 | -26.7570 | -23.5307 | -3.2482 | -3.2484 |
| 0.7855 | 0.15 | 150 | 0.7185 | -1.4527 | -1.6155 | 0.5231 | 0.1628 | -33.0342 | -29.7800 | -3.2664 | -3.2667 |
| 0.677 | 0.2 | 200 | 0.7468 | -1.8857 | -2.1602 | 0.5429 | 0.2745 | -38.4815 | -34.1103 | -3.1875 | -3.1878 |
| 0.6922 | 0.24 | 250 | 0.6741 | -1.6341 | -1.9210 | 0.5714 | 0.2869 | -36.0892 | -31.5939 | -3.1580 | -3.1584 |
| 0.7586 | 0.29 | 300 | 0.6942 | -1.6586 | -1.9019 | 0.5495 | 0.2433 | -35.8988 | -31.8388 | -3.3099 | -3.3102 |
| 0.7781 | 0.34 | 350 | 0.6667 | -0.9291 | -1.2556 | 0.5582 | 0.3265 | -29.4353 | -24.5438 | -3.3261 | -3.3264 |
| 0.9062 | 0.39 | 400 | 0.6779 | -0.9913 | -1.3228 | 0.5516 | 0.3315 | -30.1079 | -25.1664 | -3.3846 | -3.3849 |
| 0.6895 | 0.44 | 450 | 0.7001 | -1.7388 | -2.0490 | 0.5538 | 0.3102 | -37.3697 | -32.6413 | -3.3156 | -3.3158 |
| 0.6914 | 0.49 | 500 | 0.7140 | -1.6769 | -1.9965 | 0.5604 | 0.3196 | -36.8445 | -32.0224 | -3.3308 | -3.3312 |
| 0.6684 | 0.54 | 550 | 0.6648 | -1.0055 | -1.4434 | 0.5604 | 0.4379 | -31.3135 | -25.3085 | -3.3448 | -3.3453 |
| 0.6566 | 0.59 | 600 | 0.6873 | -1.4806 | -1.9669 | 0.5912 | 0.4862 | -36.5482 | -30.0593 | -3.3429 | -3.3433 |
| 0.6652 | 0.64 | 650 | 0.6811 | -1.3638 | -1.8487 | 0.5846 | 0.4849 | -35.3664 | -28.8914 | -3.3402 | -3.3407 |
| 0.8078 | 0.68 | 700 | 0.6813 | -1.3470 | -1.8420 | 0.5890 | 0.4950 | -35.2997 | -28.7235 | -3.3181 | -3.3185 |
| 0.7023 | 0.73 | 750 | 0.6787 | -1.3433 | -1.8475 | 0.6022 | 0.5042 | -35.3545 | -28.6860 | -3.3293 | -3.3297 |
| 0.5746 | 0.78 | 800 | 0.6761 | -1.1720 | -1.6876 | 0.5956 | 0.5157 | -33.7557 | -26.9727 | -3.3266 | -3.3270 |
| 0.6828 | 0.83 | 850 | 0.6756 | -1.1797 | -1.6986 | 0.5934 | 0.5189 | -33.8653 | -27.0500 | -3.3243 | -3.3247 |
| 0.6355 | 0.88 | 900 | 0.6760 | -1.1867 | -1.7058 | 0.5934 | 0.5192 | -33.9377 | -27.1196 | -3.3243 | -3.3248 |
| 0.557 | 0.93 | 950 | 0.6760 | -1.1867 | -1.7063 | 0.5912 | 0.5196 | -33.9424 | -27.1197 | -3.3243 | -3.3248 |
| 0.5707 | 0.98 | 1000 | 0.6759 | -1.1871 | -1.7068 | 0.5934 | 0.5197 | -33.9475 | -27.1237 | -3.3244 | -3.3248 |
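For readers unfamiliar with these columns: the card does not define them, but in the usual DPO convention (as logged by trl) the reward columns are implicit rewards derived from policy and reference log-probabilities, with the margin and accuracy computed per preference pair:

```latex
% Standard DPO quantities (assumed convention; not defined on the card itself).
% r(x, y) is the implicit reward of response y for prompt x under the policy \pi_\theta
% relative to the frozen reference \pi_{\mathrm{ref}}, with \beta = 0.1 assumed here.
\[
r(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right)
\]
\[
\text{rewards/margins} = r(x, y_{\mathrm{chosen}}) - r(x, y_{\mathrm{rejected}}), \qquad
\text{rewards/accuracies} = \Pr\!\big[\, r(x, y_{\mathrm{chosen}}) > r(x, y_{\mathrm{rejected}}) \,\big]
\]
\[
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\!\left( r(x, y_{\mathrm{chosen}}) - r(x, y_{\mathrm{rejected}}) \right)
\]
```

Under this reading, the final evaluation accuracy of 0.5934 means the fine-tuned policy assigns a higher implicit reward to the chosen response than to the rejected one for roughly 59% of evaluation pairs.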
Framework versions
- Transformers 4.39.1
- Pytorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2