v1_1000_STEPS_1e7_rate_03_beta_DPO
This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.1 on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.6480
- Rewards/chosen: -0.1617
- Rewards/rejected: -0.2816
- Rewards/accuracies: 0.5912
- Rewards/margins: 0.1199
- Logps/rejected: -17.8183
- Logps/chosen: -15.7920
- Logits/rejected: -3.3428
- Logits/chosen: -3.3429
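For context on the metric names above: in a standard TRL DPO setup, Rewards/chosen and Rewards/rejected are the implicit DPO rewards, i.e. the β-scaled log-probability ratio between the fine-tuned policy and the frozen reference model. The card does not state β, but the "03_beta" in the model name suggests β = 0.3.

$$
r_\theta(x, y) = \beta\left(\log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x)\right),
\qquad
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\big(r_\theta(x, y_w) - r_\theta(x, y_l)\big)
$$

Rewards/margins is the mean of $r_\theta(x, y_w) - r_\theta(x, y_l)$ over the evaluation set, and Rewards/accuracies is the fraction of preference pairs for which the chosen completion receives the higher implicit reward.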
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training (a sketch of how they map onto a DPO training run follows the list):
- learning_rate: 1e-07
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
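As a rough guide, the sketch below shows how these hyperparameters could map onto a TRL `DPOTrainer` run (API as of TRL versions contemporary with Transformers 4.39). The preference dataset is a placeholder since the card does not name it, and β = 0.3 is assumed from the model name; this is an illustrative reconstruction, not the exact training script.

```python
# Illustrative sketch only: the dataset name and beta value are assumptions, not from the card.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "mistralai/Mistral-7B-Instruct-v0.1"
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)   # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token

# Placeholder: any preference dataset with "prompt", "chosen", "rejected" columns.
dataset = load_dataset("your/preference-dataset")

args = TrainingArguments(
    output_dir="v1_1000_STEPS_1e7_rate_03_beta_DPO",
    learning_rate=1e-7,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,   # effective train batch size of 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
    # Adam with betas=(0.9, 0.999) and epsilon=1e-8 is the default optimizer.
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=args,
    beta=0.3,                        # assumed from "03_beta" in the model name
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],    # assumes the dataset provides a test split
    tokenizer=tokenizer,
)
trainer.train()
```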
Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
0.6922 | 0.05 | 50 | 0.6921 | -0.0001 | -0.0024 | 0.5033 | 0.0023 | -16.8875 | -15.2533 | -3.3539 | -3.3540 |
0.6805 | 0.1 | 100 | 0.6859 | -0.0076 | -0.0233 | 0.5626 | 0.0156 | -16.9571 | -15.2784 | -3.3527 | -3.3527 |
0.684 | 0.15 | 150 | 0.6780 | -0.0207 | -0.0549 | 0.5758 | 0.0342 | -17.0624 | -15.3221 | -3.3514 | -3.3515 |
0.668 | 0.2 | 200 | 0.6712 | -0.0524 | -0.1041 | 0.5736 | 0.0517 | -17.2267 | -15.4277 | -3.3479 | -3.3480 |
0.6602 | 0.24 | 250 | 0.6656 | -0.0787 | -0.1439 | 0.5802 | 0.0651 | -17.3591 | -15.5155 | -3.3454 | -3.3455 |
0.6512 | 0.29 | 300 | 0.6625 | -0.1164 | -0.1922 | 0.5780 | 0.0758 | -17.5202 | -15.6409 | -3.3452 | -3.3453 |
0.6949 | 0.34 | 350 | 0.6586 | -0.1002 | -0.1858 | 0.5956 | 0.0855 | -17.4988 | -15.5872 | -3.3448 | -3.3449 |
0.6836 | 0.39 | 400 | 0.6558 | -0.0983 | -0.1934 | 0.5890 | 0.0952 | -17.5242 | -15.5806 | -3.3452 | -3.3453 |
0.5895 | 0.44 | 450 | 0.6530 | -0.1263 | -0.2307 | 0.5846 | 0.1044 | -17.6486 | -15.6741 | -3.3440 | -3.3441 |
0.6855 | 0.49 | 500 | 0.6504 | -0.1226 | -0.2329 | 0.5890 | 0.1103 | -17.6558 | -15.6618 | -3.3435 | -3.3436 |
0.5863 | 0.54 | 550 | 0.6497 | -0.1490 | -0.2631 | 0.5868 | 0.1142 | -17.7566 | -15.7496 | -3.3433 | -3.3434 |
0.6496 | 0.59 | 600 | 0.6496 | -0.1503 | -0.2653 | 0.5868 | 0.1150 | -17.7639 | -15.7542 | -3.3431 | -3.3432 |
0.6113 | 0.64 | 650 | 0.6478 | -0.1488 | -0.2683 | 0.5934 | 0.1195 | -17.7738 | -15.7490 | -3.3432 | -3.3433 |
0.6582 | 0.68 | 700 | 0.6482 | -0.1563 | -0.2757 | 0.5890 | 0.1194 | -17.7985 | -15.7741 | -3.3428 | -3.3429 |
0.6477 | 0.73 | 750 | 0.6476 | -0.1590 | -0.2798 | 0.5868 | 0.1208 | -17.8123 | -15.7831 | -3.3428 | -3.3430 |
0.6137 | 0.78 | 800 | 0.6477 | -0.1601 | -0.2804 | 0.5912 | 0.1203 | -17.8141 | -15.7867 | -3.3427 | -3.3429 |
0.6539 | 0.83 | 850 | 0.6475 | -0.1611 | -0.2818 | 0.5890 | 0.1207 | -17.8188 | -15.7899 | -3.3428 | -3.3429 |
0.6508 | 0.88 | 900 | 0.6477 | -0.1607 | -0.2816 | 0.5912 | 0.1209 | -17.8182 | -15.7887 | -3.3428 | -3.3430 |
0.6543 | 0.93 | 950 | 0.6482 | -0.1619 | -0.2813 | 0.5934 | 0.1194 | -17.8172 | -15.7927 | -3.3428 | -3.3429 |
0.6219 | 0.98 | 1000 | 0.6480 | -0.1617 | -0.2816 | 0.5912 | 0.1199 | -17.8183 | -15.7920 | -3.3428 | -3.3429 |
Framework versions
- Transformers 4.39.1
- Pytorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2
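The card does not include a usage snippet; the following is a minimal loading and inference sketch against the library versions listed above. The prompt and generation settings are illustrative only, not values recommended by the card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "tsavage68/v1_1000_STEPS_1e7_rate_03_beta_DPO"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, torch_dtype=torch.float16, device_map="auto"
)

# Mistral-Instruct models ship a chat template; use it to format the prompt.
messages = [{"role": "user", "content": "Explain what DPO fine-tuning changes in a model."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```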
Model tree for tsavage68/v1_1000_STEPS_1e7_rate_03_beta_DPO
- Base model: mistralai/Mistral-7B-v0.1
- Finetuned from: mistralai/Mistral-7B-Instruct-v0.1