---
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.1
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: v1_1000_STEPS_1e7_rate_03_beta_DPO
  results: []
---

# v1_1000_STEPS_1e7_rate_03_beta_DPO

This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.6480
- Rewards/chosen: -0.1617
- Rewards/rejected: -0.2816
- Rewards/accuracies: 0.5912
- Rewards/margins: 0.1199
- Logps/rejected: -17.8183
- Logps/chosen: -15.7920
- Logits/rejected: -3.3428
- Logits/chosen: -3.3429

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-07
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6922        | 0.05  | 50   | 0.6921          | -0.0001        | -0.0024          | 0.5033             | 0.0023          | -16.8875       | -15.2533     | -3.3539         | -3.3540       |
| 0.6805        | 0.1   | 100  | 0.6859          | -0.0076        | -0.0233          | 0.5626             | 0.0156          | -16.9571       | -15.2784     | -3.3527         | -3.3527       |
| 0.684         | 0.15  | 150  | 0.6780          | -0.0207        | -0.0549          | 0.5758             | 0.0342          | -17.0624       | -15.3221     | -3.3514         | -3.3515       |
| 0.668         | 0.2   | 200  | 0.6712          | -0.0524        | -0.1041          | 0.5736             | 0.0517          | -17.2267       | -15.4277     | -3.3479         | -3.3480       |
| 0.6602        | 0.24  | 250  | 0.6656          | -0.0787        | -0.1439          | 0.5802             | 0.0651          | -17.3591       | -15.5155     | -3.3454         | -3.3455       |
| 0.6512        | 0.29  | 300  | 0.6625          | -0.1164        | -0.1922          | 0.5780             | 0.0758          | -17.5202       | -15.6409     | -3.3452         | -3.3453       |
| 0.6949        | 0.34  | 350  | 0.6586          | -0.1002        | -0.1858          | 0.5956             | 0.0855          | -17.4988       | -15.5872     | -3.3448         | -3.3449       |
| 0.6836        | 0.39  | 400  | 0.6558          | -0.0983        | -0.1934          | 0.5890             | 0.0952          | -17.5242       | -15.5806     | -3.3452         | -3.3453       |
| 0.5895        | 0.44  | 450  | 0.6530          | -0.1263        | -0.2307          | 0.5846             | 0.1044          | -17.6486       | -15.6741     | -3.3440         | -3.3441       |
| 0.6855        | 0.49  | 500  | 0.6504          | -0.1226        | -0.2329          | 0.5890             | 0.1103          | -17.6558       | -15.6618     | -3.3435         | -3.3436       |
| 0.5863        | 0.54  | 550  | 0.6497          | -0.1490        | -0.2631          | 0.5868             | 0.1142          | -17.7566       | -15.7496     | -3.3433         | -3.3434       |
| 0.6496        | 0.59  | 600  | 0.6496          | -0.1503        | -0.2653          | 0.5868             | 0.1150          | -17.7639       | -15.7542     | -3.3431         | -3.3432       |
| 0.6113        | 0.64  | 650  | 0.6478          | -0.1488        | -0.2683          | 0.5934             | 0.1195          | -17.7738       | -15.7490     | -3.3432         | -3.3433       |
| 0.6582        | 0.68  | 700  | 0.6482          | -0.1563        | -0.2757          | 0.5890             | 0.1194          | -17.7985       | -15.7741     | -3.3428         | -3.3429       |
| 0.6477        | 0.73  | 750  | 0.6476          | -0.1590        | -0.2798          | 0.5868             | 0.1208          | -17.8123       | -15.7831     | -3.3428         | -3.3430       |
| 0.6137        | 0.78  | 800  | 0.6477          | -0.1601        | -0.2804          | 0.5912             | 0.1203          | -17.8141       | -15.7867     | -3.3427         | -3.3429       |
| 0.6539        | 0.83  | 850  | 0.6475          | -0.1611        | -0.2818          | 0.5890             | 0.1207          | -17.8188       | -15.7899     | -3.3428         | -3.3429       |
| 0.6508        | 0.88  | 900  | 0.6477          | -0.1607        | -0.2816          | 0.5912             | 0.1209          | -17.8182       | -15.7887     | -3.3428         | -3.3430       |
| 0.6543        | 0.93  | 950  | 0.6482          | -0.1619        | -0.2813          | 0.5934             | 0.1194          | -17.8172       | -15.7927     | -3.3428         | -3.3429       |
| 0.6219        | 0.98  | 1000 | 0.6480          | -0.1617        | -0.2816          | 0.5912             | 0.1199          | -17.8183       | -15.7920     | -3.3428         | -3.3429       |

### Framework versions

- Transformers 4.39.1
- Pytorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2
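
### Training setup (sketch)

The card does not include the training code, but given the `trl` and `dpo` tags and the hyperparameters above, the run can be approximated with TRL's `DPOTrainer`. The sketch below is an assumption-laden reconstruction: the preference dataset is unknown (the `load_dataset` path is a placeholder), the DPO `beta` of 0.3 is only inferred from the `03_beta` suffix in the model name, and the TRL version is not recorded in the framework list, so the exact keyword arguments may differ between releases.

```python
# Minimal sketch of a DPO training setup matching the hyperparameters above.
# Assumptions: placeholder dataset path; beta=0.3 inferred from the model name.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "mistralai/Mistral-7B-Instruct-v0.1"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Placeholder: the actual preference dataset is not stated in the card.
train_dataset = load_dataset("your/preference-dataset", split="train")

training_args = TrainingArguments(
    output_dir="v1_1000_STEPS_1e7_rate_03_beta_DPO",
    learning_rate=1e-7,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # total train batch size 4, as listed above
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

trainer = DPOTrainer(
    model,
    ref_model=None,  # TRL builds a frozen reference copy of the policy when None
    args=training_args,
    beta=0.3,        # assumption, from "03_beta" in the model name
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```

The Adam settings listed in the hyperparameters (betas=(0.9,0.999), epsilon=1e-08) are the trainer defaults, so they need no explicit arguments here.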
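
### Inference (sketch)

For completeness, a minimal inference sketch: the fine-tune inherits the Mistral-Instruct chat template from its base model, so prompts should go through `apply_chat_template`. The model path below is a placeholder, since the card does not state where the weights are hosted.

```python
# Minimal inference sketch; model_id is a placeholder (local path or hub id).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/v1_1000_STEPS_1e7_rate_03_beta_DPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```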