v1_1000_STEPS_1e7_rate_03_beta_DPO

This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.1 on an unknown dataset. It achieves the following results on the evaluation set (the metrics are defined in the note after the list):

  • Loss: 0.6480
  • Rewards/chosen: -0.1617
  • Rewards/rejected: -0.2816
  • Rewards/accuracies: 0.5912
  • Rewards/margins: 0.1199
  • Logps/rejected: -17.8183
  • Logps/chosen: -15.7920
  • Logits/rejected: -3.3428
  • Logits/chosen: -3.3429
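
For context, these metric names follow the convention used by TRL's DPOTrainer: the "rewards" are implicit DPO rewards computed from the log-probability ratio between the fine-tuned policy and the frozen reference model (the base Instruct model), scaled by the DPO temperature β. The sketch below uses the standard DPO definitions; the β value is not documented on this card, though the "03_beta" in the model name suggests β = 0.3.

```latex
% Standard DPO definitions for a prompt x with chosen response y_w and rejected response y_l.
% The beta value is an assumption inferred from the model name, not stated on the card.
r_{\text{chosen}}   = \beta \left[ \log \pi_\theta(y_w \mid x) - \log \pi_{\text{ref}}(y_w \mid x) \right]
r_{\text{rejected}} = \beta \left[ \log \pi_\theta(y_l \mid x) - \log \pi_{\text{ref}}(y_l \mid x) \right]
\text{margin} = r_{\text{chosen}} - r_{\text{rejected}}
\mathcal{L}_{\text{DPO}} = -\log \sigma\!\left( r_{\text{chosen}} - r_{\text{rejected}} \right)
```

Rewards/accuracies is the fraction of evaluation pairs for which the chosen reward exceeds the rejected reward; the Logps/* and Logits/* columns report the policy's log-probabilities and mean logits for the chosen and rejected completions.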

Model description

More information needed

Intended uses & limitations

More information needed
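
Pending more detail from the author, the checkpoint loads like any other Transformers causal LM. The snippet below is a minimal, illustrative sketch; the prompt and generation settings are placeholders, not recommendations from the card.

```python
# Minimal loading/generation sketch (standard Transformers usage).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/v1_1000_STEPS_1e7_rate_03_beta_DPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Mistral-Instruct-style prompting via the tokenizer's chat template.
messages = [{"role": "user", "content": "Summarize direct preference optimization in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```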

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 1e-07
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
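
For reference, here is a minimal sketch of how these hyperparameters could be wired into a DPO run with TRL's DPOTrainer. The β value of 0.3 (inferred from the model name), the placeholder preference dataset, and the exact trl argument names are assumptions; the trl API varies across versions.

```python
# Sketch only: maps the hyperparameters listed above onto trl's DPOTrainer.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "mistralai/Mistral-7B-Instruct-v0.1"
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)   # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(base)

# Hyperparameters listed above; Adam betas/epsilon are the Transformers defaults.
args = TrainingArguments(
    output_dir="v1_1000_STEPS_1e7_rate_03_beta_DPO",
    learning_rate=1e-7,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,     # effective train batch size of 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
    evaluation_strategy="steps",
    eval_steps=50,                     # matches the 50-step cadence of the results table
    logging_steps=50,
)

# Placeholder: the actual preference dataset is not documented on this card.
dataset = load_dataset("path/to/preference-dataset")

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=args,
    beta=0.3,                          # assumption inferred from "03_beta" in the model name
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
)
trainer.train()
```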

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6922 | 0.05 | 50   | 0.6921 | -0.0001 | -0.0024 | 0.5033 | 0.0023 | -16.8875 | -15.2533 | -3.3539 | -3.3540 |
| 0.6805 | 0.1  | 100  | 0.6859 | -0.0076 | -0.0233 | 0.5626 | 0.0156 | -16.9571 | -15.2784 | -3.3527 | -3.3527 |
| 0.684  | 0.15 | 150  | 0.6780 | -0.0207 | -0.0549 | 0.5758 | 0.0342 | -17.0624 | -15.3221 | -3.3514 | -3.3515 |
| 0.668  | 0.2  | 200  | 0.6712 | -0.0524 | -0.1041 | 0.5736 | 0.0517 | -17.2267 | -15.4277 | -3.3479 | -3.3480 |
| 0.6602 | 0.24 | 250  | 0.6656 | -0.0787 | -0.1439 | 0.5802 | 0.0651 | -17.3591 | -15.5155 | -3.3454 | -3.3455 |
| 0.6512 | 0.29 | 300  | 0.6625 | -0.1164 | -0.1922 | 0.5780 | 0.0758 | -17.5202 | -15.6409 | -3.3452 | -3.3453 |
| 0.6949 | 0.34 | 350  | 0.6586 | -0.1002 | -0.1858 | 0.5956 | 0.0855 | -17.4988 | -15.5872 | -3.3448 | -3.3449 |
| 0.6836 | 0.39 | 400  | 0.6558 | -0.0983 | -0.1934 | 0.5890 | 0.0952 | -17.5242 | -15.5806 | -3.3452 | -3.3453 |
| 0.5895 | 0.44 | 450  | 0.6530 | -0.1263 | -0.2307 | 0.5846 | 0.1044 | -17.6486 | -15.6741 | -3.3440 | -3.3441 |
| 0.6855 | 0.49 | 500  | 0.6504 | -0.1226 | -0.2329 | 0.5890 | 0.1103 | -17.6558 | -15.6618 | -3.3435 | -3.3436 |
| 0.5863 | 0.54 | 550  | 0.6497 | -0.1490 | -0.2631 | 0.5868 | 0.1142 | -17.7566 | -15.7496 | -3.3433 | -3.3434 |
| 0.6496 | 0.59 | 600  | 0.6496 | -0.1503 | -0.2653 | 0.5868 | 0.1150 | -17.7639 | -15.7542 | -3.3431 | -3.3432 |
| 0.6113 | 0.64 | 650  | 0.6478 | -0.1488 | -0.2683 | 0.5934 | 0.1195 | -17.7738 | -15.7490 | -3.3432 | -3.3433 |
| 0.6582 | 0.68 | 700  | 0.6482 | -0.1563 | -0.2757 | 0.5890 | 0.1194 | -17.7985 | -15.7741 | -3.3428 | -3.3429 |
| 0.6477 | 0.73 | 750  | 0.6476 | -0.1590 | -0.2798 | 0.5868 | 0.1208 | -17.8123 | -15.7831 | -3.3428 | -3.3430 |
| 0.6137 | 0.78 | 800  | 0.6477 | -0.1601 | -0.2804 | 0.5912 | 0.1203 | -17.8141 | -15.7867 | -3.3427 | -3.3429 |
| 0.6539 | 0.83 | 850  | 0.6475 | -0.1611 | -0.2818 | 0.5890 | 0.1207 | -17.8188 | -15.7899 | -3.3428 | -3.3429 |
| 0.6508 | 0.88 | 900  | 0.6477 | -0.1607 | -0.2816 | 0.5912 | 0.1209 | -17.8182 | -15.7887 | -3.3428 | -3.3430 |
| 0.6543 | 0.93 | 950  | 0.6482 | -0.1619 | -0.2813 | 0.5934 | 0.1194 | -17.8172 | -15.7927 | -3.3428 | -3.3429 |
| 0.6219 | 0.98 | 1000 | 0.6480 | -0.1617 | -0.2816 | 0.5912 | 0.1199 | -17.8183 | -15.7920 | -3.3428 | -3.3429 |

Framework versions

  • Transformers 4.39.1
  • Pytorch 2.0.0+cu117
  • Datasets 2.18.0
  • Tokenizers 0.15.2