mistralit2_1000_STEPS_1e8_rate_03_beta_DPO

This model is a DPO fine-tuned version of mistralai/Mistral-7B-Instruct-v0.2 on an unknown preference dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6902
  • Rewards/chosen: -0.0213
  • Rewards/rejected: -0.0282
  • Rewards/accuracies: 0.4945
  • Rewards/margins: 0.0070
  • Logps/rejected: -28.6665
  • Logps/chosen: -23.4567
  • Logits/rejected: -2.8649
  • Logits/chosen: -2.8651
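
For context on the reward columns above: these are the standard DPO statistics. Assuming the usual trl DPOTrainer conventions (not stated in this card), the implicit reward of a completion is the β-scaled log-probability ratio between the trained policy and the frozen reference model, and the loss is the negative log-sigmoid of the chosen-minus-rejected margin, with β = 0.3 suggested by the model name:

```latex
% Implicit rewards (trl convention, assumed):
%   r_chosen   = beta * ( log pi_theta(y_w | x) - log pi_ref(y_w | x) )
%   r_rejected = beta * ( log pi_theta(y_l | x) - log pi_ref(y_l | x) )
\mathcal{L}_{\mathrm{DPO}}(\theta) =
  -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[
    \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      \;-\;
      \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right)
  \right]
```

Under this reading, Rewards/margins is Rewards/chosen minus Rewards/rejected (here −0.0213 − (−0.0282) ≈ 0.0070) and Rewards/accuracies is the fraction of evaluation pairs whose chosen reward exceeds the rejected reward. A loss close to ln 2 ≈ 0.693 together with accuracy near 0.5 indicates the policy has barely moved from the reference, consistent with the very small 1e-08 learning rate.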

Model description

More information needed

Intended uses & limitations

More information needed
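
No usage guidance is provided by the author. As a hedged starting point, the checkpoint should load like any other Mistral-7B-Instruct-v0.2 fine-tune via transformers; the repository id below is taken from the Hub listing for this card, and the chat-template call reflects the base model's instruct format rather than instructions from the author.

```python
# Minimal inference sketch (assumed usage; not from the model author).
# Requires transformers >= 4.38 and enough GPU memory for a 7B model in fp16.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/mistralit2_1000_STEPS_1e8_rate_03_beta_DPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # checkpoint is stored in fp16
    device_map="auto",
)

# Mistral-Instruct models expect the [INST] chat format; apply_chat_template handles it.
messages = [{"role": "user", "content": "Summarize what DPO fine-tuning does in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```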

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-08
  • train_batch_size: 4
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
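
The card does not include the training script. The sketch below shows one plausible way these hyperparameters could map onto trl's DPOTrainer; the preference dataset, the trl version, β = 0.3 (inferred from the model name), and the use of TrainingArguments are all assumptions, not the author's confirmed setup.

```python
# Hypothetical reconstruction of the training configuration; not the author's actual script.
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "mistralai/Mistral-7B-Instruct-v0.2"
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(base)

# The preference data is not documented; substitute your own prompt/chosen/rejected pairs.
train_dataset = ...  # placeholder
eval_dataset = ...   # placeholder

args = TrainingArguments(
    output_dir="mistralit2_1000_STEPS_1e8_rate_03_beta_DPO",
    learning_rate=1e-8,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,   # effective train batch size of 8
    max_steps=1000,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    seed=42,
    evaluation_strategy="steps",
    eval_steps=50,                   # matches the 50-step eval cadence in the results table
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=args,
    beta=0.3,                        # "03_beta" in the model name, assumed
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```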

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6911 | 0.1 | 50 | 0.6909 | 0.0027 | -0.0025 | 0.4967 | 0.0052 | -28.5807 | -23.3768 | -2.8653 | -2.8655 |
| 0.6916 | 0.2 | 100 | 0.6928 | -0.0010 | -0.0023 | 0.4571 | 0.0014 | -28.5802 | -23.3891 | -2.8653 | -2.8655 |
| 0.6924 | 0.29 | 150 | 0.6922 | -0.0091 | -0.0117 | 0.4681 | 0.0026 | -28.6115 | -23.4162 | -2.8651 | -2.8654 |
| 0.6941 | 0.39 | 200 | 0.6914 | -0.0066 | -0.0109 | 0.4879 | 0.0043 | -28.6088 | -23.4078 | -2.8652 | -2.8654 |
| 0.6942 | 0.49 | 250 | 0.6911 | -0.0070 | -0.0120 | 0.4791 | 0.0050 | -28.6123 | -23.4090 | -2.8649 | -2.8652 |
| 0.6909 | 0.59 | 300 | 0.6921 | -0.0151 | -0.0181 | 0.4593 | 0.0030 | -28.6327 | -23.4362 | -2.8650 | -2.8653 |
| 0.696 | 0.68 | 350 | 0.6903 | -0.0140 | -0.0207 | 0.5121 | 0.0067 | -28.6414 | -23.4326 | -2.8651 | -2.8653 |
| 0.6907 | 0.78 | 400 | 0.6904 | -0.0153 | -0.0217 | 0.4945 | 0.0064 | -28.6448 | -23.4369 | -2.8649 | -2.8652 |
| 0.6895 | 0.88 | 450 | 0.6898 | -0.0157 | -0.0232 | 0.4945 | 0.0075 | -28.6497 | -23.4380 | -2.8649 | -2.8652 |
| 0.6902 | 0.98 | 500 | 0.6892 | -0.0192 | -0.0282 | 0.5165 | 0.0090 | -28.6665 | -23.4500 | -2.8650 | -2.8652 |
| 0.6923 | 1.07 | 550 | 0.6893 | -0.0196 | -0.0282 | 0.5385 | 0.0086 | -28.6663 | -23.4511 | -2.8649 | -2.8652 |
| 0.6957 | 1.17 | 600 | 0.6897 | -0.0210 | -0.0288 | 0.5011 | 0.0078 | -28.6684 | -23.4560 | -2.8649 | -2.8652 |
| 0.6885 | 1.27 | 650 | 0.6897 | -0.0173 | -0.0251 | 0.5143 | 0.0078 | -28.6560 | -23.4436 | -2.8650 | -2.8653 |
| 0.6912 | 1.37 | 700 | 0.6906 | -0.0207 | -0.0268 | 0.4967 | 0.0061 | -28.6617 | -23.4548 | -2.8650 | -2.8652 |
| 0.6874 | 1.46 | 750 | 0.6903 | -0.0216 | -0.0282 | 0.4923 | 0.0065 | -28.6663 | -23.4580 | -2.8650 | -2.8652 |
| 0.6896 | 1.56 | 800 | 0.6877 | -0.0180 | -0.0298 | 0.5451 | 0.0119 | -28.6719 | -23.4457 | -2.8649 | -2.8651 |
| 0.6904 | 1.66 | 850 | 0.6905 | -0.0217 | -0.0279 | 0.4791 | 0.0062 | -28.6655 | -23.4582 | -2.8649 | -2.8651 |
| 0.6913 | 1.76 | 900 | 0.6902 | -0.0213 | -0.0282 | 0.4945 | 0.0070 | -28.6665 | -23.4567 | -2.8649 | -2.8651 |
| 0.6977 | 1.86 | 950 | 0.6902 | -0.0213 | -0.0282 | 0.4945 | 0.0070 | -28.6665 | -23.4567 | -2.8649 | -2.8651 |
| 0.6892 | 1.95 | 1000 | 0.6902 | -0.0213 | -0.0282 | 0.4945 | 0.0070 | -28.6665 | -23.4567 | -2.8649 | -2.8651 |

Framework versions

  • Transformers 4.38.2
  • Pytorch 2.0.0+cu117
  • Datasets 2.18.0
  • Tokenizers 0.15.2