v1_1000_STEPS_1e6_rate_03_beta_DPO2

This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.1 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.8487
  • Rewards/chosen: -2.2151
  • Rewards/rejected: -3.0240
  • Rewards/accuracies: 0.5758
  • Rewards/margins: 0.8089
  • Logps/rejected: -26.9594
  • Logps/chosen: -22.6366
  • Logits/rejected: -3.2869
  • Logits/chosen: -3.2870
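
The Rewards/* values above are the DPO implicit rewards, i.e. the beta-scaled log-probability gap between the policy and the frozen reference model on the chosen and rejected responses. A minimal sketch of how these metrics are typically derived is shown below; the beta value of 0.3 is an assumption taken from the model name ("03_beta"), not from a stated config.

```python
import torch
import torch.nn.functional as F

def dpo_eval_stats(policy_chosen_logps, policy_rejected_logps,
                   ref_chosen_logps, ref_rejected_logps, beta=0.3):
    """Compute DPO implicit rewards, margins, accuracy and loss.

    Inputs are per-example response log-probabilities (the Logps/* quantities);
    beta=0.3 is assumed from the model name.
    """
    rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)
    rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = rewards_chosen - rewards_rejected
    accuracies = (rewards_chosen > rewards_rejected).float()
    # DPO loss: -log sigmoid of the (already beta-scaled) reward margin
    loss = -F.logsigmoid(margins)
    return (loss.mean(), rewards_chosen.mean(), rewards_rejected.mean(),
            accuracies.mean(), margins.mean())
```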

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-06
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
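
For reference, the sketch below shows how these hyperparameters could be wired together with TRL's DPOTrainer. The dataset is unknown (see "Training and evaluation data"), the beta value of 0.3 is assumed from the model name, and the training script itself is not part of this card, so treat this as illustrative rather than the actual procedure.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "mistralai/Mistral-7B-Instruct-v0.1"
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(base)

# Placeholders: the preference dataset used for this model is not documented.
train_dataset = ...
eval_dataset = ...

args = TrainingArguments(
    output_dir="v1_1000_STEPS_1e6_rate_03_beta_DPO2",
    learning_rate=1e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,   # effective train batch size of 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=args,
    beta=0.3,                        # assumed from "03_beta" in the model name
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```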

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---------------|-------|------|-----------------|----------------|------------------|--------------------|-----------------|----------------|--------------|-----------------|---------------|
| 0.7016 | 0.05 | 50 | 0.6686 | -0.1010 | -0.1780 | 0.5516 | 0.0771 | -17.4730 | -15.5896 | -3.3830 | -3.3830 |
| 0.6727 | 0.1 | 100 | 0.7732 | -1.2186 | -1.6229 | 0.5275 | 0.4043 | -22.2891 | -19.3150 | -3.3350 | -3.3352 |
| 1.2098 | 0.15 | 150 | 0.9205 | -1.6685 | -2.0242 | 0.5209 | 0.3558 | -23.6270 | -20.8147 | -3.3998 | -3.4000 |
| 0.8607 | 0.2 | 200 | 0.9312 | -1.7362 | -2.0915 | 0.5099 | 0.3553 | -23.8513 | -21.0405 | -3.3324 | -3.3326 |
| 0.896 | 0.24 | 250 | 0.9765 | -1.8658 | -2.0921 | 0.5011 | 0.2263 | -23.8533 | -21.4723 | -3.2214 | -3.2215 |
| 0.9783 | 0.29 | 300 | 0.9234 | -1.9658 | -2.3835 | 0.5165 | 0.4177 | -24.8244 | -21.8057 | -3.3158 | -3.3160 |
| 1.0592 | 0.34 | 350 | 0.9509 | -3.1300 | -3.4037 | 0.5033 | 0.2738 | -28.2253 | -25.6863 | -3.2697 | -3.2698 |
| 1.0391 | 0.39 | 400 | 0.9067 | -2.4562 | -2.8182 | 0.5231 | 0.3619 | -26.2735 | -23.4405 | -3.3616 | -3.3617 |
| 0.9409 | 0.44 | 450 | 0.9081 | -2.8095 | -3.1865 | 0.5231 | 0.3771 | -27.5014 | -24.6179 | -3.3324 | -3.3325 |
| 0.8139 | 0.49 | 500 | 0.9131 | -2.8071 | -3.2564 | 0.5560 | 0.4493 | -27.7343 | -24.6100 | -3.3362 | -3.3363 |
| 0.8732 | 0.54 | 550 | 0.8745 | -2.3409 | -3.0357 | 0.5516 | 0.6948 | -26.9986 | -23.0562 | -3.3124 | -3.3125 |
| 0.8179 | 0.59 | 600 | 0.8632 | -2.1460 | -2.9478 | 0.5692 | 0.8018 | -26.7055 | -22.4063 | -3.3039 | -3.3040 |
| 0.825 | 0.64 | 650 | 0.8769 | -1.9605 | -2.7326 | 0.5626 | 0.7721 | -25.9882 | -21.7879 | -3.3006 | -3.3007 |
| 0.7539 | 0.68 | 700 | 0.8600 | -2.1758 | -2.9531 | 0.5714 | 0.7773 | -26.7232 | -22.5059 | -3.2794 | -3.2795 |
| 0.7835 | 0.73 | 750 | 0.8551 | -2.2525 | -3.0394 | 0.5692 | 0.7868 | -27.0107 | -22.7614 | -3.2905 | -3.2906 |
| 0.925 | 0.78 | 800 | 0.8479 | -2.2131 | -3.0235 | 0.5736 | 0.8105 | -26.9579 | -22.6299 | -3.2902 | -3.2903 |
| 1.0166 | 0.83 | 850 | 0.8493 | -2.2090 | -3.0157 | 0.5780 | 0.8067 | -26.9319 | -22.6164 | -3.2872 | -3.2873 |
| 1.0711 | 0.88 | 900 | 0.8480 | -2.2126 | -3.0221 | 0.5758 | 0.8095 | -26.9532 | -22.6283 | -3.2869 | -3.2870 |
| 0.9928 | 0.93 | 950 | 0.8487 | -2.2161 | -3.0255 | 0.5802 | 0.8094 | -26.9646 | -22.6400 | -3.2869 | -3.2870 |
| 0.6707 | 0.98 | 1000 | 0.8487 | -2.2151 | -3.0240 | 0.5758 | 0.8089 | -26.9594 | -22.6366 | -3.2869 | -3.2870 |

Framework versions

  • Transformers 4.39.2
  • Pytorch 2.0.0+cu117
  • Datasets 2.18.0
  • Tokenizers 0.15.2
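
A minimal sketch of loading this model for inference with Transformers is shown below; the generation settings and prompt are illustrative only, and the repository id is taken from the model's Hub page.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "tsavage68/v1_1000_STEPS_1e6_rate_03_beta_DPO2"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="auto", device_map="auto")

# Mistral-Instruct-style prompting via the tokenizer's chat template
messages = [{"role": "user", "content": "Summarize what DPO fine-tuning does."}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```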