v1_1000_STEPS_1e6_rate_01_beta_DPO

This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.1 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6759
  • Rewards/chosen: -1.1871
  • Rewards/rejected: -1.7068
  • Rewards/accuracies: 0.5934
  • Rewards/margins: 0.5197
  • Logps/rejected: -33.9475
  • Logps/chosen: -27.1237
  • Logits/rejected: -3.3244
  • Logits/chosen: -3.3248
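
The Rewards/* names above match the logging convention of TRL's DPOTrainer: each reward is the β-scaled log-probability ratio between the fine-tuned policy and the frozen reference model, and the margin is the gap between the chosen and rejected rewards. The formula below is the standard DPO definition, stated here only for reference; β = 0.1 is inferred from the "01_beta" in the model name, not documented in the card.

$$
r_\theta(x, y) = \beta \left[ \log \pi_\theta(y \mid x) - \log \pi_{\text{ref}}(y \mid x) \right],
\qquad
\text{margin} = r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}})
$$

Rewards/accuracies is the fraction of evaluation pairs for which the chosen response receives a higher implicit reward than the rejected one.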

Model description

More information needed

Intended uses & limitations

More information needed
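
Although the intended uses are not documented, the checkpoint loads like any Mistral-7B-Instruct fine-tune through the standard transformers API. The snippet below is a minimal sketch, not an official usage example: the repo id is taken from this card, while the prompt text and generation settings are illustrative assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/v1_1000_STEPS_1e6_rate_01_beta_DPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # weights are stored in FP16
    device_map="auto",          # requires `accelerate`
)

# Mistral-Instruct models expect the chat template applied to a message list.
messages = [{"role": "user", "content": "Summarize what DPO fine-tuning does in two sentences."}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output_ids[0, input_ids.shape[-1]:], skip_special_tokens=True))
```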

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-06
  • train_batch_size: 2
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
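
For context, the sketch below shows how these hyperparameters could map onto a TRL DPOTrainer run (roughly the TRL 0.7/0.8 API, where `beta` is passed to the trainer directly). The preference dataset, the use of TRL itself, and β = 0.1 (inferred from the model name) are assumptions; the dataset name is a placeholder.

```python
# A minimal sketch, assuming TRL's DPOTrainer was used -- the card does not say so.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_id = "mistralai/Mistral-7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Placeholder: any dataset with "prompt", "chosen", "rejected" columns.
train_dataset = load_dataset("your_preference_dataset", split="train")

args = TrainingArguments(
    output_dir="v1_1000_STEPS_1e6_rate_01_beta_DPO",
    learning_rate=1e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # total train batch size 4
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
    # Adam betas=(0.9, 0.999) and epsilon=1e-8 are the TrainingArguments defaults.
)

trainer = DPOTrainer(
    model,
    ref_model=None,   # TRL builds the frozen reference model from `model`
    beta=0.1,         # inferred from "01_beta" in the model name
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```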

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6937 | 0.05 | 50   | 0.6813 | -0.0431 | -0.0702 | 0.5385 | 0.0271 | -17.5820 | -15.6845 | -3.3850 | -3.3851 |
| 0.6379 | 0.1  | 100  | 0.6783 | -0.8278 | -0.9877 | 0.5297 | 0.1600 | -26.7570 | -23.5307 | -3.2482 | -3.2484 |
| 0.7855 | 0.15 | 150  | 0.7185 | -1.4527 | -1.6155 | 0.5231 | 0.1628 | -33.0342 | -29.7800 | -3.2664 | -3.2667 |
| 0.677  | 0.2  | 200  | 0.7468 | -1.8857 | -2.1602 | 0.5429 | 0.2745 | -38.4815 | -34.1103 | -3.1875 | -3.1878 |
| 0.6922 | 0.24 | 250  | 0.6741 | -1.6341 | -1.9210 | 0.5714 | 0.2869 | -36.0892 | -31.5939 | -3.1580 | -3.1584 |
| 0.7586 | 0.29 | 300  | 0.6942 | -1.6586 | -1.9019 | 0.5495 | 0.2433 | -35.8988 | -31.8388 | -3.3099 | -3.3102 |
| 0.7781 | 0.34 | 350  | 0.6667 | -0.9291 | -1.2556 | 0.5582 | 0.3265 | -29.4353 | -24.5438 | -3.3261 | -3.3264 |
| 0.9062 | 0.39 | 400  | 0.6779 | -0.9913 | -1.3228 | 0.5516 | 0.3315 | -30.1079 | -25.1664 | -3.3846 | -3.3849 |
| 0.6895 | 0.44 | 450  | 0.7001 | -1.7388 | -2.0490 | 0.5538 | 0.3102 | -37.3697 | -32.6413 | -3.3156 | -3.3158 |
| 0.6914 | 0.49 | 500  | 0.7140 | -1.6769 | -1.9965 | 0.5604 | 0.3196 | -36.8445 | -32.0224 | -3.3308 | -3.3312 |
| 0.6684 | 0.54 | 550  | 0.6648 | -1.0055 | -1.4434 | 0.5604 | 0.4379 | -31.3135 | -25.3085 | -3.3448 | -3.3453 |
| 0.6566 | 0.59 | 600  | 0.6873 | -1.4806 | -1.9669 | 0.5912 | 0.4862 | -36.5482 | -30.0593 | -3.3429 | -3.3433 |
| 0.6652 | 0.64 | 650  | 0.6811 | -1.3638 | -1.8487 | 0.5846 | 0.4849 | -35.3664 | -28.8914 | -3.3402 | -3.3407 |
| 0.8078 | 0.68 | 700  | 0.6813 | -1.3470 | -1.8420 | 0.5890 | 0.4950 | -35.2997 | -28.7235 | -3.3181 | -3.3185 |
| 0.7023 | 0.73 | 750  | 0.6787 | -1.3433 | -1.8475 | 0.6022 | 0.5042 | -35.3545 | -28.6860 | -3.3293 | -3.3297 |
| 0.5746 | 0.78 | 800  | 0.6761 | -1.1720 | -1.6876 | 0.5956 | 0.5157 | -33.7557 | -26.9727 | -3.3266 | -3.3270 |
| 0.6828 | 0.83 | 850  | 0.6756 | -1.1797 | -1.6986 | 0.5934 | 0.5189 | -33.8653 | -27.0500 | -3.3243 | -3.3247 |
| 0.6355 | 0.88 | 900  | 0.6760 | -1.1867 | -1.7058 | 0.5934 | 0.5192 | -33.9377 | -27.1196 | -3.3243 | -3.3248 |
| 0.557  | 0.93 | 950  | 0.6760 | -1.1867 | -1.7063 | 0.5912 | 0.5196 | -33.9424 | -27.1197 | -3.3243 | -3.3248 |
| 0.5707 | 0.98 | 1000 | 0.6759 | -1.1871 | -1.7068 | 0.5934 | 0.5197 | -33.9475 | -27.1237 | -3.3244 | -3.3248 |

Framework versions

  • Transformers 4.39.1
  • Pytorch 2.0.0+cu117
  • Datasets 2.18.0
  • Tokenizers 0.15.2