sigmoid_lr2e-05_b0.1

This model is a fine-tuned version of meta-llama/Llama-2-7b-chat-hf on an unspecified dataset. It achieves the following results on the evaluation set (how these reward and log-probability metrics are typically derived is sketched after the list):

  • Loss: 0.1587
  • Rewards/chosen: 1.0566
  • Rewards/rejected: -3.7222
  • Rewards/accuracies: 0.9348
  • Rewards/margins: 4.7788
  • Logps/rejected: -100.3634
  • Logps/chosen: -68.2028
  • Logits/rejected: -1.2118
  • Logits/chosen: -1.1884
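
The model name ("sigmoid", "b0.1") and the reward/log-probability metrics suggest DPO-style preference training with the sigmoid loss and beta = 0.1, although the card does not state this explicitly. Under that assumption, the reported rewards correspond to the implicit DPO rewards:

```latex
% Hedged sketch, assuming standard (sigmoid-loss) DPO with beta = 0.1.
% r_theta is the implicit reward of completion y for prompt x.
\[
r_\theta(x, y) \;=\; \beta \,\log\frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
\]
% Sigmoid DPO loss over preference pairs (chosen y_w, rejected y_l):
\[
\mathcal{L}_{\mathrm{DPO}}(\theta)
  \;=\; -\,\mathbb{E}_{(x,\,y_w,\,y_l)}
    \bigl[\log \sigma\bigl(r_\theta(x, y_w) - r_\theta(x, y_l)\bigr)\bigr]
\]
```

In this reading, Rewards/chosen and Rewards/rejected are the implicit reward evaluated on the chosen and rejected completions, Rewards/margins is their difference, Rewards/accuracies is the fraction of pairs where the chosen reward exceeds the rejected one, and Logps/* are the policy log-probabilities of the two completions.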

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hypothetical configuration sketch follows the list):

  • learning_rate: 2e-05
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
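
These values map naturally onto a trl DPOTrainer setup. The sketch below is a hypothetical reconstruction, not the published training script: the dataset, LoRA settings, trl version, and evaluation schedule are assumptions; only the hyperparameters listed above and the beta/loss type implied by the model name are taken from the card.

```python
# Hypothetical reconstruction of the training setup; the actual script,
# dataset, and LoRA configuration are not published in this card.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig
from trl import DPOTrainer

model_name = "meta-llama/Llama-2-7b-chat-hf"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Tiny placeholder preference dataset; the real training data is not stated in the card.
train_dataset = Dataset.from_dict({
    "prompt": ["What is the capital of France?"],
    "chosen": ["The capital of France is Paris."],
    "rejected": ["I don't know."],
})

args = TrainingArguments(
    output_dir="sigmoid_lr2e-05_b0.1",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,   # effective train batch size 16
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    remove_unused_columns=False,     # required by the DPO data collator
)

# LoRA settings are assumed; the card only confirms a PEFT adapter was trained.
peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32, lora_dropout=0.05)

trainer = DPOTrainer(
    model=model,
    ref_model=None,           # with a PEFT adapter, trl reuses the base weights as reference
    args=args,
    beta=0.1,                 # "b0.1" in the model name
    loss_type="sigmoid",      # "sigmoid" in the model name
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```

Evaluation and logging arguments (the card reports metrics every 0.1 epoch / 341 steps) are omitted from the sketch for brevity.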

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---------------|-------|------|-----------------|----------------|------------------|--------------------|-----------------|----------------|--------------|-----------------|---------------|
| 0.1582        | 0.1   | 341  | 0.3412          | 1.4613         | -0.6760          | 0.8442             | 2.1373          | -69.9019       | -64.1564     | -0.9916         | -0.9274       |
| 0.2165        | 0.2   | 682  | 0.2655          | 1.8031         | -1.3141          | 0.8714             | 3.1172          | -76.2827       | -60.7382     | -1.0115         | -0.9525       |
| 0.0864        | 0.3   | 1023 | 0.2379          | 0.6173         | -3.1475          | 0.8877             | 3.7648          | -94.6172       | -72.5967     | -1.0623         | -1.0198       |
| 0.3192        | 0.4   | 1364 | 0.2003          | 1.3681         | -2.3819          | 0.9185             | 3.7500          | -86.9604       | -65.0880     | -1.1691         | -1.1334       |
| 0.5707        | 0.5   | 1705 | 0.1831          | 1.2028         | -3.2640          | 0.9293             | 4.4667          | -95.7812       | -66.7415     | -1.2287         | -1.1992       |
| 0.0427        | 0.6   | 2046 | 0.1718          | 1.3838         | -3.1327          | 0.9312             | 4.5166          | -94.4690       | -64.9309     | -1.1900         | -1.1566       |
| 0.1956        | 0.7   | 2387 | 0.1608          | 1.0344         | -3.7242          | 0.9366             | 4.7586          | -100.3841      | -68.4254     | -1.2044         | -1.1795       |
| 0.0319        | 0.8   | 2728 | 0.1595          | 1.0398         | -3.7445          | 0.9348             | 4.7843          | -100.5868      | -68.3711     | -1.2077         | -1.1849       |
| 0.0173        | 0.9   | 3069 | 0.1587          | 1.0566         | -3.7222          | 0.9348             | 4.7788          | -100.3634      | -68.2028     | -1.2118         | -1.1884       |

Framework versions

  • PEFT 0.7.1
  • Transformers 4.36.2
  • Pytorch 2.1.2
  • Datasets 2.15.0
  • Tokenizers 0.15.0
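
Since this repository is a PEFT adapter for meta-llama/Llama-2-7b-chat-hf, it can presumably be loaded on top of the base model roughly as follows. The repository id mazzaqq/DPO_davide is taken from the model page; the sketch is untested.

```python
# Untested sketch: load the DPO adapter on top of the base chat model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

# Repository id taken from the model page.
model = PeftModel.from_pretrained(base, "mazzaqq/DPO_davide")
model.eval()

prompt = "[INST] Give me one tip for writing clear documentation. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```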