# Llama-2-7b-hf-DPO-LookAhead3_FullEval_TTree1.4_TLoop0.7_TEval0.2_Filter0.2_V2.0
This model is a fine-tuned version of meta-llama/Llama-2-7b-hf on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 0.6353
- Rewards/chosen: -3.2199
- Rewards/rejected: -3.7792
- Rewards/accuracies: 0.625
- Rewards/margins: 0.5593
- Logps/rejected: -145.3183
- Logps/chosen: -164.8658
- Logits/rejected: -1.1220
- Logits/chosen: -1.0854
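
For context, these reward metrics follow the standard DPO bookkeeping: each response gets an implicit reward proportional to the log-probability ratio between the policy and the frozen reference model, the margin is the chosen reward minus the rejected reward (here -3.2199 - (-3.7792) ≈ 0.5593), and the accuracy is the fraction of pairs where the chosen reward exceeds the rejected one. The snippet below is a minimal sketch of how these quantities relate; the log-probability values and the β used are placeholders, not taken from this run.

```python
import torch
import torch.nn.functional as F

# Illustrative per-example sums of token log-probabilities for the chosen and
# rejected responses, under the policy and the frozen reference model.
# These values are placeholders, not taken from this training run.
policy_chosen_logps = torch.tensor([-164.9, -120.3])
policy_rejected_logps = torch.tensor([-145.3, -150.8])
ref_chosen_logps = torch.tensor([-150.2, -118.9])
ref_rejected_logps = torch.tensor([-140.1, -142.5])

beta = 0.1  # DPO temperature; the value used for this run is not stated in the card

# Implicit DPO rewards: beta * (log pi_theta(y|x) - log pi_ref(y|x))
rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)
rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)

margins = rewards_chosen - rewards_rejected                    # "Rewards/margins"
accuracy = (rewards_chosen > rewards_rejected).float().mean()  # "Rewards/accuracies"
loss = -F.logsigmoid(margins).mean()                           # DPO loss ("Loss")
```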
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a reproduction sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 3
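
A rough way to reproduce this configuration with TRL's `DPOTrainer` is sketched below. Only the hyperparameters listed above are taken from the card; the preference dataset, the LoRA settings, and the exact TRL version are assumptions (the card only confirms PEFT 0.12.0 and Transformers 4.44.2).

```python
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "meta-llama/Llama-2-7b-hf"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Preference data needs "prompt", "chosen", and "rejected" columns;
# the dataset used for this run is not specified in the card.
train_dataset = Dataset.from_dict({
    "prompt": ["..."], "chosen": ["..."], "rejected": ["..."],
})

# LoRA settings are assumptions; the card only states that PEFT was used.
peft_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")

# Hyperparameters below mirror the list above; Adam betas/epsilon and the
# cosine schedule match the Trainer defaults plus the stated warmup.
training_args = DPOConfig(
    output_dir="llama2-dpo",
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,  # total train batch size 4
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_steps=10,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```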
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 0.6639 | 0.3037 | 53 | 0.6613 | 0.0051 | -0.0607 | 0.875 | 0.0658 | -108.1335 | -132.6158 | -0.6137 | -0.5723 |
| 0.6476 | 0.6074 | 106 | 0.6171 | -0.2010 | -0.3748 | 0.625 | 0.1738 | -111.2741 | -134.6767 | -0.6554 | -0.6155 |
| 0.6552 | 0.9112 | 159 | 0.6850 | -0.4026 | -0.4336 | 0.5 | 0.0310 | -111.8621 | -136.6923 | -0.6025 | -0.5605 |
| 0.271 | 1.2149 | 212 | 0.5592 | -1.1775 | -1.5117 | 0.75 | 0.3342 | -122.6435 | -144.4414 | -0.6651 | -0.6240 |
| 0.2321 | 1.5186 | 265 | 0.6523 | -1.6722 | -1.8791 | 0.5 | 0.2069 | -126.3177 | -149.3886 | -0.7461 | -0.7056 |
| 0.3961 | 1.8223 | 318 | 0.5176 | -1.1964 | -1.6762 | 0.875 | 0.4798 | -124.2882 | -144.6302 | -0.8107 | -0.7719 |
| 0.1421 | 2.1261 | 371 | 0.6029 | -2.4068 | -2.8869 | 0.625 | 0.4801 | -136.3952 | -156.7344 | -1.0103 | -0.9720 |
| 0.5702 | 2.4298 | 424 | 0.6557 | -3.1785 | -3.6978 | 0.625 | 0.5193 | -144.5047 | -164.4516 | -1.0897 | -1.0539 |
| 0.2376 | 2.7335 | 477 | 0.6353 | -3.2199 | -3.7792 | 0.625 | 0.5593 | -145.3183 | -164.8658 | -1.1220 | -1.0854 |
### Framework versions
- PEFT 0.12.0
- Transformers 4.44.2
- Pytorch 2.4.0+cu121
- Datasets 3.0.0
- Tokenizers 0.19.1
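
Because PEFT is listed among the framework versions, this repository most likely holds a LoRA-style adapter rather than full model weights. A minimal loading sketch under that assumption (dtype, device map, and the generation prompt are illustrative):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-2-7b-hf"
adapter_id = "LBK95/Llama-2-7b-hf-DPO-LookAhead3_FullEval_TTree1.4_TLoop0.7_TEval0.2_Filter0.2_V2.0"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)

# Attach the DPO-tuned adapter on top of the base model
model = PeftModel.from_pretrained(base_model, adapter_id)

prompt = "Write a short poem about the sea."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```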