metadata
license: apache-2.0
library_name: peft
tags:
  - alignment-handbook
  - trl
  - dpo
  - generated_from_trainer
base_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
datasets:
  - openai/summarize_from_feedback
model-index:
  - name: tinyllama-1.1b-sum-dpo-qlora
    results: []

tinyllama-1.1b-sum-dpo-qlora

This model is a fine-tuned version of martimfasantos/tinyllama-1.1b-sum-sft-qlora on the openai/summarize_from_feedback dataset. It achieves the following results on the evaluation set (the reward metrics are defined in the note just after this list):

  • Loss: 0.6482
  • Rewards/chosen: -0.9538
  • Rewards/rejected: -1.1194
  • Rewards/accuracies: 0.6171
  • Rewards/margins: 0.1656
  • Logps/rejected: -187.0472
  • Logps/chosen: -166.7881
  • Logits/rejected: -3.0176
  • Logits/chosen: -3.0239

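For context, the reward numbers above follow the implicit-reward bookkeeping that TRL's DPOTrainer logs; the definitions below are standard DPO background rather than something stated in this card.

```latex
% Implicit DPO reward of a completion y for prompt x,
% comparing the policy \pi_\theta against the frozen reference \pi_{ref}:
r(x, y) = \beta \,\bigl[\log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x)\bigr]
% Rewards/chosen and Rewards/rejected average r over the chosen and rejected summaries;
% Rewards/margins    = Rewards/chosen - Rewards/rejected;
% Rewards/accuracies = fraction of pairs with r(x, y_chosen) > r(x, y_rejected).
```
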
Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged training sketch using these values follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1

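Below is a minimal, hedged sketch of how a run like this could be wired up with TRL's DPOTrainer and a QLoRA adapter, assuming a TRL release contemporary with the framework versions listed at the bottom of this card (roughly 0.8.x, where beta and the max lengths are still DPOTrainer arguments; newer TRL moves them into DPOConfig). The LoRA settings, DPO beta, max lengths, prompt template, and dataset mapping are assumptions and are not documented in this card.

```python
# Hedged reproduction sketch: DPO fine-tuning with TRL + PEFT (QLoRA).
# Hyperparameters mirror the list above; everything marked "assumed" is not from the card.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import DPOTrainer

sft_model = "martimfasantos/tinyllama-1.1b-sum-sft-qlora"  # SFT starting point named above

bnb_config = BitsAndBytesConfig(            # 4-bit QLoRA quantization
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(sft_model, quantization_config=bnb_config)
tokenizer = AutoTokenizer.from_pretrained(sft_model)

peft_config = LoraConfig(                   # LoRA adapter settings (assumed)
    r=16, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

def to_preference_format(example):
    """Map a summarize_from_feedback comparison to prompt/chosen/rejected (assumed mapping)."""
    post = example["info"]["post"] or example["info"]["article"] or ""
    summaries = [s["text"] for s in example["summaries"]]
    choice = example["choice"]
    return {
        "prompt": f"Summarize the following post:\n\n{post}\n\nSummary:",  # assumed template
        "chosen": summaries[choice],
        "rejected": summaries[1 - choice],
    }

raw = load_dataset("openai/summarize_from_feedback", "comparisons")
train_ds = raw["train"].map(to_preference_format, remove_columns=raw["train"].column_names)
eval_ds = raw["validation"].map(to_preference_format, remove_columns=raw["validation"].column_names)

training_args = TrainingArguments(          # hyperparameters from the list above
    output_dir="tinyllama-1.1b-sum-dpo-qlora",
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,
    learning_rate=5e-6,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    seed=42,
    evaluation_strategy="steps",
    eval_steps=100,                         # matches the evaluation cadence in the results table
    bf16=True,
)

trainer = DPOTrainer(
    model,
    ref_model=None,                         # with a PEFT adapter, the frozen base acts as the reference
    args=training_args,
    beta=0.1,                               # DPO beta not stated in the card; 0.1 is TRL's default
    max_length=1024,                        # assumed
    max_prompt_length=512,                  # assumed
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```
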
Training results

Training Loss Epoch Step Logits/chosen Logits/rejected Logps/chosen Logps/rejected Validation Loss Rewards/accuracies Rewards/chosen Rewards/margins Rewards/rejected
0.6926 0.02 100 -3.4980 -3.4962 -70.9186 -74.6392 0.6930 0.5193 0.0049 0.0002 0.0047
0.6919 0.03 200 -3.4925 -3.4908 -69.9505 -73.7540 0.6926 0.5678 0.0146 0.0011 0.0135
0.6888 0.05 300 -3.4861 -3.4843 -67.8994 -72.0238 0.6911 0.5748 0.0351 0.0043 0.0308
0.6864 0.07 400 -3.4827 -3.4809 -69.7504 -74.3218 0.6890 0.5627 0.0166 0.0087 0.0079
0.6864 0.09 500 -3.4687 -3.4669 -69.0559 -74.2092 0.6864 0.5716 0.0235 0.0146 0.0090
0.6729 0.1 600 -3.4506 -3.4489 -71.3562 -77.1629 0.6837 0.5869 0.0005 0.0211 -0.0206
0.6745 0.12 700 -3.4487 -3.4467 -78.9372 -85.9956 0.6786 0.5955 -0.0753 0.0336 -0.1089
0.6681 0.14 800 -3.4169 -3.4151 -90.1915 -98.6570 0.6738 0.5955 -0.1878 0.0477 -0.2355
0.6661 0.16 900 -3.3755 -3.3740 -88.5994 -97.6376 0.6715 0.5922 -0.1719 0.0534 -0.2253
0.6686 0.17 1000 -3.3483 -3.3467 -111.1606 -121.9167 0.6681 0.5936 -0.3975 0.0706 -0.4681
0.665 0.19 1100 -3.3477 -3.3463 -92.1750 -101.6747 0.6708 0.5950 -0.2076 0.0580 -0.2657
0.6549 0.21 1200 -3.3173 -3.3159 -107.3321 -119.3906 0.6631 0.5974 -0.3592 0.0836 -0.4428
0.6536 0.22 1300 -3.2737 -3.2722 -121.8111 -135.5439 0.6591 0.5978 -0.5040 0.1004 -0.6044
0.6303 0.24 1400 -3.2790 -3.2775 -111.6529 -124.7296 0.6593 0.6055 -0.4024 0.0938 -0.4962
0.6611 0.26 1500 -3.2472 -3.2454 -132.2458 -148.1280 0.6527 0.6138 -0.6084 0.1219 -0.7302
0.6395 0.28 1600 -3.2525 -3.2505 -126.2706 -141.6170 0.6536 0.6155 -0.5486 0.1165 -0.6651
0.678 0.29 1700 -3.2125 -3.2107 -117.8728 -131.2285 0.6587 0.6169 -0.4646 0.0966 -0.5612
0.629 0.31 1800 -3.1113 -3.1087 -146.8860 -164.9026 0.6489 0.6187 -0.7548 0.1432 -0.8980
0.6622 0.33 1900 -3.1419 -3.1399 -125.9992 -140.6700 0.6555 0.6069 -0.5459 0.1097 -0.6556
0.64 0.34 2000 -3.1847 -3.1824 -140.1714 -156.3843 0.6523 0.6101 -0.6876 0.1252 -0.8128
0.6479 0.36 2100 -3.1160 -3.1130 -150.8988 -167.6336 0.6537 0.6104 -0.7949 0.1304 -0.9253
0.6023 0.38 2200 -3.1479 -3.1449 -137.7163 -153.7927 0.6536 0.6034 -0.6631 0.1238 -0.7869
0.5962 0.4 2300 -3.1012 -3.0975 -159.4141 -177.2301 0.6523 0.6078 -0.8800 0.1412 -1.0212
0.6176 0.41 2400 -3.0320 -3.0265 -172.7089 -192.7748 0.6506 0.6027 -1.0130 0.1637 -1.1767
0.6255 0.43 2500 -3.0629 -3.0584 -156.9642 -175.3398 0.6507 0.6101 -0.8555 0.1468 -1.0023
0.6075 0.45 2600 -3.0877 -3.0839 -146.0736 -162.3147 0.6547 0.6046 -0.7466 0.1254 -0.8721
0.6282 0.47 2700 -3.1221 -3.1185 -140.7325 -157.2624 0.6531 0.6101 -0.6932 0.1283 -0.8216
0.6495 0.48 2800 -3.0926 -3.0887 -148.7372 -166.3009 0.6517 0.6080 -0.7733 0.1387 -0.9119
0.6202 0.5 2900 -3.0787 -3.0744 -152.9659 -170.9832 0.6512 0.6048 -0.8156 0.1432 -0.9588
0.6252 0.52 3000 -3.0824 -3.0782 -148.4267 -166.3868 0.6505 0.6055 -0.7702 0.1426 -0.9128
0.6082 0.53 3100 -3.0723 -3.0678 -149.2047 -167.4548 0.6500 0.6115 -0.7779 0.1455 -0.9235
0.6072 0.55 3200 -3.0863 -3.0819 -147.0810 -164.9669 0.6499 0.6090 -0.7567 0.1419 -0.8986
0.6142 0.57 3300 -3.0087 -3.0026 -179.2665 -200.5992 0.6468 0.6176 -1.0786 0.1764 -1.2549
0.602 0.59 3400 -3.0674 -3.0624 -150.3082 -168.4087 0.6504 0.6136 -0.7890 0.1440 -0.9330
0.605 0.6 3500 -3.0590 -3.0538 -154.1790 -172.9109 0.6497 0.6122 -0.8277 0.1503 -0.9780
0.6263 0.62 3600 -3.0721 -3.0672 -149.9757 -168.0735 0.6508 0.6043 -0.7857 0.1440 -0.9297
0.5961 0.64 3700 -3.0151 -3.0090 -169.4567 -189.3689 0.6492 0.6136 -0.9805 0.1622 -1.1426
0.6273 0.65 3800 -3.0117 -3.0057 -167.9805 -187.6573 0.6494 0.6141 -0.9657 0.1598 -1.1255
0.6183 0.67 3900 -3.0137 -3.0077 -167.4417 -187.2734 0.6488 0.6166 -0.9603 0.1613 -1.1217
0.6051 0.69 4000 -2.9974 -2.9908 -176.3739 -197.1255 0.6482 0.6178 -1.0496 0.1705 -1.2202
0.5867 0.71 4100 -3.0151 -3.0088 -169.1084 -189.3998 0.6484 0.6125 -0.9770 0.1659 -1.1429
0.6554 0.72 4200 -3.0270 -3.0209 -164.2755 -184.0126 0.6489 0.6176 -0.9287 0.1604 -1.0891
0.6053 0.74 4300 -3.0362 -3.0303 -159.9774 -179.4446 0.6489 0.6097 -0.8857 0.1577 -1.0434
0.6153 0.76 4400 -3.0351 -3.0292 -160.5470 -180.1235 0.6489 0.6120 -0.8914 0.1588 -1.0502
0.6145 0.78 4500 -3.0378 -3.0319 -160.1720 -179.6728 0.6490 0.6113 -0.8876 0.1580 -1.0457
0.5798 0.79 4600 -3.0308 -3.0247 -162.6813 -182.4701 0.6488 0.6148 -0.9127 0.1609 -1.0736
0.6218 0.81 4700 -3.0307 -3.0246 -163.0493 -182.9482 0.6486 0.6152 -0.9164 0.1620 -1.0784
0.6102 0.83 4800 -3.0259 -3.0197 -164.8939 -184.9769 0.6484 0.6150 -0.9348 0.1639 -1.0987
0.6176 0.84 4900 -3.0273 -3.0211 -165.7554 -185.9428 0.6483 0.6157 -0.9435 0.1649 -1.1084
0.5907 0.86 5000 -3.0259 -3.0196 -167.1301 -187.4627 0.6482 0.6164 -0.9572 0.1664 -1.1236
0.6534 0.88 5100 -3.0211 -3.0148 -167.2241 -187.5712 0.6481 0.6155 -0.9581 0.1665 -1.1246
0.5973 0.9 5200 -3.0194 -3.0130 -166.8823 -187.1679 0.6483 0.6169 -0.9547 0.1659 -1.1206
0.5975 0.91 5300 -3.0248 -3.0185 -166.6118 -186.8759 0.6482 0.6162 -0.9520 0.1657 -1.1177
0.5986 0.93 5400 -3.0249 -3.0186 -166.6502 -186.8928 0.6483 0.6190 -0.9524 0.1655 -1.1179
0.6025 0.95 5500 -3.0252 -3.0189 -166.7467 -186.9980 0.6483 0.6169 -0.9534 0.1655 -1.1189
0.6149 0.96 5600 -3.0244 -3.0181 -166.7859 -187.1137 0.6480 0.6155 -0.9538 0.1663 -1.1201
0.6275 0.98 5700 -3.0245 -3.0182 -166.6791 -186.9484 0.6482 0.6178 -0.9527 0.1657 -1.1184
0.5876 1.0 5800 -3.0239 -3.0176 -166.7881 -187.0472 0.6482 0.6171 -0.9538 0.1656 -1.1194

Framework versions

  • PEFT 0.7.1
  • Transformers 4.39.3
  • Pytorch 2.1.2
  • Datasets 2.18.0
  • Tokenizers 0.15.2
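
As a usage note (not part of the original card), the sketch below shows one way to load this LoRA adapter for summarization inference with versions close to those above. The prompt template is an assumption, and the tokenizer is assumed to ship with the adapter repo; otherwise load it from the SFT or base model.

```python
# Hedged usage sketch: load the DPO-tuned adapter and generate a summary.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

repo = "martimfasantos/tinyllama-1.1b-sum-dpo-qlora"
model = AutoPeftModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(repo)

post = "..."  # text to summarize
prompt = f"Summarize the following post:\n\n{post}\n\nSummary:"  # assumed template
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=80, do_sample=False)
# Decode only the newly generated tokens (the summary), not the echoed prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```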