---
license: apache-2.0
library_name: peft
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
base_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
datasets:
- openai/summarize_from_feedback
model-index:
- name: tinyllama-1.1b-sum-dpo-qlora
  results: []
---


# tinyllama-1.1b-sum-dpo-qlora

This model is a fine-tuned version of [martimfasantos/tinyllama-1.1b-sum-sft-qlora](https://huggingface.co/martimfasantos/tinyllama-1.1b-sum-sft-qlora) on the openai/summarize_from_feedback dataset.
It achieves the following results on the evaluation set:
- Loss: 0.6482
- Rewards/chosen: -0.9538
- Rewards/rejected: -1.1194
- Rewards/accuracies: 0.6171
- Rewards/margins: 0.1656
- Logps/rejected: -187.0472
- Logps/chosen: -166.7881
- Logits/rejected: -3.0176
- Logits/chosen: -3.0239
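
The `Rewards/*` metrics follow the standard DPO definitions: the implicit reward of a completion is the β-scaled log-probability ratio between the policy and the frozen reference model, `Rewards/margins` is the mean gap between chosen and rejected rewards, and `Rewards/accuracies` is the fraction of preference pairs in which the chosen summary receives the higher implicit reward:

$$
r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right),
\qquad
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\left[ \log \sigma\left( r_\theta(x, y_w) - r_\theta(x, y_l) \right) \right]
$$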

## Model description

This is a QLoRA (PEFT) adapter for TinyLlama-1.1B, preference-tuned for summarization with Direct Preference Optimization (DPO) via trl, following the alignment-handbook recipe. It builds on the SFT checkpoint [martimfasantos/tinyllama-1.1b-sum-sft-qlora](https://huggingface.co/martimfasantos/tinyllama-1.1b-sum-sft-qlora) and is trained on human preference pairs from openai/summarize_from_feedback.

## Intended uses & limitations

The adapter is intended for abstractive summarization of Reddit-style (TL;DR) posts, matching the distribution of the openai/summarize_from_feedback dataset. Being a 1.1B-parameter model, it has limited capacity: it is English-only, can produce inaccurate or unfaithful summaries (especially on out-of-domain text), and inherits any biases present in the base model and the preference data.
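
A minimal inference sketch (an illustration, not an official snippet; the `TL;DR:` prompt format is an assumption based on the training data):

```python
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

model_id = "martimfasantos/tinyllama-1.1b-sum-dpo-qlora"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Loads the base model and applies the LoRA adapter in one call.
model = AutoPeftModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Assumed prompt format: the post followed by a "TL;DR:" cue, as in the
# summarize_from_feedback training data.
prompt = "<post to summarize>\n\nTL;DR:"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```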

## Training and evaluation data

The model was trained and evaluated on openai/summarize_from_feedback (the TL;DR comparison data from *Learning to Summarize from Human Feedback*), in which each example pairs a post with a human-preferred and a rejected summary. The metrics above are reported on its validation split.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 16 (per-device batch of 4 × 4 gradient-accumulation steps)
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
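
For orientation, here is a hedged sketch of how these hyperparameters could be wired into trl's `DPOTrainer` (trl ≈ 0.8, contemporary with the framework versions below). It is not the exact alignment-handbook script; in particular `beta`, the LoRA configuration, and the precision flag are assumptions that are not recorded in this card:

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "martimfasantos/tinyllama-1.1b-sum-sft-qlora"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Map the comparison data into the prompt/chosen/rejected format DPO expects.
# (Reddit TL;DR examples; a real script would also filter or handle the
# CNN/DM subset of the validation data.)
def to_preference_pair(ex):
    prompt = ex["info"]["post"] + "\n\nTL;DR:"
    chosen = ex["summaries"][ex["choice"]]["text"]
    rejected = ex["summaries"][1 - ex["choice"]]["text"]
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}

ds = load_dataset("openai/summarize_from_feedback", "comparisons")
train_ds = ds["train"].map(to_preference_pair)
eval_ds = ds["validation"].map(to_preference_pair)

args = TrainingArguments(
    output_dir="tinyllama-1.1b-sum-dpo-qlora",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    seed=42,
    evaluation_strategy="steps",
    eval_steps=100,  # matches the 100-step evaluation cadence in the table below
    bf16=True,       # assumption: precision is not recorded in this card
)

trainer = DPOTrainer(
    model,
    ref_model=None,  # with a PEFT adapter, the frozen base acts as the reference
    args=args,
    beta=0.1,        # assumption: beta is not recorded in this card
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    tokenizer=tokenizer,
    peft_config=LoraConfig(task_type="CAUSAL_LM"),  # assumed LoRA settings
)
trainer.train()
```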

### Training results

| Training Loss | Epoch | Step | Logits/chosen | Logits/rejected | Logps/chosen | Logps/rejected | Validation Loss | Rewards/accuracies | Rewards/chosen | Rewards/margins | Rewards/rejected |
|:-------------:|:-----:|:----:|:-------------:|:---------------:|:------------:|:--------------:|:---------------:|:------------------:|:--------------:|:---------------:|:----------------:|
| 0.6926        | 0.02  | 100  | -3.4980       | -3.4962         | -70.9186     | -74.6392       | 0.6930          | 0.5193             | 0.0049         | 0.0002          | 0.0047           |
| 0.6919        | 0.03  | 200  | -3.4925       | -3.4908         | -69.9505     | -73.7540       | 0.6926          | 0.5678             | 0.0146         | 0.0011          | 0.0135           |
| 0.6888        | 0.05  | 300  | -3.4861       | -3.4843         | -67.8994     | -72.0238       | 0.6911          | 0.5748             | 0.0351         | 0.0043          | 0.0308           |
| 0.6864        | 0.07  | 400  | -3.4827       | -3.4809         | -69.7504     | -74.3218       | 0.6890          | 0.5627             | 0.0166         | 0.0087          | 0.0079           |
| 0.6864        | 0.09  | 500  | -3.4687       | -3.4669         | -69.0559     | -74.2092       | 0.6864          | 0.5716             | 0.0235         | 0.0146          | 0.0090           |
| 0.6729        | 0.1   | 600  | -3.4506       | -3.4489         | -71.3562     | -77.1629       | 0.6837          | 0.5869             | 0.0005         | 0.0211          | -0.0206          |
| 0.6745        | 0.12  | 700  | -3.4487       | -3.4467         | -78.9372     | -85.9956       | 0.6786          | 0.5955             | -0.0753        | 0.0336          | -0.1089          |
| 0.6681        | 0.14  | 800  | -3.4169       | -3.4151         | -90.1915     | -98.6570       | 0.6738          | 0.5955             | -0.1878        | 0.0477          | -0.2355          |
| 0.6661        | 0.16  | 900  | -3.3755       | -3.3740         | -88.5994     | -97.6376       | 0.6715          | 0.5922             | -0.1719        | 0.0534          | -0.2253          |
| 0.6686        | 0.17  | 1000 | -3.3483       | -3.3467         | -111.1606    | -121.9167      | 0.6681          | 0.5936             | -0.3975        | 0.0706          | -0.4681          |
| 0.665         | 0.19  | 1100 | -3.3477       | -3.3463         | -92.1750     | -101.6747      | 0.6708          | 0.5950             | -0.2076        | 0.0580          | -0.2657          |
| 0.6549        | 0.21  | 1200 | -3.3173       | -3.3159         | -107.3321    | -119.3906      | 0.6631          | 0.5974             | -0.3592        | 0.0836          | -0.4428          |
| 0.6536        | 0.22  | 1300 | -3.2737       | -3.2722         | -121.8111    | -135.5439      | 0.6591          | 0.5978             | -0.5040        | 0.1004          | -0.6044          |
| 0.6303        | 0.24  | 1400 | -3.2790       | -3.2775         | -111.6529    | -124.7296      | 0.6593          | 0.6055             | -0.4024        | 0.0938          | -0.4962          |
| 0.6611        | 0.26  | 1500 | -3.2472       | -3.2454         | -132.2458    | -148.1280      | 0.6527          | 0.6138             | -0.6084        | 0.1219          | -0.7302          |
| 0.6395        | 0.28  | 1600 | -3.2525       | -3.2505         | -126.2706    | -141.6170      | 0.6536          | 0.6155             | -0.5486        | 0.1165          | -0.6651          |
| 0.678         | 0.29  | 1700 | -3.2125       | -3.2107         | -117.8728    | -131.2285      | 0.6587          | 0.6169             | -0.4646        | 0.0966          | -0.5612          |
| 0.629         | 0.31  | 1800 | -3.1113       | -3.1087         | -146.8860    | -164.9026      | 0.6489          | 0.6187             | -0.7548        | 0.1432          | -0.8980          |
| 0.6622        | 0.33  | 1900 | -3.1419       | -3.1399         | -125.9992    | -140.6700      | 0.6555          | 0.6069             | -0.5459        | 0.1097          | -0.6556          |
| 0.64          | 0.34  | 2000 | -3.1847       | -3.1824         | -140.1714    | -156.3843      | 0.6523          | 0.6101             | -0.6876        | 0.1252          | -0.8128          |
| 0.6479        | 0.36  | 2100 | -3.1160       | -3.1130         | -150.8988    | -167.6336      | 0.6537          | 0.6104             | -0.7949        | 0.1304          | -0.9253          |
| 0.6023        | 0.38  | 2200 | -3.1479       | -3.1449         | -137.7163    | -153.7927      | 0.6536          | 0.6034             | -0.6631        | 0.1238          | -0.7869          |
| 0.5962        | 0.4   | 2300 | -3.1012       | -3.0975         | -159.4141    | -177.2301      | 0.6523          | 0.6078             | -0.8800        | 0.1412          | -1.0212          |
| 0.6176        | 0.41  | 2400 | -3.0320       | -3.0265         | -172.7089    | -192.7748      | 0.6506          | 0.6027             | -1.0130        | 0.1637          | -1.1767          |
| 0.6255        | 0.43  | 2500 | -3.0629       | -3.0584         | -156.9642    | -175.3398      | 0.6507          | 0.6101             | -0.8555        | 0.1468          | -1.0023          |
| 0.6075        | 0.45  | 2600 | -3.0877       | -3.0839         | -146.0736    | -162.3147      | 0.6547          | 0.6046             | -0.7466        | 0.1254          | -0.8721          |
| 0.6282        | 0.47  | 2700 | -3.1221       | -3.1185         | -140.7325    | -157.2624      | 0.6531          | 0.6101             | -0.6932        | 0.1283          | -0.8216          |
| 0.6495        | 0.48  | 2800 | -3.0926       | -3.0887         | -148.7372    | -166.3009      | 0.6517          | 0.6080             | -0.7733        | 0.1387          | -0.9119          |
| 0.6202        | 0.5   | 2900 | -3.0787       | -3.0744         | -152.9659    | -170.9832      | 0.6512          | 0.6048             | -0.8156        | 0.1432          | -0.9588          |
| 0.6252        | 0.52  | 3000 | -3.0824       | -3.0782         | -148.4267    | -166.3868      | 0.6505          | 0.6055             | -0.7702        | 0.1426          | -0.9128          |
| 0.6082        | 0.53  | 3100 | -3.0723       | -3.0678         | -149.2047    | -167.4548      | 0.6500          | 0.6115             | -0.7779        | 0.1455          | -0.9235          |
| 0.6072        | 0.55  | 3200 | -3.0863       | -3.0819         | -147.0810    | -164.9669      | 0.6499          | 0.6090             | -0.7567        | 0.1419          | -0.8986          |
| 0.6142        | 0.57  | 3300 | -3.0087       | -3.0026         | -179.2665    | -200.5992      | 0.6468          | 0.6176             | -1.0786        | 0.1764          | -1.2549          |
| 0.602         | 0.59  | 3400 | -3.0674       | -3.0624         | -150.3082    | -168.4087      | 0.6504          | 0.6136             | -0.7890        | 0.1440          | -0.9330          |
| 0.605         | 0.6   | 3500 | -3.0590       | -3.0538         | -154.1790    | -172.9109      | 0.6497          | 0.6122             | -0.8277        | 0.1503          | -0.9780          |
| 0.6263        | 0.62  | 3600 | -3.0721       | -3.0672         | -149.9757    | -168.0735      | 0.6508          | 0.6043             | -0.7857        | 0.1440          | -0.9297          |
| 0.5961        | 0.64  | 3700 | -3.0151       | -3.0090         | -169.4567    | -189.3689      | 0.6492          | 0.6136             | -0.9805        | 0.1622          | -1.1426          |
| 0.6273        | 0.65  | 3800 | -3.0117       | -3.0057         | -167.9805    | -187.6573      | 0.6494          | 0.6141             | -0.9657        | 0.1598          | -1.1255          |
| 0.6183        | 0.67  | 3900 | -3.0137       | -3.0077         | -167.4417    | -187.2734      | 0.6488          | 0.6166             | -0.9603        | 0.1613          | -1.1217          |
| 0.6051        | 0.69  | 4000 | -2.9974       | -2.9908         | -176.3739    | -197.1255      | 0.6482          | 0.6178             | -1.0496        | 0.1705          | -1.2202          |
| 0.5867        | 0.71  | 4100 | -3.0151       | -3.0088         | -169.1084    | -189.3998      | 0.6484          | 0.6125             | -0.9770        | 0.1659          | -1.1429          |
| 0.6554        | 0.72  | 4200 | -3.0270       | -3.0209         | -164.2755    | -184.0126      | 0.6489          | 0.6176             | -0.9287        | 0.1604          | -1.0891          |
| 0.6053        | 0.74  | 4300 | -3.0362       | -3.0303         | -159.9774    | -179.4446      | 0.6489          | 0.6097             | -0.8857        | 0.1577          | -1.0434          |
| 0.6153        | 0.76  | 4400 | -3.0351       | -3.0292         | -160.5470    | -180.1235      | 0.6489          | 0.6120             | -0.8914        | 0.1588          | -1.0502          |
| 0.6145        | 0.78  | 4500 | -3.0378       | -3.0319         | -160.1720    | -179.6728      | 0.6490          | 0.6113             | -0.8876        | 0.1580          | -1.0457          |
| 0.5798        | 0.79  | 4600 | -3.0308       | -3.0247         | -162.6813    | -182.4701      | 0.6488          | 0.6148             | -0.9127        | 0.1609          | -1.0736          |
| 0.6218        | 0.81  | 4700 | -3.0307       | -3.0246         | -163.0493    | -182.9482      | 0.6486          | 0.6152             | -0.9164        | 0.1620          | -1.0784          |
| 0.6102        | 0.83  | 4800 | -3.0259       | -3.0197         | -164.8939    | -184.9769      | 0.6484          | 0.6150             | -0.9348        | 0.1639          | -1.0987          |
| 0.6176        | 0.84  | 4900 | -3.0273       | -3.0211         | -165.7554    | -185.9428      | 0.6483          | 0.6157             | -0.9435        | 0.1649          | -1.1084          |
| 0.5907        | 0.86  | 5000 | -3.0259       | -3.0196         | -167.1301    | -187.4627      | 0.6482          | 0.6164             | -0.9572        | 0.1664          | -1.1236          |
| 0.6534        | 0.88  | 5100 | -3.0211       | -3.0148         | -167.2241    | -187.5712      | 0.6481          | 0.6155             | -0.9581        | 0.1665          | -1.1246          |
| 0.5973        | 0.9   | 5200 | -3.0194       | -3.0130         | -166.8823    | -187.1679      | 0.6483          | 0.6169             | -0.9547        | 0.1659          | -1.1206          |
| 0.5975        | 0.91  | 5300 | -3.0248       | -3.0185         | -166.6118    | -186.8759      | 0.6482          | 0.6162             | -0.9520        | 0.1657          | -1.1177          |
| 0.5986        | 0.93  | 5400 | -3.0249       | -3.0186         | -166.6502    | -186.8928      | 0.6483          | 0.6190             | -0.9524        | 0.1655          | -1.1179          |
| 0.6025        | 0.95  | 5500 | -3.0252       | -3.0189         | -166.7467    | -186.9980      | 0.6483          | 0.6169             | -0.9534        | 0.1655          | -1.1189          |
| 0.6149        | 0.96  | 5600 | -3.0244       | -3.0181         | -166.7859    | -187.1137      | 0.6480          | 0.6155             | -0.9538        | 0.1663          | -1.1201          |
| 0.6275        | 0.98  | 5700 | -3.0245       | -3.0182         | -166.6791    | -186.9484      | 0.6482          | 0.6178             | -0.9527        | 0.1657          | -1.1184          |
| 0.5876        | 1.0   | 5800 | -3.0239       | -3.0176         | -166.7881    | -187.0472      | 0.6482          | 0.6171             | -0.9538        | 0.1656          | -1.1194          |


### Framework versions

- PEFT 0.7.1
- Transformers 4.39.3
- Pytorch 2.1.2
- Datasets 2.18.0
- Tokenizers 0.15.2