silviasapora committed
Commit 7cedfc1 · verified
1 parent: d916476

Model save

Files changed (4)
  1. README.md +67 -0
  2. all_results.json +9 -0
  3. train_results.json +9 -0
  4. trainer_state.json +1162 -0
README.md ADDED
@@ -0,0 +1,67 @@
+ ---
+ base_model: google/gemma-7b
+ library_name: transformers
+ model_name: gemma-7b-orpo-shuffled-5e-5-v5
+ tags:
+ - generated_from_trainer
+ - trl
+ - orpo
+ licence: license
+ ---
+
+ # Model Card for gemma-7b-orpo-shuffled-5e-5-v5
+
+ This model is a fine-tuned version of [google/gemma-7b](https://huggingface.co/google/gemma-7b).
+ It has been trained using [TRL](https://github.com/huggingface/trl).
+
+ ## Quick start
+
+ ```python
+ from transformers import pipeline
+
+ question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
+ generator = pipeline("text-generation", model="silviasapora/gemma-7b-orpo-shuffled-5e-5-v5", device="cuda")
+ output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
+ print(output["generated_text"])
+ ```
+
+ ## Training procedure
+
+ [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/silvias/huggingface/runs/srb7mwnt)
+
+
+ This model was trained with ORPO, a method introduced in [ORPO: Monolithic Preference Optimization without Reference Model](https://huggingface.co/papers/2403.07691).
+
+ ### Framework versions
+
+ - TRL: 0.13.0
+ - Transformers: 4.48.1
+ - PyTorch: 2.5.1
+ - Datasets: 3.2.0
+ - Tokenizers: 0.21.0
+
+ ## Citations
+
+ Cite ORPO as:
+
+ ```bibtex
+ @article{hong2024orpo,
+     title = {{ORPO: Monolithic Preference Optimization without Reference Model}},
+     author = {Jiwoo Hong and Noah Lee and James Thorne},
+     year = 2024,
+     eprint = {arXiv:2403.07691}
+ }
+ ```
+
+ Cite TRL as:
+
+ ```bibtex
+ @misc{vonwerra2022trl,
+     title = {{TRL: Transformer Reinforcement Learning}},
+     author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
+     year = 2020,
+     journal = {GitHub repository},
+     publisher = {GitHub},
+     howpublished = {\url{https://github.com/huggingface/trl}}
+ }
+ ```
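
The README says the model was trained with ORPO, and trainer_state.json in this commit logs ORPO-specific metrics (`log_odds_chosen`, `log_odds_ratio`, `rewards/*`). The sketch below illustrates how those quantities relate to a single pair of sequence log-probabilities, following the odds-ratio term described in the ORPO paper. It assumes the log-probabilities are length-averaged (so strictly negative) and uses `beta = 0.5`, a value inferred from the logged ratio between `rewards/chosen` and `logps/chosen` rather than stated in the commit. The function names are hypothetical and this is not TRL's implementation.

```python
import math


def log_odds(logp: float) -> float:
    """Return log(p / (1 - p)) given log p; requires logp < 0."""
    return logp - math.log1p(-math.exp(logp))


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def orpo_terms(logp_chosen: float, logp_rejected: float, beta: float = 0.5) -> dict:
    """Per-pair ORPO quantities; beta = 0.5 is an inferred assumption."""
    # Log of the odds ratio between the chosen and rejected responses.
    lo = log_odds(logp_chosen) - log_odds(logp_rejected)
    return {
        "log_odds_chosen": lo,
        "log_odds_ratio": log_sigmoid(lo),
        "or_loss": -log_sigmoid(lo),  # odds-ratio penalty from the paper
        "rewards/chosen": beta * logp_chosen,
        "rewards/rejected": beta * logp_rejected,
        "rewards/margins": beta * (logp_chosen - logp_rejected),
    }


terms = orpo_terms(logp_chosen=-1.0, logp_rejected=-2.0)
for name, value in terms.items():
    print(f"{name}: {value:.4f}")
```

The logged values are batch means, so `log_odds_ratio` in the log is not expected to equal `log_sigmoid` of the logged `log_odds_chosen`.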
all_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "epoch": 2.981333333333333,
+   "total_flos": 0.0,
+   "train_loss": 65.2437489175389,
+   "train_runtime": 10349.5719,
+   "train_samples": 7500,
+   "train_samples_per_second": 2.174,
+   "train_steps_per_second": 0.034
+ }
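
The throughput fields in all_results.json can be cross-checked against each other. The sketch below assumes the run was configured for 3 epochs (the logged `epoch` is 2.98) and takes `global_step = 351` from trainer_state.json; the idea that samples per second is computed as samples × epochs / runtime is an inference from the numbers, not something the commit states.

```python
# Values copied from all_results.json.
results = {
    "train_runtime": 10349.5719,  # seconds
    "train_samples": 7500,
    "train_samples_per_second": 2.174,
    "train_steps_per_second": 0.034,
}

NUM_EPOCHS = 3     # assumption: configured number of epochs
GLOBAL_STEP = 351  # from trainer_state.json

samples_per_second = results["train_samples"] * NUM_EPOCHS / results["train_runtime"]
steps_per_second = GLOBAL_STEP / results["train_runtime"]
runtime_hours = results["train_runtime"] / 3600

print(f"samples/s: {samples_per_second:.3f}")
print(f"steps/s:   {steps_per_second:.3f}")
print(f"runtime:   {runtime_hours:.2f} h")
```

Under these assumptions both derived rates round to the reported values, and the run took a little under three hours.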
train_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "epoch": 2.981333333333333,
+   "total_flos": 0.0,
+   "train_loss": 65.2437489175389,
+   "train_runtime": 10349.5719,
+   "train_samples": 7500,
+   "train_samples_per_second": 2.174,
+   "train_steps_per_second": 0.034
+ }
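
The trainer_state.json log that follows records `learning_rate` every 5 steps. Those values appear consistent with a linear warmup over 36 steps to a peak of 5e-5, followed by cosine decay over the 351 total steps. The warmup length and the cosine shape are inferred from the logged numbers (the peak matches the `5e-5` in the model name); none of this is stated in the commit. A minimal sketch of that schedule, with hypothetical names:

```python
import math

PEAK_LR = 5e-5     # matches the "5e-5" in the model name
WARMUP_STEPS = 36  # assumption, inferred from the logged values
TOTAL_STEPS = 351  # global_step in trainer_state.json


def lr_at(step: int) -> float:
    """Linear warmup followed by half-cosine decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# A few (step, learning_rate) pairs copied from log_history.
logged = {
    5: 6.944444444444445e-06,
    35: 4.8611111111111115e-05,
    40: 4.998010925565448e-05,
    45: 4.989935734988098e-05,
}

for step, expected in logged.items():
    print(step, lr_at(step), expected)
```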
trainer_state.json ADDED
@@ -0,0 +1,1162 @@
+ {
+   "best_metric": null,
+   "best_model_checkpoint": null,
+   "epoch": 2.981333333333333,
+   "eval_steps": 500,
+   "global_step": 351,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.042666666666666665,
+       "grad_norm": 1613.9912109375,
+       "learning_rate": 6.944444444444445e-06,
+       "log_odds_chosen": 2.8254246711730957,
+       "log_odds_ratio": -6.873815059661865,
+       "logps/chosen": -22.453031539916992,
+       "logps/rejected": -25.278715133666992,
+       "loss": 416.6354,
+       "nll_loss": 9.582948684692383,
+       "rewards/accuracies": 0.515625,
+       "rewards/chosen": -11.226515769958496,
+       "rewards/margins": 1.412840485572815,
+       "rewards/rejected": -12.639357566833496,
+       "step": 5
+     },
+     {
+       "epoch": 0.08533333333333333,
+       "grad_norm": 543.0881958007812,
+       "learning_rate": 1.388888888888889e-05,
+       "log_odds_chosen": 3.1465320587158203,
+       "log_odds_ratio": -6.362111568450928,
+       "logps/chosen": -22.178117752075195,
+       "logps/rejected": -25.326404571533203,
+       "loss": 384.1808,
+       "nll_loss": 8.82459545135498,
+       "rewards/accuracies": 0.48124998807907104,
+       "rewards/chosen": -11.089058876037598,
+       "rewards/margins": 1.5741435289382935,
+       "rewards/rejected": -12.663202285766602,
+       "step": 10
+     },
+     {
+       "epoch": 0.128,
+       "grad_norm": 530.1594848632812,
+       "learning_rate": 2.0833333333333336e-05,
+       "log_odds_chosen": 7.091063022613525,
+       "log_odds_ratio": -5.8868255615234375,
+       "logps/chosen": -21.351957321166992,
+       "logps/rejected": -28.443527221679688,
+       "loss": 370.4253,
+       "nll_loss": 8.632375717163086,
+       "rewards/accuracies": 0.528124988079071,
+       "rewards/chosen": -10.675978660583496,
+       "rewards/margins": 3.5457847118377686,
+       "rewards/rejected": -14.221763610839844,
+       "step": 15
+     },
+     {
+       "epoch": 0.17066666666666666,
+       "grad_norm": 1538.9217529296875,
+       "learning_rate": 2.777777777777778e-05,
+       "log_odds_chosen": -1.1623966693878174,
+       "log_odds_ratio": -7.707022666931152,
+       "logps/chosen": -18.182302474975586,
+       "logps/rejected": -17.020410537719727,
+       "loss": 361.6386,
+       "nll_loss": 7.447694301605225,
+       "rewards/accuracies": 0.48750001192092896,
+       "rewards/chosen": -9.091151237487793,
+       "rewards/margins": -0.5809468626976013,
+       "rewards/rejected": -8.510205268859863,
+       "step": 20
+     },
+     {
+       "epoch": 0.21333333333333335,
+       "grad_norm": 372.1598815917969,
+       "learning_rate": 3.472222222222222e-05,
+       "log_odds_chosen": -0.20203940570354462,
+       "log_odds_ratio": -2.3990657329559326,
+       "logps/chosen": -6.309157371520996,
+       "logps/rejected": -6.091958522796631,
+       "loss": 147.0848,
+       "nll_loss": 3.3968677520751953,
+       "rewards/accuracies": 0.5562499761581421,
+       "rewards/chosen": -3.154578685760498,
+       "rewards/margins": -0.10859958082437515,
+       "rewards/rejected": -3.0459792613983154,
+       "step": 25
+     },
+     {
+       "epoch": 0.256,
+       "grad_norm": 283.9689636230469,
+       "learning_rate": 4.166666666666667e-05,
+       "log_odds_chosen": 0.3445066511631012,
+       "log_odds_ratio": -0.707059383392334,
+       "logps/chosen": -1.7059379816055298,
+       "logps/rejected": -1.9979642629623413,
+       "loss": 77.425,
+       "nll_loss": 2.065999984741211,
+       "rewards/accuracies": 0.625,
+       "rewards/chosen": -0.8529689908027649,
+       "rewards/margins": 0.14601318538188934,
+       "rewards/rejected": -0.9989821314811707,
+       "step": 30
+     },
+     {
+       "epoch": 0.2986666666666667,
+       "grad_norm": 159.0054473876953,
+       "learning_rate": 4.8611111111111115e-05,
+       "log_odds_chosen": 0.4161193370819092,
+       "log_odds_ratio": -0.6260719299316406,
+       "logps/chosen": -1.5950376987457275,
+       "logps/rejected": -1.9357197284698486,
+       "loss": 70.8065,
+       "nll_loss": 1.8996658325195312,
+       "rewards/accuracies": 0.690625011920929,
+       "rewards/chosen": -0.7975188493728638,
+       "rewards/margins": 0.17034098505973816,
+       "rewards/rejected": -0.9678598642349243,
+       "step": 35
+     },
+     {
+       "epoch": 0.3413333333333333,
+       "grad_norm": 136.55224609375,
+       "learning_rate": 4.998010925565448e-05,
+       "log_odds_chosen": 0.37962982058525085,
+       "log_odds_ratio": -0.6669248342514038,
+       "logps/chosen": -1.6330034732818604,
+       "logps/rejected": -1.9587697982788086,
+       "loss": 72.4846,
+       "nll_loss": 1.9316819906234741,
+       "rewards/accuracies": 0.659375011920929,
+       "rewards/chosen": -0.8165017366409302,
+       "rewards/margins": 0.16288331151008606,
+       "rewards/rejected": -0.9793848991394043,
+       "step": 40
+     },
+     {
+       "epoch": 0.384,
+       "grad_norm": 150.20294189453125,
+       "learning_rate": 4.989935734988098e-05,
+       "log_odds_chosen": 0.2513929307460785,
+       "log_odds_ratio": -0.6757463216781616,
+       "logps/chosen": -1.545570731163025,
+       "logps/rejected": -1.7443872690200806,
+       "loss": 68.9163,
+       "nll_loss": 1.8157603740692139,
+       "rewards/accuracies": 0.6031249761581421,
+       "rewards/chosen": -0.7727853655815125,
+       "rewards/margins": 0.09940830618143082,
+       "rewards/rejected": -0.8721936345100403,
+       "step": 45
+     },
+     {
+       "epoch": 0.4266666666666667,
+       "grad_norm": 119.20894622802734,
+       "learning_rate": 4.975670171853926e-05,
+       "log_odds_chosen": 0.3823354244232178,
+       "log_odds_ratio": -0.6327594518661499,
+       "logps/chosen": -1.4006574153900146,
+       "logps/rejected": -1.7134723663330078,
+       "loss": 63.767,
+       "nll_loss": 1.6763384342193604,
+       "rewards/accuracies": 0.6343749761581421,
+       "rewards/chosen": -0.7003287076950073,
+       "rewards/margins": 0.15640754997730255,
+       "rewards/rejected": -0.8567361831665039,
+       "step": 50
+     },
+     {
+       "epoch": 0.4693333333333333,
+       "grad_norm": 114.05499267578125,
+       "learning_rate": 4.9552497026005974e-05,
+       "log_odds_chosen": 0.3634984791278839,
+       "log_odds_ratio": -0.6489454507827759,
+       "logps/chosen": -1.4094264507293701,
+       "logps/rejected": -1.695447325706482,
+       "loss": 63.667,
+       "nll_loss": 1.6651201248168945,
+       "rewards/accuracies": 0.596875011920929,
+       "rewards/chosen": -0.7047132253646851,
+       "rewards/margins": 0.14301049709320068,
+       "rewards/rejected": -0.847723662853241,
+       "step": 55
+     },
+     {
+       "epoch": 0.512,
+       "grad_norm": 124.75395965576172,
+       "learning_rate": 4.928725095732169e-05,
+       "log_odds_chosen": 0.5368436574935913,
+       "log_odds_ratio": -0.5782175660133362,
+       "logps/chosen": -1.3039026260375977,
+       "logps/rejected": -1.7335163354873657,
+       "loss": 60.2436,
+       "nll_loss": 1.593503713607788,
+       "rewards/accuracies": 0.7093750238418579,
+       "rewards/chosen": -0.6519513130187988,
+       "rewards/margins": 0.21480686962604523,
+       "rewards/rejected": -0.8667581677436829,
+       "step": 60
+     },
+     {
+       "epoch": 0.5546666666666666,
+       "grad_norm": 131.53451538085938,
+       "learning_rate": 4.896162295600589e-05,
+       "log_odds_chosen": 0.6900291442871094,
+       "log_odds_ratio": -0.5631518959999084,
+       "logps/chosen": -1.3248389959335327,
+       "logps/rejected": -1.913548231124878,
+       "loss": 60.8039,
+       "nll_loss": 1.6185461282730103,
+       "rewards/accuracies": 0.7093750238418579,
+       "rewards/chosen": -0.6624194979667664,
+       "rewards/margins": 0.29435473680496216,
+       "rewards/rejected": -0.956774115562439,
+       "step": 65
+     },
+     {
+       "epoch": 0.5973333333333334,
+       "grad_norm": 123.46876525878906,
+       "learning_rate": 4.8576422584576514e-05,
+       "log_odds_chosen": 0.9186612367630005,
+       "log_odds_ratio": -0.5672580003738403,
+       "logps/chosen": -1.2781449556350708,
+       "logps/rejected": -2.095371723175049,
+       "loss": 59.6718,
+       "nll_loss": 1.581115484237671,
+       "rewards/accuracies": 0.675000011920929,
+       "rewards/chosen": -0.6390724778175354,
+       "rewards/margins": 0.40861329436302185,
+       "rewards/rejected": -1.0476858615875244,
+       "step": 70
+     },
+     {
+       "epoch": 0.64,
+       "grad_norm": 190.750732421875,
+       "learning_rate": 4.813260751184992e-05,
+       "log_odds_chosen": 1.2489697933197021,
+       "log_odds_ratio": -0.5562187433242798,
+       "logps/chosen": -1.2655675411224365,
+       "logps/rejected": -2.3971314430236816,
+       "loss": 59.4317,
+       "nll_loss": 1.5791301727294922,
+       "rewards/accuracies": 0.675000011920929,
+       "rewards/chosen": -0.6327837705612183,
+       "rewards/margins": 0.5657819509506226,
+       "rewards/rejected": -1.1985657215118408,
+       "step": 75
+     },
+     {
+       "epoch": 0.6826666666666666,
+       "grad_norm": 102.65909576416016,
+       "learning_rate": 4.763128113202537e-05,
+       "log_odds_chosen": 0.9830948710441589,
+       "log_odds_ratio": -0.6053479313850403,
+       "logps/chosen": -1.2050533294677734,
+       "logps/rejected": -2.0940186977386475,
+       "loss": 58.5322,
+       "nll_loss": 1.526456594467163,
+       "rewards/accuracies": 0.621874988079071,
+       "rewards/chosen": -0.6025266647338867,
+       "rewards/margins": 0.4444827437400818,
+       "rewards/rejected": -1.0470093488693237,
+       "step": 80
+     },
+     {
+       "epoch": 0.7253333333333334,
+       "grad_norm": 155.4401397705078,
+       "learning_rate": 4.707368982147318e-05,
+       "log_odds_chosen": 1.329529047012329,
+       "log_odds_ratio": -0.570560097694397,
+       "logps/chosen": -1.186272382736206,
+       "logps/rejected": -2.406183958053589,
+       "loss": 58.0726,
+       "nll_loss": 1.52948796749115,
+       "rewards/accuracies": 0.640625,
+       "rewards/chosen": -0.593136191368103,
+       "rewards/margins": 0.6099557876586914,
+       "rewards/rejected": -1.2030919790267944,
+       "step": 85
+     },
+     {
+       "epoch": 0.768,
+       "grad_norm": 124.47338104248047,
+       "learning_rate": 4.6461219840046654e-05,
+       "log_odds_chosen": 1.9252653121948242,
+       "log_odds_ratio": -0.528831422328949,
+       "logps/chosen": -1.148413896560669,
+       "logps/rejected": -2.926398754119873,
+       "loss": 55.9153,
+       "nll_loss": 1.4829370975494385,
+       "rewards/accuracies": 0.6968749761581421,
+       "rewards/chosen": -0.5742069482803345,
+       "rewards/margins": 0.888992190361023,
+       "rewards/rejected": -1.4631993770599365,
+       "step": 90
+     },
+     {
+       "epoch": 0.8106666666666666,
+       "grad_norm": 85.16313171386719,
+       "learning_rate": 4.579539388462173e-05,
+       "log_odds_chosen": 1.8349857330322266,
+       "log_odds_ratio": -0.5613174438476562,
+       "logps/chosen": -1.1328500509262085,
+       "logps/rejected": -2.83276104927063,
+       "loss": 56.9632,
+       "nll_loss": 1.4994406700134277,
+       "rewards/accuracies": 0.659375011920929,
+       "rewards/chosen": -0.5664250254631042,
+       "rewards/margins": 0.8499553799629211,
+       "rewards/rejected": -1.416380524635315,
+       "step": 95
+     },
+     {
+       "epoch": 0.8533333333333334,
+       "grad_norm": 139.35577392578125,
+       "learning_rate": 4.5077867303432546e-05,
+       "log_odds_chosen": 1.8051321506500244,
+       "log_odds_ratio": -0.5339905619621277,
+       "logps/chosen": -1.1410566568374634,
+       "logps/rejected": -2.8034322261810303,
+       "loss": 55.2954,
+       "nll_loss": 1.4609841108322144,
+       "rewards/accuracies": 0.706250011920929,
+       "rewards/chosen": -0.5705283284187317,
+       "rewards/margins": 0.8311878442764282,
+       "rewards/rejected": -1.4017161130905151,
+       "step": 100
+     },
+     {
+       "epoch": 0.896,
+       "grad_norm": 148.62985229492188,
+       "learning_rate": 4.431042398061499e-05,
+       "log_odds_chosen": 0.8170617818832397,
+       "log_odds_ratio": -0.563850998878479,
+       "logps/chosen": -1.1079161167144775,
+       "logps/rejected": -1.8060640096664429,
+       "loss": 54.9982,
+       "nll_loss": 1.436767339706421,
+       "rewards/accuracies": 0.6968749761581421,
+       "rewards/chosen": -0.5539580583572388,
+       "rewards/margins": 0.34907400608062744,
+       "rewards/rejected": -0.9030320048332214,
+       "step": 105
+     },
+     {
+       "epoch": 0.9386666666666666,
+       "grad_norm": 89.92095947265625,
+       "learning_rate": 4.34949719011896e-05,
+       "log_odds_chosen": 1.179985761642456,
+       "log_odds_ratio": -0.5284770131111145,
+       "logps/chosen": -1.184069037437439,
+       "logps/rejected": -2.225637912750244,
+       "loss": 55.2995,
+       "nll_loss": 1.4638700485229492,
+       "rewards/accuracies": 0.703125,
+       "rewards/chosen": -0.5920345187187195,
+       "rewards/margins": 0.5207844972610474,
+       "rewards/rejected": -1.112818956375122,
+       "step": 110
+     },
+     {
+       "epoch": 0.9813333333333333,
+       "grad_norm": 56.788814544677734,
+       "learning_rate": 4.263353840751022e-05,
+       "log_odds_chosen": 1.3124678134918213,
+       "log_odds_ratio": -0.5542054176330566,
+       "logps/chosen": -1.1087802648544312,
+       "logps/rejected": -2.28933048248291,
+       "loss": 54.7227,
+       "nll_loss": 1.4329824447631836,
+       "rewards/accuracies": 0.690625011920929,
+       "rewards/chosen": -0.5543901324272156,
+       "rewards/margins": 0.5902751684188843,
+       "rewards/rejected": -1.144665241241455,
+       "step": 115
+     },
+     {
+       "epoch": 1.0170666666666666,
+       "grad_norm": 114.26390075683594,
+       "learning_rate": 4.172826515897146e-05,
+       "log_odds_chosen": 1.588960886001587,
+       "log_odds_ratio": -0.45346295833587646,
+       "logps/chosen": -1.0221998691558838,
+       "logps/rejected": -2.3981516361236572,
+       "loss": 42.254,
+       "nll_loss": 1.3499113321304321,
+       "rewards/accuracies": 0.7985074520111084,
+       "rewards/chosen": -0.5110999345779419,
+       "rewards/margins": 0.6879758238792419,
+       "rewards/rejected": -1.1990758180618286,
+       "step": 120
+     },
+     {
+       "epoch": 1.0597333333333334,
+       "grad_norm": 73.81729888916016,
+       "learning_rate": 4.078140280750597e-05,
+       "log_odds_chosen": 1.989117980003357,
+       "log_odds_ratio": -0.36838608980178833,
+       "logps/chosen": -0.9274940490722656,
+       "logps/rejected": -2.613682985305786,
+       "loss": 45.4124,
+       "nll_loss": 1.23494553565979,
+       "rewards/accuracies": 0.8687499761581421,
+       "rewards/chosen": -0.4637470245361328,
+       "rewards/margins": 0.8430943489074707,
+       "rewards/rejected": -1.306841492652893,
+       "step": 125
+     },
+     {
+       "epoch": 1.1024,
+       "grad_norm": 60.55211639404297,
+       "learning_rate": 3.9795305402109195e-05,
+       "log_odds_chosen": 1.7344251871109009,
+       "log_odds_ratio": -0.3585907816886902,
+       "logps/chosen": -0.9109547734260559,
+       "logps/rejected": -2.3054215908050537,
+       "loss": 44.4913,
+       "nll_loss": 1.2110589742660522,
+       "rewards/accuracies": 0.875,
+       "rewards/chosen": -0.45547738671302795,
+       "rewards/margins": 0.6972334384918213,
+       "rewards/rejected": -1.1527107954025269,
+       "step": 130
+     },
+     {
+       "epoch": 1.1450666666666667,
+       "grad_norm": 91.24962615966797,
+       "learning_rate": 3.8772424536302564e-05,
+       "log_odds_chosen": 1.8280813694000244,
+       "log_odds_ratio": -0.3626967966556549,
+       "logps/chosen": -0.9022882580757141,
+       "logps/rejected": -2.3743839263916016,
+       "loss": 44.9063,
+       "nll_loss": 1.2219722270965576,
+       "rewards/accuracies": 0.8374999761581421,
+       "rewards/chosen": -0.45114412903785706,
+       "rewards/margins": 0.7360478043556213,
+       "rewards/rejected": -1.1871919631958008,
+       "step": 135
+     },
+     {
+       "epoch": 1.1877333333333333,
+       "grad_norm": 121.5604248046875,
+       "learning_rate": 3.771530325308579e-05,
+       "log_odds_chosen": 2.0554168224334717,
+       "log_odds_ratio": -0.3262919783592224,
+       "logps/chosen": -0.9136549234390259,
+       "logps/rejected": -2.5996718406677246,
+       "loss": 44.2475,
+       "nll_loss": 1.2195894718170166,
+       "rewards/accuracies": 0.909375011920929,
+       "rewards/chosen": -0.45682746171951294,
+       "rewards/margins": 0.8430085182189941,
+       "rewards/rejected": -1.2998359203338623,
+       "step": 140
+     },
+     {
+       "epoch": 1.2304,
+       "grad_norm": 66.49166107177734,
+       "learning_rate": 3.662656972253127e-05,
+       "log_odds_chosen": 2.5125927925109863,
+       "log_odds_ratio": -0.3072448968887329,
+       "logps/chosen": -0.87763911485672,
+       "logps/rejected": -2.9942686557769775,
+       "loss": 42.6558,
+       "nll_loss": 1.1793729066848755,
+       "rewards/accuracies": 0.90625,
+       "rewards/chosen": -0.43881955742836,
+       "rewards/margins": 1.0583146810531616,
+       "rewards/rejected": -1.4971343278884888,
+       "step": 145
+     },
+     {
+       "epoch": 1.2730666666666668,
+       "grad_norm": 83.34021759033203,
+       "learning_rate": 3.550893070773914e-05,
+       "log_odds_chosen": 2.0552334785461426,
+       "log_odds_ratio": -0.3721521496772766,
+       "logps/chosen": -0.91778165102005,
+       "logps/rejected": -2.6527884006500244,
+       "loss": 45.286,
+       "nll_loss": 1.2291127443313599,
+       "rewards/accuracies": 0.859375,
+       "rewards/chosen": -0.458890825510025,
+       "rewards/margins": 0.8675033450126648,
+       "rewards/rejected": -1.3263942003250122,
+       "step": 150
+     },
+     {
+       "epoch": 1.3157333333333332,
+       "grad_norm": 190.6498260498047,
+       "learning_rate": 3.436516483539781e-05,
+       "log_odds_chosen": 2.0241360664367676,
+       "log_odds_ratio": -0.3463495075702667,
+       "logps/chosen": -0.8888520002365112,
+       "logps/rejected": -2.54441237449646,
+       "loss": 44.2975,
+       "nll_loss": 1.2111207246780396,
+       "rewards/accuracies": 0.875,
+       "rewards/chosen": -0.4444260001182556,
+       "rewards/margins": 0.8277803659439087,
+       "rewards/rejected": -1.27220618724823,
+       "step": 155
+     },
+     {
+       "epoch": 1.3584,
+       "grad_norm": 75.44967651367188,
+       "learning_rate": 3.3198115687680115e-05,
+       "log_odds_chosen": 1.7034084796905518,
+       "log_odds_ratio": -0.37663713097572327,
+       "logps/chosen": -0.9074303507804871,
+       "logps/rejected": -2.279710054397583,
+       "loss": 44.6786,
+       "nll_loss": 1.207887887954712,
+       "rewards/accuracies": 0.831250011920929,
+       "rewards/chosen": -0.45371517539024353,
+       "rewards/margins": 0.6861397624015808,
+       "rewards/rejected": -1.1398550271987915,
+       "step": 160
+     },
+     {
+       "epoch": 1.4010666666666667,
+       "grad_norm": 62.095054626464844,
+       "learning_rate": 3.201068473265007e-05,
+       "log_odds_chosen": 1.9856258630752563,
+       "log_odds_ratio": -0.34824666380882263,
+       "logps/chosen": -0.8774662017822266,
+       "logps/rejected": -2.4961907863616943,
+       "loss": 43.6625,
+       "nll_loss": 1.19032883644104,
+       "rewards/accuracies": 0.8656250238418579,
+       "rewards/chosen": -0.4387331008911133,
+       "rewards/margins": 0.8093622326850891,
+       "rewards/rejected": -1.2480953931808472,
+       "step": 165
+     },
+     {
+       "epoch": 1.4437333333333333,
+       "grad_norm": 63.6281852722168,
+       "learning_rate": 3.0805824110756064e-05,
+       "log_odds_chosen": 1.963814377784729,
+       "log_odds_ratio": -0.3469398617744446,
+       "logps/chosen": -0.9113228917121887,
+       "logps/rejected": -2.5117368698120117,
+       "loss": 45.0104,
+       "nll_loss": 1.2331056594848633,
+       "rewards/accuracies": 0.8812500238418579,
+       "rewards/chosen": -0.45566144585609436,
+       "rewards/margins": 0.8002070188522339,
+       "rewards/rejected": -1.2558684349060059,
+       "step": 170
+     },
+     {
+       "epoch": 1.4864,
+       "grad_norm": 118.99483489990234,
+       "learning_rate": 2.958652929534456e-05,
+       "log_odds_chosen": 2.362398624420166,
+       "log_odds_ratio": -0.32414382696151733,
+       "logps/chosen": -0.8880361318588257,
+       "logps/rejected": -2.871060609817505,
+       "loss": 42.9315,
+       "nll_loss": 1.1795381307601929,
+       "rewards/accuracies": 0.8999999761581421,
+       "rewards/chosen": -0.44401806592941284,
+       "rewards/margins": 0.99151211977005,
+       "rewards/rejected": -1.4355303049087524,
+       "step": 175
+     },
+     {
+       "epoch": 1.5290666666666666,
+       "grad_norm": 71.64663696289062,
+       "learning_rate": 2.8355831645441388e-05,
+       "log_odds_chosen": 2.3435986042022705,
+       "log_odds_ratio": -0.3180648684501648,
+       "logps/chosen": -0.8936977386474609,
+       "logps/rejected": -2.851247787475586,
+       "loss": 43.0047,
+       "nll_loss": 1.1848657131195068,
+       "rewards/accuracies": 0.8812500238418579,
+       "rewards/chosen": -0.44684886932373047,
+       "rewards/margins": 0.9787752032279968,
+       "rewards/rejected": -1.425623893737793,
+       "step": 180
+     },
+     {
+       "epoch": 1.5717333333333334,
+       "grad_norm": 77.79396057128906,
+       "learning_rate": 2.7116790869315582e-05,
+       "log_odds_chosen": 2.3846802711486816,
+       "log_odds_ratio": -0.32562515139579773,
+       "logps/chosen": -0.8554035425186157,
+       "logps/rejected": -2.830977439880371,
+       "loss": 42.7424,
+       "nll_loss": 1.1728862524032593,
+       "rewards/accuracies": 0.8968750238418579,
+       "rewards/chosen": -0.42770177125930786,
+       "rewards/margins": 0.9877870678901672,
+       "rewards/rejected": -1.4154887199401855,
+       "step": 185
+     },
+     {
+       "epoch": 1.6143999999999998,
+       "grad_norm": 65.49602508544922,
+       "learning_rate": 2.587248741756253e-05,
+       "log_odds_chosen": 2.2837889194488525,
+       "log_odds_ratio": -0.3412472903728485,
+       "logps/chosen": -0.8937826156616211,
+       "logps/rejected": -2.8039238452911377,
+       "loss": 43.9291,
+       "nll_loss": 1.2021592855453491,
+       "rewards/accuracies": 0.878125011920929,
+       "rewards/chosen": -0.44689130783081055,
+       "rewards/margins": 0.9550706148147583,
+       "rewards/rejected": -1.4019619226455688,
+       "step": 190
+     },
+     {
+       "epoch": 1.6570666666666667,
+       "grad_norm": 66.6220474243164,
+       "learning_rate": 2.4626014824618415e-05,
+       "log_odds_chosen": 2.197211980819702,
+       "log_odds_ratio": -0.3463861346244812,
+       "logps/chosen": -0.924345850944519,
+       "logps/rejected": -2.7567689418792725,
+       "loss": 44.4931,
+       "nll_loss": 1.2172157764434814,
+       "rewards/accuracies": 0.8687499761581421,
+       "rewards/chosen": -0.4621729254722595,
+       "rewards/margins": 0.9162116050720215,
+       "rewards/rejected": -1.3783844709396362,
+       "step": 195
+     },
+     {
+       "epoch": 1.6997333333333333,
+       "grad_norm": 74.31687927246094,
+       "learning_rate": 2.3380472017746202e-05,
+       "log_odds_chosen": 2.50050687789917,
+       "log_odds_ratio": -0.33154329657554626,
+       "logps/chosen": -0.8792299032211304,
+       "logps/rejected": -2.985945224761963,
+       "loss": 42.9493,
+       "nll_loss": 1.1763942241668701,
+       "rewards/accuracies": 0.918749988079071,
+       "rewards/chosen": -0.4396149516105652,
+       "rewards/margins": 1.0533576011657715,
+       "rewards/rejected": -1.4929726123809814,
+       "step": 200
+     },
651
+ {
652
+ "epoch": 1.7424,
653
+ "grad_norm": 74.58470916748047,
654
+ "learning_rate": 2.2138955612614207e-05,
655
+ "log_odds_chosen": 2.5349690914154053,
656
+ "log_odds_ratio": -0.32573097944259644,
657
+ "logps/chosen": -0.9015717506408691,
658
+ "logps/rejected": -3.0475680828094482,
659
+ "loss": 43.05,
660
+ "nll_loss": 1.1824465990066528,
661
+ "rewards/accuracies": 0.903124988079071,
662
+ "rewards/chosen": -0.45078587532043457,
663
+ "rewards/margins": 1.072998285293579,
664
+ "rewards/rejected": -1.5237840414047241,
665
+ "step": 205
666
+ },
667
+ {
668
+ "epoch": 1.7850666666666668,
669
+ "grad_norm": 58.804725646972656,
670
+ "learning_rate": 2.090455221462156e-05,
671
+ "log_odds_chosen": 2.2193970680236816,
672
+ "log_odds_ratio": -0.3386573791503906,
673
+ "logps/chosen": -0.860241711139679,
674
+ "logps/rejected": -2.6838183403015137,
675
+ "loss": 43.3732,
676
+ "nll_loss": 1.1860827207565308,
677
+ "rewards/accuracies": 0.9125000238418579,
678
+ "rewards/chosen": -0.4301208555698395,
679
+ "rewards/margins": 0.9117883443832397,
680
+ "rewards/rejected": -1.3419091701507568,
681
+ "step": 210
682
+ },
683
+ {
684
+ "epoch": 1.8277333333333332,
685
+ "grad_norm": 78.16554260253906,
686
+ "learning_rate": 1.9680330745110954e-05,
687
+ "log_odds_chosen": 2.9208180904388428,
688
+ "log_odds_ratio": -0.33138272166252136,
689
+ "logps/chosen": -0.8893701434135437,
690
+ "logps/rejected": -3.4184937477111816,
691
+ "loss": 43.589,
692
+ "nll_loss": 1.1964662075042725,
693
+ "rewards/accuracies": 0.8812500238418579,
694
+ "rewards/chosen": -0.44468507170677185,
695
+ "rewards/margins": 1.2645617723464966,
696
+ "rewards/rejected": -1.7092468738555908,
697
+ "step": 215
698
+ },
699
+ {
700
+ "epoch": 1.8704,
701
+ "grad_norm": 59.159664154052734,
702
+ "learning_rate": 1.8469334811546542e-05,
703
+ "log_odds_chosen": 2.7121474742889404,
704
+ "log_odds_ratio": -0.32611986994743347,
705
+ "logps/chosen": -0.8474147915840149,
706
+ "logps/rejected": -3.1522021293640137,
707
+ "loss": 42.4394,
708
+ "nll_loss": 1.1631710529327393,
709
+ "rewards/accuracies": 0.887499988079071,
710
+ "rewards/chosen": -0.42370739579200745,
711
+ "rewards/margins": 1.1523938179016113,
712
+ "rewards/rejected": -1.5761010646820068,
713
+ "step": 220
714
+ },
715
+ {
716
+ "epoch": 1.9130666666666667,
717
+ "grad_norm": 59.09116744995117,
718
+ "learning_rate": 1.7274575140626318e-05,
719
+ "log_odds_chosen": 3.3106465339660645,
720
+ "log_odds_ratio": -0.29852262139320374,
721
+ "logps/chosen": -0.8745032548904419,
722
+ "logps/rejected": -3.783921003341675,
723
+ "loss": 42.0539,
724
+ "nll_loss": 1.1649229526519775,
725
+ "rewards/accuracies": 0.918749988079071,
726
+ "rewards/chosen": -0.43725162744522095,
727
+ "rewards/margins": 1.4547090530395508,
728
+ "rewards/rejected": -1.8919605016708374,
729
+ "step": 225
730
+ },
+ {
+ "epoch": 1.9557333333333333,
+ "grad_norm": 69.94084167480469,
+ "learning_rate": 1.609902209314108e-05,
+ "log_odds_chosen": 2.8738818168640137,
+ "log_odds_ratio": -0.35588911175727844,
+ "logps/chosen": -0.9021285772323608,
+ "logps/rejected": -3.4104793071746826,
+ "loss": 44.5847,
+ "nll_loss": 1.215327262878418,
+ "rewards/accuracies": 0.859375,
+ "rewards/chosen": -0.4510642886161804,
+ "rewards/margins": 1.2541753053665161,
+ "rewards/rejected": -1.7052396535873413,
+ "step": 230
+ },
+ {
+ "epoch": 1.9984,
+ "grad_norm": 62.0550537109375,
+ "learning_rate": 1.4945598279189565e-05,
+ "log_odds_chosen": 2.989150285720825,
+ "log_odds_ratio": -0.3047833740711212,
+ "logps/chosen": -0.8394752740859985,
+ "logps/rejected": -3.399489164352417,
+ "loss": 43.0152,
+ "nll_loss": 1.1918326616287231,
+ "rewards/accuracies": 0.9125000238418579,
+ "rewards/chosen": -0.41973763704299927,
+ "rewards/margins": 1.280007004737854,
+ "rewards/rejected": -1.6997445821762085,
+ "step": 235
+ },
+ {
+ "epoch": 2.034133333333333,
+ "grad_norm": 54.4438362121582,
+ "learning_rate": 1.3817171292109183e-05,
+ "log_odds_chosen": 3.9181158542633057,
+ "log_odds_ratio": -0.1617850363254547,
+ "logps/chosen": -0.6942475438117981,
+ "logps/rejected": -3.9301412105560303,
+ "loss": 28.5885,
+ "nll_loss": 0.9858409762382507,
+ "rewards/accuracies": 0.9813432693481445,
+ "rewards/chosen": -0.34712377190589905,
+ "rewards/margins": 1.6179468631744385,
+ "rewards/rejected": -1.9650706052780151,
+ "step": 240
+ },
+ {
+ "epoch": 2.0768,
+ "grad_norm": 52.67664337158203,
+ "learning_rate": 1.271654657918722e-05,
+ "log_odds_chosen": 3.7705111503601074,
+ "log_odds_ratio": -0.14351816475391388,
+ "logps/chosen": -0.6834434866905212,
+ "logps/rejected": -3.7164478302001953,
+ "loss": 33.6636,
+ "nll_loss": 0.9802292585372925,
+ "rewards/accuracies": 0.971875011920929,
+ "rewards/chosen": -0.3417217433452606,
+ "rewards/margins": 1.5165021419525146,
+ "rewards/rejected": -1.8582239151000977,
+ "step": 245
+ },
+ {
+ "epoch": 2.119466666666667,
+ "grad_norm": 51.64983367919922,
+ "learning_rate": 1.1646460466876783e-05,
+ "log_odds_chosen": 4.401483058929443,
+ "log_odds_ratio": -0.1400521695613861,
+ "logps/chosen": -0.6629117727279663,
+ "logps/rejected": -4.262970447540283,
+ "loss": 32.0019,
+ "nll_loss": 0.930033802986145,
+ "rewards/accuracies": 0.965624988079071,
+ "rewards/chosen": -0.33145588636398315,
+ "rewards/margins": 1.8000293970108032,
+ "rewards/rejected": -2.1314852237701416,
+ "step": 250
+ },
+ {
+ "epoch": 2.1621333333333332,
+ "grad_norm": 55.190956115722656,
+ "learning_rate": 1.0609573357858166e-05,
+ "log_odds_chosen": 4.636818885803223,
+ "log_odds_ratio": -0.15432265400886536,
+ "logps/chosen": -0.6747154593467712,
+ "logps/rejected": -4.558712959289551,
+ "loss": 33.224,
+ "nll_loss": 0.9610891342163086,
+ "rewards/accuracies": 0.9624999761581421,
+ "rewards/chosen": -0.3373577296733856,
+ "rewards/margins": 1.9419982433319092,
+ "rewards/rejected": -2.2793564796447754,
+ "step": 255
+ },
+ {
+ "epoch": 2.2048,
+ "grad_norm": 56.406368255615234,
+ "learning_rate": 9.608463116858542e-06,
+ "log_odds_chosen": 4.725244998931885,
+ "log_odds_ratio": -0.13697457313537598,
+ "logps/chosen": -0.6730117797851562,
+ "logps/rejected": -4.651577472686768,
+ "loss": 33.3052,
+ "nll_loss": 0.9723016619682312,
+ "rewards/accuracies": 0.981249988079071,
+ "rewards/chosen": -0.3365058898925781,
+ "rewards/margins": 1.9892826080322266,
+ "rewards/rejected": -2.325788736343384,
+ "step": 260
+ },
+ {
+ "epoch": 2.2474666666666665,
+ "grad_norm": 44.67837905883789,
+ "learning_rate": 8.645618661674142e-06,
+ "log_odds_chosen": 4.754386901855469,
+ "log_odds_ratio": -0.1378304809331894,
+ "logps/chosen": -0.6680094003677368,
+ "logps/rejected": -4.644643306732178,
+ "loss": 32.3483,
+ "nll_loss": 0.941969096660614,
+ "rewards/accuracies": 0.9781249761581421,
+ "rewards/chosen": -0.3340047001838684,
+ "rewards/margins": 1.9883171319961548,
+ "rewards/rejected": -2.322321653366089,
+ "step": 265
+ },
+ {
+ "epoch": 2.2901333333333334,
+ "grad_norm": 53.01663589477539,
+ "learning_rate": 7.723433775328384e-06,
+ "log_odds_chosen": 4.567784309387207,
+ "log_odds_ratio": -0.1396690309047699,
+ "logps/chosen": -0.6494365930557251,
+ "logps/rejected": -4.458431720733643,
+ "loss": 32.7593,
+ "nll_loss": 0.9538928270339966,
+ "rewards/accuracies": 0.984375,
+ "rewards/chosen": -0.32471829652786255,
+ "rewards/margins": 1.904497742652893,
+ "rewards/rejected": -2.2292158603668213,
+ "step": 270
+ },
+ {
+ "epoch": 2.3327999999999998,
+ "grad_norm": 56.5717887878418,
+ "learning_rate": 6.844201154750177e-06,
+ "log_odds_chosen": 4.175922870635986,
+ "log_odds_ratio": -0.14027449488639832,
+ "logps/chosen": -0.627678394317627,
+ "logps/rejected": -3.9945285320281982,
+ "loss": 31.2248,
+ "nll_loss": 0.9056390523910522,
+ "rewards/accuracies": 0.981249988079071,
+ "rewards/chosen": -0.3138391971588135,
+ "rewards/margins": 1.6834253072738647,
+ "rewards/rejected": -1.9972642660140991,
+ "step": 275
+ },
+ {
+ "epoch": 2.3754666666666666,
+ "grad_norm": 64.57848358154297,
+ "learning_rate": 6.010106710768052e-06,
+ "log_odds_chosen": 4.320036888122559,
+ "log_odds_ratio": -0.16020546853542328,
+ "logps/chosen": -0.6633978486061096,
+ "logps/rejected": -4.205855846405029,
+ "loss": 33.4415,
+ "nll_loss": 0.9649432897567749,
+ "rewards/accuracies": 0.9750000238418579,
+ "rewards/chosen": -0.3316989243030548,
+ "rewards/margins": 1.7712290287017822,
+ "rewards/rejected": -2.1029279232025146,
+ "step": 280
+ },
+ {
+ "epoch": 2.4181333333333335,
+ "grad_norm": 55.8111457824707,
+ "learning_rate": 5.223224133591476e-06,
+ "log_odds_chosen": 4.208083152770996,
+ "log_odds_ratio": -0.1420014351606369,
+ "logps/chosen": -0.6389120221138,
+ "logps/rejected": -4.042450904846191,
+ "loss": 31.6659,
+ "nll_loss": 0.9185575246810913,
+ "rewards/accuracies": 0.9750000238418579,
+ "rewards/chosen": -0.3194560110569,
+ "rewards/margins": 1.701769232749939,
+ "rewards/rejected": -2.0212254524230957,
+ "step": 285
+ },
+ {
+ "epoch": 2.4608,
+ "grad_norm": 65.74859619140625,
+ "learning_rate": 4.4855097372902135e-06,
+ "log_odds_chosen": 5.008718013763428,
+ "log_odds_ratio": -0.13227255642414093,
+ "logps/chosen": -0.6442698240280151,
+ "logps/rejected": -4.876347064971924,
+ "loss": 31.4841,
+ "nll_loss": 0.917742133140564,
+ "rewards/accuracies": 0.984375,
+ "rewards/chosen": -0.32213491201400757,
+ "rewards/margins": 2.1160385608673096,
+ "rewards/rejected": -2.438173532485962,
+ "step": 290
+ },
+ {
+ "epoch": 2.5034666666666667,
+ "grad_norm": 64.02488708496094,
+ "learning_rate": 3.798797596089351e-06,
+ "log_odds_chosen": 5.080009460449219,
+ "log_odds_ratio": -0.11570964008569717,
+ "logps/chosen": -0.6255289912223816,
+ "logps/rejected": -4.892707824707031,
+ "loss": 31.6474,
+ "nll_loss": 0.9311256408691406,
+ "rewards/accuracies": 0.996874988079071,
+ "rewards/chosen": -0.3127644956111908,
+ "rewards/margins": 2.133589267730713,
+ "rewards/rejected": -2.4463539123535156,
+ "step": 295
+ },
+ {
+ "epoch": 2.5461333333333336,
+ "grad_norm": 57.77256774902344,
+ "learning_rate": 3.164794984571759e-06,
+ "log_odds_chosen": 4.6938066482543945,
+ "log_odds_ratio": -0.13593730330467224,
+ "logps/chosen": -0.6526905298233032,
+ "logps/rejected": -4.539495468139648,
+ "loss": 32.3981,
+ "nll_loss": 0.9444714784622192,
+ "rewards/accuracies": 0.9781249761581421,
+ "rewards/chosen": -0.3263452649116516,
+ "rewards/margins": 1.9434025287628174,
+ "rewards/rejected": -2.269747734069824,
+ "step": 300
+ },
+ {
+ "epoch": 2.5888,
+ "grad_norm": 55.01142120361328,
+ "learning_rate": 2.58507813312448e-06,
+ "log_odds_chosen": 4.846372604370117,
+ "log_odds_ratio": -0.13726702332496643,
+ "logps/chosen": -0.6522940397262573,
+ "logps/rejected": -4.694449424743652,
+ "loss": 31.9743,
+ "nll_loss": 0.9305631518363953,
+ "rewards/accuracies": 0.984375,
+ "rewards/chosen": -0.32614701986312866,
+ "rewards/margins": 2.0210776329040527,
+ "rewards/rejected": -2.347224712371826,
+ "step": 305
+ },
+ {
+ "epoch": 2.6314666666666664,
+ "grad_norm": 48.492462158203125,
+ "learning_rate": 2.0610883091816525e-06,
+ "log_odds_chosen": 4.225762367248535,
+ "log_odds_ratio": -0.14057612419128418,
+ "logps/chosen": -0.6273883581161499,
+ "logps/rejected": -4.035782337188721,
+ "loss": 32.3499,
+ "nll_loss": 0.9406463503837585,
+ "rewards/accuracies": 0.984375,
+ "rewards/chosen": -0.31369417905807495,
+ "rewards/margins": 1.7041969299316406,
+ "rewards/rejected": -2.0178911685943604,
+ "step": 310
+ },
+ {
+ "epoch": 2.6741333333333333,
+ "grad_norm": 62.074928283691406,
+ "learning_rate": 1.59412823400657e-06,
+ "log_odds_chosen": 4.041567802429199,
+ "log_odds_ratio": -0.1489606648683548,
+ "logps/chosen": -0.650128960609436,
+ "logps/rejected": -3.9332008361816406,
+ "loss": 31.8364,
+ "nll_loss": 0.9204059839248657,
+ "rewards/accuracies": 0.9750000238418579,
+ "rewards/chosen": -0.325064480304718,
+ "rewards/margins": 1.641535997390747,
+ "rewards/rejected": -1.9666004180908203,
+ "step": 315
+ },
+ {
+ "epoch": 2.7168,
+ "grad_norm": 53.406063079833984,
+ "learning_rate": 1.1853588439213442e-06,
+ "log_odds_chosen": 4.582950115203857,
+ "log_odds_ratio": -0.14018304646015167,
+ "logps/chosen": -0.6541804075241089,
+ "logps/rejected": -4.454803466796875,
+ "loss": 32.4121,
+ "nll_loss": 0.9427865743637085,
+ "rewards/accuracies": 0.971875011920929,
+ "rewards/chosen": -0.32709020376205444,
+ "rewards/margins": 1.9003114700317383,
+ "rewards/rejected": -2.2274017333984375,
+ "step": 320
+ },
+ {
+ "epoch": 2.7594666666666665,
+ "grad_norm": 70.46380615234375,
+ "learning_rate": 8.357964040363209e-07,
+ "log_odds_chosen": 4.912075996398926,
+ "log_odds_ratio": -0.1292436271905899,
+ "logps/chosen": -0.6370927095413208,
+ "logps/rejected": -4.724200248718262,
+ "loss": 32.779,
+ "nll_loss": 0.9597213864326477,
+ "rewards/accuracies": 0.9781249761581421,
+ "rewards/chosen": -0.3185463547706604,
+ "rewards/margins": 2.0435540676116943,
+ "rewards/rejected": -2.362100124359131,
+ "step": 325
+ },
+ {
+ "epoch": 2.8021333333333334,
+ "grad_norm": 52.883697509765625,
+ "learning_rate": 5.463099816548579e-07,
+ "log_odds_chosen": 5.075955390930176,
+ "log_odds_ratio": -0.12925782799720764,
+ "logps/chosen": -0.6548422574996948,
+ "logps/rejected": -4.904310703277588,
+ "loss": 32.3394,
+ "nll_loss": 0.945977509021759,
+ "rewards/accuracies": 0.981249988079071,
+ "rewards/chosen": -0.3274211287498474,
+ "rewards/margins": 2.1247341632843018,
+ "rewards/rejected": -2.452155351638794,
+ "step": 330
+ },
+ {
+ "epoch": 2.8448,
+ "grad_norm": 53.469276428222656,
+ "learning_rate": 3.1761928563510955e-07,
+ "log_odds_chosen": 4.31265115737915,
+ "log_odds_ratio": -0.15028676390647888,
+ "logps/chosen": -0.6268482804298401,
+ "logps/rejected": -4.122142791748047,
+ "loss": 32.0851,
+ "nll_loss": 0.927514910697937,
+ "rewards/accuracies": 0.96875,
+ "rewards/chosen": -0.31342414021492004,
+ "rewards/margins": 1.7476472854614258,
+ "rewards/rejected": -2.0610713958740234,
+ "step": 335
+ },
+ {
+ "epoch": 2.8874666666666666,
+ "grad_norm": 50.17888259887695,
+ "learning_rate": 1.5029287708036854e-07,
+ "log_odds_chosen": 4.146527290344238,
+ "log_odds_ratio": -0.14773321151733398,
+ "logps/chosen": -0.6559882164001465,
+ "logps/rejected": -4.029378414154053,
+ "loss": 31.9076,
+ "nll_loss": 0.9232465028762817,
+ "rewards/accuracies": 0.981249988079071,
+ "rewards/chosen": -0.32799410820007324,
+ "rewards/margins": 1.6866950988769531,
+ "rewards/rejected": -2.0146892070770264,
+ "step": 340
+ },
+ {
+ "epoch": 2.9301333333333335,
+ "grad_norm": 59.06497573852539,
+ "learning_rate": 4.474675580662113e-08,
+ "log_odds_chosen": 4.623345375061035,
+ "log_odds_ratio": -0.1417851448059082,
+ "logps/chosen": -0.6364270448684692,
+ "logps/rejected": -4.481164932250977,
+ "loss": 31.2961,
+ "nll_loss": 0.9071089625358582,
+ "rewards/accuracies": 0.9781249761581421,
+ "rewards/chosen": -0.3182135224342346,
+ "rewards/margins": 1.9223687648773193,
+ "rewards/rejected": -2.2405824661254883,
+ "step": 345
+ },
+ {
+ "epoch": 2.9728,
+ "grad_norm": 47.288326263427734,
+ "learning_rate": 1.2433261014244136e-09,
+ "log_odds_chosen": 5.221846580505371,
+ "log_odds_ratio": -0.12480974197387695,
+ "logps/chosen": -0.6310793161392212,
+ "logps/rejected": -5.054338455200195,
+ "loss": 30.6507,
+ "nll_loss": 0.8954304456710815,
+ "rewards/accuracies": 0.9750000238418579,
+ "rewards/chosen": -0.3155396580696106,
+ "rewards/margins": 2.211629867553711,
+ "rewards/rejected": -2.5271692276000977,
+ "step": 350
+ },
+ {
+ "epoch": 2.981333333333333,
+ "step": 351,
+ "total_flos": 0.0,
+ "train_loss": 65.2437489175389,
+ "train_runtime": 10349.5719,
+ "train_samples_per_second": 2.174,
+ "train_steps_per_second": 0.034
+ }
+ ],
+ "logging_steps": 5,
+ "max_steps": 351,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 3,
+ "save_steps": 100000,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": true
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 0.0,
+ "train_batch_size": 1,
+ "trial_name": null,
+ "trial_params": null
+ }