YYT-t committed
Commit 0ad28ad · verified · 1 parent: 1cd5baf

Model save

Files changed (4)
  1. README.md +58 -0
  2. all_results.json +7 -0
  3. train_results.json +7 -0
  4. trainer_state.json +1282 -0
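The four files in this commit can be pulled down for local inspection. Below is a minimal sketch using the `huggingface_hub` client; the repo id comes from the model card below, and `0ad28ad` is the commit shown above (this is an illustration, not part of the commit itself).

```python
from huggingface_hub import hf_hub_download

# Sketch: fetch the files added in this commit for local inspection.
# Repo id is taken from the model card below; file names from the list above.
for filename in ["README.md", "all_results.json", "train_results.json", "trainer_state.json"]:
    path = hf_hub_download(
        repo_id="YYT-t/Qwen2.5-7B-SFT",
        filename=filename,
        revision="0ad28ad",  # commit shown on this page (use the full 40-character hash if the short form is not accepted)
    )
    print(path)
```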
README.md ADDED
@@ -0,0 +1,58 @@
+ ---
+ base_model: Qwen/Qwen2.5-7B
+ library_name: transformers
+ model_name: Qwen2.5-7B-SFT
+ tags:
+ - generated_from_trainer
+ - trl
+ - sft
+ licence: license
+ ---
+
+ # Model Card for Qwen2.5-7B-SFT
+
+ This model is a fine-tuned version of [Qwen/Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B).
+ It has been trained using [TRL](https://github.com/huggingface/trl).
+
+ ## Quick start
+
+ ```python
+ from transformers import pipeline
+
+ question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
+ generator = pipeline("text-generation", model="YYT-t/Qwen2.5-7B-SFT", device="cuda")
+ output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
+ print(output["generated_text"])
+ ```
+
+ ## Training procedure
+
+ [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/yifeizuo2029-northwestern-university/BRiTER/runs/ulquktyn)
+
+
+ This model was trained with SFT.
+
+ ### Framework versions
+
+ - TRL: 0.15.2
+ - Transformers: 4.49.0
+ - Pytorch: 2.5.1
+ - Datasets: 3.3.2
+ - Tokenizers: 0.21.0
+
+ ## Citations
+
+
+
+ Cite TRL as:
+
+ ```bibtex
+ @misc{vonwerra2022trl,
+ title = {{TRL: Transformer Reinforcement Learning}},
+ author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
+ year = 2020,
+ journal = {GitHub repository},
+ publisher = {GitHub},
+ howpublished = {\url{https://github.com/huggingface/trl}}
+ }
+ ```
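The card above only states that the model was trained with SFT via TRL. As a rough illustration, a minimal `SFTTrainer` sketch is shown below; the dataset id and most hyperparameters are placeholders, not the actual recipe, and only the values noted in comments are taken from the training logs in this commit.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Hypothetical dataset id -- the actual SFT data is not disclosed in this commit.
dataset = load_dataset("your-org/your-sft-dataset", split="train")

training_args = SFTConfig(
    output_dir="Qwen2.5-7B-SFT",
    num_train_epochs=5,              # trainer_state.json: num_train_epochs = 5
    per_device_train_batch_size=8,   # trainer_state.json: train_batch_size = 8
    learning_rate=2e-7,              # peak learning rate seen in the log history
    logging_steps=1,                 # trainer_state.json: logging_steps = 1
    report_to="wandb",               # the card links a Weights & Biases run
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-7B",         # base model named in the card
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```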
all_results.json ADDED
@@ -0,0 +1,7 @@
+ {
+ "total_flos": 1.846643377789993e+17,
+ "train_loss": 0.30200490220900506,
+ "train_runtime": 1464.9032,
+ "train_samples_per_second": 6.741,
+ "train_steps_per_second": 0.106
+ }
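As a quick consistency check on the aggregate numbers above (a sketch, assuming a local copy of all_results.json; see the download snippet near the top of this page):

```python
import json

# Sketch: cross-check the aggregate training metrics reported above.
with open("all_results.json") as f:
    results = json.load(f)

steps = results["train_runtime"] * results["train_steps_per_second"]
samples = results["train_runtime"] * results["train_samples_per_second"]
print(f"~{steps:.0f} optimizer steps")                    # ~155, matching global_step in trainer_state.json
print(f"~{samples:.0f} training samples seen")            # ~9875 over 5 epochs
print(f"~{samples / steps:.0f} samples per optimizer step (effective batch size)")  # ~64
```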
train_results.json ADDED
@@ -0,0 +1,7 @@
+ {
+ "total_flos": 1.846643377789993e+17,
+ "train_loss": 0.30200490220900506,
+ "train_runtime": 1464.9032,
+ "train_samples_per_second": 6.741,
+ "train_steps_per_second": 0.106
+ }
trainer_state.json ADDED
@@ -0,0 +1,1282 @@
1
+ {
2
+ "best_metric": null,
3
+ "best_model_checkpoint": null,
4
+ "epoch": 5.0,
5
+ "eval_steps": 100,
6
+ "global_step": 155,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.03225806451612903,
13
+ "grad_norm": 0.16533233225345612,
14
+ "learning_rate": 1.25e-08,
15
+ "loss": 0.3288,
16
+ "mean_token_accuracy": 0.9142395853996277,
17
+ "step": 1
18
+ },
19
+ {
20
+ "epoch": 0.06451612903225806,
21
+ "grad_norm": 0.16388170421123505,
22
+ "learning_rate": 2.5e-08,
23
+ "loss": 0.3143,
24
+ "mean_token_accuracy": 0.919180154800415,
25
+ "step": 2
26
+ },
27
+ {
28
+ "epoch": 0.0967741935483871,
29
+ "grad_norm": 0.16326533257961273,
30
+ "learning_rate": 3.75e-08,
31
+ "loss": 0.3239,
32
+ "mean_token_accuracy": 0.9184285998344421,
33
+ "step": 3
34
+ },
35
+ {
36
+ "epoch": 0.12903225806451613,
37
+ "grad_norm": 0.14039699733257294,
38
+ "learning_rate": 5e-08,
39
+ "loss": 0.2934,
40
+ "mean_token_accuracy": 0.9237579107284546,
41
+ "step": 4
42
+ },
43
+ {
44
+ "epoch": 0.16129032258064516,
45
+ "grad_norm": 0.17062482237815857,
46
+ "learning_rate": 6.25e-08,
47
+ "loss": 0.3162,
48
+ "mean_token_accuracy": 0.9196513891220093,
49
+ "step": 5
50
+ },
51
+ {
52
+ "epoch": 0.1935483870967742,
53
+ "grad_norm": 0.16192646324634552,
54
+ "learning_rate": 7.5e-08,
55
+ "loss": 0.3403,
56
+ "mean_token_accuracy": 0.9099262356758118,
57
+ "step": 6
58
+ },
59
+ {
60
+ "epoch": 0.22580645161290322,
61
+ "grad_norm": 0.14392420649528503,
62
+ "learning_rate": 8.75e-08,
63
+ "loss": 0.2809,
64
+ "mean_token_accuracy": 0.926060140132904,
65
+ "step": 7
66
+ },
67
+ {
68
+ "epoch": 0.25806451612903225,
69
+ "grad_norm": 0.16454868018627167,
70
+ "learning_rate": 1e-07,
71
+ "loss": 0.3227,
72
+ "mean_token_accuracy": 0.9182868003845215,
73
+ "step": 8
74
+ },
75
+ {
76
+ "epoch": 0.2903225806451613,
77
+ "grad_norm": 0.13043281435966492,
78
+ "learning_rate": 1.125e-07,
79
+ "loss": 0.2912,
80
+ "mean_token_accuracy": 0.9230725765228271,
81
+ "step": 9
82
+ },
83
+ {
84
+ "epoch": 0.3225806451612903,
85
+ "grad_norm": 0.13768798112869263,
86
+ "learning_rate": 1.25e-07,
87
+ "loss": 0.2853,
88
+ "mean_token_accuracy": 0.9282448291778564,
89
+ "step": 10
90
+ },
91
+ {
92
+ "epoch": 0.3548387096774194,
93
+ "grad_norm": 0.1535075604915619,
94
+ "learning_rate": 1.375e-07,
95
+ "loss": 0.289,
96
+ "mean_token_accuracy": 0.9233593940734863,
97
+ "step": 11
98
+ },
99
+ {
100
+ "epoch": 0.3870967741935484,
101
+ "grad_norm": 0.174004465341568,
102
+ "learning_rate": 1.5e-07,
103
+ "loss": 0.3091,
104
+ "mean_token_accuracy": 0.9212210774421692,
105
+ "step": 12
106
+ },
107
+ {
108
+ "epoch": 0.41935483870967744,
109
+ "grad_norm": 1713721835520.0,
110
+ "learning_rate": 1.6249999999999998e-07,
111
+ "loss": 0.2708,
112
+ "mean_token_accuracy": 0.9274526238441467,
113
+ "step": 13
114
+ },
115
+ {
116
+ "epoch": 0.45161290322580644,
117
+ "grad_norm": 0.13813622295856476,
118
+ "learning_rate": 1.75e-07,
119
+ "loss": 0.2719,
120
+ "mean_token_accuracy": 0.9287762641906738,
121
+ "step": 14
122
+ },
123
+ {
124
+ "epoch": 0.4838709677419355,
125
+ "grad_norm": 0.15647657215595245,
126
+ "learning_rate": 1.875e-07,
127
+ "loss": 0.3118,
128
+ "mean_token_accuracy": 0.9167627692222595,
129
+ "step": 15
130
+ },
131
+ {
132
+ "epoch": 0.5161290322580645,
133
+ "grad_norm": 0.1404537558555603,
134
+ "learning_rate": 2e-07,
135
+ "loss": 0.2887,
136
+ "mean_token_accuracy": 0.9297909140586853,
137
+ "step": 16
138
+ },
139
+ {
140
+ "epoch": 0.5483870967741935,
141
+ "grad_norm": 0.16780979931354523,
142
+ "learning_rate": 1.9997445995478116e-07,
143
+ "loss": 0.3192,
144
+ "mean_token_accuracy": 0.9215397834777832,
145
+ "step": 17
146
+ },
147
+ {
148
+ "epoch": 0.5806451612903226,
149
+ "grad_norm": 0.16038931906223297,
150
+ "learning_rate": 1.998978528650029e-07,
151
+ "loss": 0.3057,
152
+ "mean_token_accuracy": 0.9223698377609253,
153
+ "step": 18
154
+ },
155
+ {
156
+ "epoch": 0.6129032258064516,
157
+ "grad_norm": 0.15409325063228607,
158
+ "learning_rate": 1.9977021786163597e-07,
159
+ "loss": 0.2906,
160
+ "mean_token_accuracy": 0.9280111193656921,
161
+ "step": 19
162
+ },
163
+ {
164
+ "epoch": 0.6451612903225806,
165
+ "grad_norm": 0.13709405064582825,
166
+ "learning_rate": 1.995916201407555e-07,
167
+ "loss": 0.2963,
168
+ "mean_token_accuracy": 0.9227046966552734,
169
+ "step": 20
170
+ },
171
+ {
172
+ "epoch": 0.6774193548387096,
173
+ "grad_norm": 0.17100510001182556,
174
+ "learning_rate": 1.9936215093023882e-07,
175
+ "loss": 0.3308,
176
+ "mean_token_accuracy": 0.9156600832939148,
177
+ "step": 21
178
+ },
179
+ {
180
+ "epoch": 0.7096774193548387,
181
+ "grad_norm": 0.15652534365653992,
182
+ "learning_rate": 1.990819274431662e-07,
183
+ "loss": 0.2952,
184
+ "mean_token_accuracy": 0.9242802858352661,
185
+ "step": 22
186
+ },
187
+ {
188
+ "epoch": 0.7419354838709677,
189
+ "grad_norm": 0.14285646378993988,
190
+ "learning_rate": 1.9875109281794824e-07,
191
+ "loss": 0.2787,
192
+ "mean_token_accuracy": 0.9304314851760864,
193
+ "step": 23
194
+ },
195
+ {
196
+ "epoch": 0.7741935483870968,
197
+ "grad_norm": 0.13358458876609802,
198
+ "learning_rate": 1.9836981604521074e-07,
199
+ "loss": 0.2705,
200
+ "mean_token_accuracy": 0.9316097497940063,
201
+ "step": 24
202
+ },
203
+ {
204
+ "epoch": 0.8064516129032258,
205
+ "grad_norm": 0.1293199211359024,
206
+ "learning_rate": 1.9793829188147403e-07,
207
+ "loss": 0.2975,
208
+ "mean_token_accuracy": 0.92156982421875,
209
+ "step": 25
210
+ },
211
+ {
212
+ "epoch": 0.8387096774193549,
213
+ "grad_norm": 0.13538742065429688,
214
+ "learning_rate": 1.9745674074967117e-07,
215
+ "loss": 0.2847,
216
+ "mean_token_accuracy": 0.9282350540161133,
217
+ "step": 26
218
+ },
219
+ {
220
+ "epoch": 0.8709677419354839,
221
+ "grad_norm": 0.146584153175354,
222
+ "learning_rate": 1.9692540862655585e-07,
223
+ "loss": 0.3159,
224
+ "mean_token_accuracy": 0.9188564419746399,
225
+ "step": 27
226
+ },
227
+ {
228
+ "epoch": 0.9032258064516129,
229
+ "grad_norm": 0.15393050014972687,
230
+ "learning_rate": 1.9634456691705702e-07,
231
+ "loss": 0.3065,
232
+ "mean_token_accuracy": 0.9200947284698486,
233
+ "step": 28
234
+ },
235
+ {
236
+ "epoch": 0.9354838709677419,
237
+ "grad_norm": 0.14639101922512054,
238
+ "learning_rate": 1.9571451231564522e-07,
239
+ "loss": 0.2877,
240
+ "mean_token_accuracy": 0.9255397319793701,
241
+ "step": 29
242
+ },
243
+ {
244
+ "epoch": 0.967741935483871,
245
+ "grad_norm": 0.14252065122127533,
246
+ "learning_rate": 1.9503556665478065e-07,
247
+ "loss": 0.3029,
248
+ "mean_token_accuracy": 0.9222911596298218,
249
+ "step": 30
250
+ },
251
+ {
252
+ "epoch": 1.0,
253
+ "grad_norm": 0.1796598732471466,
254
+ "learning_rate": 1.943080767405209e-07,
255
+ "loss": 0.3359,
256
+ "mean_token_accuracy": 0.9132773876190186,
257
+ "step": 31
258
+ },
259
+ {
260
+ "epoch": 1.032258064516129,
261
+ "grad_norm": 0.16374877095222473,
262
+ "learning_rate": 1.9353241417537212e-07,
263
+ "loss": 0.2893,
264
+ "mean_token_accuracy": 0.9236128926277161,
265
+ "step": 32
266
+ },
267
+ {
268
+ "epoch": 1.064516129032258,
269
+ "grad_norm": 0.1340431421995163,
270
+ "learning_rate": 1.9270897516847403e-07,
271
+ "loss": 0.2789,
272
+ "mean_token_accuracy": 0.9263336658477783,
273
+ "step": 33
274
+ },
275
+ {
276
+ "epoch": 1.096774193548387,
277
+ "grad_norm": 0.1781994253396988,
278
+ "learning_rate": 1.918381803332161e-07,
279
+ "loss": 0.3213,
280
+ "mean_token_accuracy": 0.9135346412658691,
281
+ "step": 34
282
+ },
283
+ {
284
+ "epoch": 1.129032258064516,
285
+ "grad_norm": 1900.6429443359375,
286
+ "learning_rate": 1.909204744723877e-07,
287
+ "loss": 0.2741,
288
+ "mean_token_accuracy": 0.9303540587425232,
289
+ "step": 35
290
+ },
291
+ {
292
+ "epoch": 1.1612903225806452,
293
+ "grad_norm": 0.16974130272865295,
294
+ "learning_rate": 1.8995632635097247e-07,
295
+ "loss": 0.3161,
296
+ "mean_token_accuracy": 0.9215410351753235,
297
+ "step": 36
298
+ },
299
+ {
300
+ "epoch": 1.1935483870967742,
301
+ "grad_norm": 0.14828237891197205,
302
+ "learning_rate": 1.889462284567028e-07,
303
+ "loss": 0.3141,
304
+ "mean_token_accuracy": 0.921214759349823,
305
+ "step": 37
306
+ },
307
+ {
308
+ "epoch": 1.2258064516129032,
309
+ "grad_norm": 992012608.0,
310
+ "learning_rate": 1.8789069674849658e-07,
311
+ "loss": 0.2945,
312
+ "mean_token_accuracy": 0.9253477454185486,
313
+ "step": 38
314
+ },
315
+ {
316
+ "epoch": 1.2580645161290323,
317
+ "grad_norm": 0.15507718920707703,
318
+ "learning_rate": 1.8679027039290496e-07,
319
+ "loss": 0.3063,
320
+ "mean_token_accuracy": 0.9237247705459595,
321
+ "step": 39
322
+ },
323
+ {
324
+ "epoch": 1.2903225806451613,
325
+ "grad_norm": 0.118597112596035,
326
+ "learning_rate": 1.856455114887056e-07,
327
+ "loss": 0.2595,
328
+ "mean_token_accuracy": 0.9301245808601379,
329
+ "step": 40
330
+ },
331
+ {
332
+ "epoch": 1.3225806451612903,
333
+ "grad_norm": 0.14978240430355072,
334
+ "learning_rate": 1.8445700477978204e-07,
335
+ "loss": 0.3051,
336
+ "mean_token_accuracy": 0.918502926826477,
337
+ "step": 41
338
+ },
339
+ {
340
+ "epoch": 1.3548387096774195,
341
+ "grad_norm": 0.14542274177074432,
342
+ "learning_rate": 1.8322535735643602e-07,
343
+ "loss": 0.2969,
344
+ "mean_token_accuracy": 0.91986483335495,
345
+ "step": 42
346
+ },
347
+ {
348
+ "epoch": 1.3870967741935485,
349
+ "grad_norm": 0.15233197808265686,
350
+ "learning_rate": 1.8195119834528532e-07,
351
+ "loss": 0.3005,
352
+ "mean_token_accuracy": 0.9208536148071289,
353
+ "step": 43
354
+ },
355
+ {
356
+ "epoch": 1.4193548387096775,
357
+ "grad_norm": 0.1700386255979538,
358
+ "learning_rate": 1.8063517858790515e-07,
359
+ "loss": 0.2974,
360
+ "mean_token_accuracy": 0.9260939359664917,
361
+ "step": 44
362
+ },
363
+ {
364
+ "epoch": 1.4516129032258065,
365
+ "grad_norm": 0.1596897840499878,
366
+ "learning_rate": 1.7927797030837767e-07,
367
+ "loss": 0.3092,
368
+ "mean_token_accuracy": 0.9239941239356995,
369
+ "step": 45
370
+ },
371
+ {
372
+ "epoch": 1.4838709677419355,
373
+ "grad_norm": 0.1388406604528427,
374
+ "learning_rate": 1.778802667699196e-07,
375
+ "loss": 0.2999,
376
+ "mean_token_accuracy": 0.9225243926048279,
377
+ "step": 46
378
+ },
379
+ {
380
+ "epoch": 1.5161290322580645,
381
+ "grad_norm": 591844999168.0,
382
+ "learning_rate": 1.764427819207624e-07,
383
+ "loss": 0.3303,
384
+ "mean_token_accuracy": 0.9121381044387817,
385
+ "step": 47
386
+ },
387
+ {
388
+ "epoch": 1.5483870967741935,
389
+ "grad_norm": 0.14387820661067963,
390
+ "learning_rate": 1.74966250029467e-07,
391
+ "loss": 0.3096,
392
+ "mean_token_accuracy": 0.9206516146659851,
393
+ "step": 48
394
+ },
395
+ {
396
+ "epoch": 1.5806451612903225,
397
+ "grad_norm": 0.1407996267080307,
398
+ "learning_rate": 1.7345142530985886e-07,
399
+ "loss": 0.2979,
400
+ "mean_token_accuracy": 0.9202678203582764,
401
+ "step": 49
402
+ },
403
+ {
404
+ "epoch": 1.6129032258064515,
405
+ "grad_norm": 0.13789159059524536,
406
+ "learning_rate": 1.718990815357747e-07,
407
+ "loss": 0.2723,
408
+ "mean_token_accuracy": 0.9288155436515808,
409
+ "step": 50
410
+ },
411
+ {
412
+ "epoch": 1.6451612903225805,
413
+ "grad_norm": 0.17413267493247986,
414
+ "learning_rate": 1.7031001164581827e-07,
415
+ "loss": 0.3462,
416
+ "mean_token_accuracy": 0.9114100933074951,
417
+ "step": 51
418
+ },
419
+ {
420
+ "epoch": 1.6774193548387095,
421
+ "grad_norm": 0.14843137562274933,
422
+ "learning_rate": 1.6868502733832642e-07,
423
+ "loss": 0.2886,
424
+ "mean_token_accuracy": 0.9235759973526001,
425
+ "step": 52
426
+ },
427
+ {
428
+ "epoch": 1.7096774193548387,
429
+ "grad_norm": 0.16399109363555908,
430
+ "learning_rate": 1.670249586567531e-07,
431
+ "loss": 0.3172,
432
+ "mean_token_accuracy": 0.9193130731582642,
433
+ "step": 53
434
+ },
435
+ {
436
+ "epoch": 1.7419354838709677,
437
+ "grad_norm": 0.14304129779338837,
438
+ "learning_rate": 1.6533065356568206e-07,
439
+ "loss": 0.2986,
440
+ "mean_token_accuracy": 0.9260614514350891,
441
+ "step": 54
442
+ },
443
+ {
444
+ "epoch": 1.7741935483870968,
445
+ "grad_norm": 0.15034537017345428,
446
+ "learning_rate": 1.636029775176862e-07,
447
+ "loss": 0.3142,
448
+ "mean_token_accuracy": 0.920920193195343,
449
+ "step": 55
450
+ },
451
+ {
452
+ "epoch": 1.8064516129032258,
453
+ "grad_norm": 0.13884751498699188,
454
+ "learning_rate": 1.618428130112533e-07,
455
+ "loss": 0.2859,
456
+ "mean_token_accuracy": 0.9274208545684814,
457
+ "step": 56
458
+ },
459
+ {
460
+ "epoch": 1.838709677419355,
461
+ "grad_norm": 0.14844031631946564,
462
+ "learning_rate": 1.6005105914000505e-07,
463
+ "loss": 0.2981,
464
+ "mean_token_accuracy": 0.9255636930465698,
465
+ "step": 57
466
+ },
467
+ {
468
+ "epoch": 1.870967741935484,
469
+ "grad_norm": 0.16273614764213562,
470
+ "learning_rate": 1.5822863113343934e-07,
471
+ "loss": 0.3317,
472
+ "mean_token_accuracy": 0.9183942079544067,
473
+ "step": 58
474
+ },
475
+ {
476
+ "epoch": 1.903225806451613,
477
+ "grad_norm": 2596124672.0,
478
+ "learning_rate": 1.5637645988943006e-07,
479
+ "loss": 0.3263,
480
+ "mean_token_accuracy": 0.9169372320175171,
481
+ "step": 59
482
+ },
483
+ {
484
+ "epoch": 1.935483870967742,
485
+ "grad_norm": 0.15882141888141632,
486
+ "learning_rate": 1.5449549149872375e-07,
487
+ "loss": 0.2856,
488
+ "mean_token_accuracy": 0.9287616610527039,
489
+ "step": 60
490
+ },
491
+ {
492
+ "epoch": 1.967741935483871,
493
+ "grad_norm": 0.13461528718471527,
494
+ "learning_rate": 1.5258668676167547e-07,
495
+ "loss": 0.2996,
496
+ "mean_token_accuracy": 0.9221404790878296,
497
+ "step": 61
498
+ },
499
+ {
500
+ "epoch": 2.0,
501
+ "grad_norm": 0.17273001372814178,
502
+ "learning_rate": 1.5065102069747116e-07,
503
+ "loss": 0.3115,
504
+ "mean_token_accuracy": 0.9187781810760498,
505
+ "step": 62
506
+ },
507
+ {
508
+ "epoch": 2.032258064516129,
509
+ "grad_norm": 0.13375335931777954,
510
+ "learning_rate": 1.4868948204608697e-07,
511
+ "loss": 0.2721,
512
+ "mean_token_accuracy": 0.9304314851760864,
513
+ "step": 63
514
+ },
515
+ {
516
+ "epoch": 2.064516129032258,
517
+ "grad_norm": 0.13471315801143646,
518
+ "learning_rate": 1.4670307276324006e-07,
519
+ "loss": 0.299,
520
+ "mean_token_accuracy": 0.9216856360435486,
521
+ "step": 64
522
+ },
523
+ {
524
+ "epoch": 2.096774193548387,
525
+ "grad_norm": 0.12814444303512573,
526
+ "learning_rate": 1.4469280750858852e-07,
527
+ "loss": 0.296,
528
+ "mean_token_accuracy": 0.9213560819625854,
529
+ "step": 65
530
+ },
531
+ {
532
+ "epoch": 2.129032258064516,
533
+ "grad_norm": 0.14648547768592834,
534
+ "learning_rate": 1.4265971312744249e-07,
535
+ "loss": 0.2833,
536
+ "mean_token_accuracy": 0.9275593757629395,
537
+ "step": 66
538
+ },
539
+ {
540
+ "epoch": 2.161290322580645,
541
+ "grad_norm": 0.1619262844324112,
542
+ "learning_rate": 1.4060482812625054e-07,
543
+ "loss": 0.3123,
544
+ "mean_token_accuracy": 0.9240583181381226,
545
+ "step": 67
546
+ },
547
+ {
548
+ "epoch": 2.193548387096774,
549
+ "grad_norm": 0.15138781070709229,
550
+ "learning_rate": 1.3852920214212964e-07,
551
+ "loss": 0.3084,
552
+ "mean_token_accuracy": 0.9199481010437012,
553
+ "step": 68
554
+ },
555
+ {
556
+ "epoch": 2.225806451612903,
557
+ "grad_norm": 0.18671105802059174,
558
+ "learning_rate": 1.3643389540670962e-07,
559
+ "loss": 0.3394,
560
+ "mean_token_accuracy": 0.9127894639968872,
561
+ "step": 69
562
+ },
563
+ {
564
+ "epoch": 2.258064516129032,
565
+ "grad_norm": 0.12655843794345856,
566
+ "learning_rate": 1.3431997820456592e-07,
567
+ "loss": 0.3004,
568
+ "mean_token_accuracy": 0.9205244779586792,
569
+ "step": 70
570
+ },
571
+ {
572
+ "epoch": 2.2903225806451615,
573
+ "grad_norm": 0.1643962413072586,
574
+ "learning_rate": 1.3218853032651718e-07,
575
+ "loss": 0.2982,
576
+ "mean_token_accuracy": 0.9248165488243103,
577
+ "step": 71
578
+ },
579
+ {
580
+ "epoch": 2.3225806451612905,
581
+ "grad_norm": 0.1438770592212677,
582
+ "learning_rate": 1.300406405180671e-07,
583
+ "loss": 0.2893,
584
+ "mean_token_accuracy": 0.9276708960533142,
585
+ "step": 72
586
+ },
587
+ {
588
+ "epoch": 2.3548387096774195,
589
+ "grad_norm": 13705685.0,
590
+ "learning_rate": 1.278774059232723e-07,
591
+ "loss": 0.2693,
592
+ "mean_token_accuracy": 0.9281597137451172,
593
+ "step": 73
594
+ },
595
+ {
596
+ "epoch": 2.3870967741935485,
597
+ "grad_norm": 0.1379764974117279,
598
+ "learning_rate": 1.2569993152432026e-07,
599
+ "loss": 0.3048,
600
+ "mean_token_accuracy": 0.9185689091682434,
601
+ "step": 74
602
+ },
603
+ {
604
+ "epoch": 2.4193548387096775,
605
+ "grad_norm": 0.1545405089855194,
606
+ "learning_rate": 1.2350932957710321e-07,
607
+ "loss": 0.2992,
608
+ "mean_token_accuracy": 0.921677827835083,
609
+ "step": 75
610
+ },
611
+ {
612
+ "epoch": 2.4516129032258065,
613
+ "grad_norm": 0.16681832075119019,
614
+ "learning_rate": 1.213067190430769e-07,
615
+ "loss": 0.3072,
616
+ "mean_token_accuracy": 0.9232601523399353,
617
+ "step": 76
618
+ },
619
+ {
620
+ "epoch": 2.4838709677419355,
621
+ "grad_norm": 0.16876104474067688,
622
+ "learning_rate": 1.1909322501769406e-07,
623
+ "loss": 0.3232,
624
+ "mean_token_accuracy": 0.9157925844192505,
625
+ "step": 77
626
+ },
627
+ {
628
+ "epoch": 2.5161290322580645,
629
+ "grad_norm": 0.14914129674434662,
630
+ "learning_rate": 1.1686997815570472e-07,
631
+ "loss": 0.3022,
632
+ "mean_token_accuracy": 0.9226891994476318,
633
+ "step": 78
634
+ },
635
+ {
636
+ "epoch": 2.5483870967741935,
637
+ "grad_norm": 0.12800626456737518,
638
+ "learning_rate": 1.1463811409361665e-07,
639
+ "loss": 0.3029,
640
+ "mean_token_accuracy": 0.9234021902084351,
641
+ "step": 79
642
+ },
643
+ {
644
+ "epoch": 2.5806451612903225,
645
+ "grad_norm": 0.16127674281597137,
646
+ "learning_rate": 1.1239877286961121e-07,
647
+ "loss": 0.313,
648
+ "mean_token_accuracy": 0.9184219837188721,
649
+ "step": 80
650
+ },
651
+ {
652
+ "epoch": 2.6129032258064515,
653
+ "grad_norm": 397858209792.0,
654
+ "learning_rate": 1.101530983412108e-07,
655
+ "loss": 0.3366,
656
+ "mean_token_accuracy": 0.9118554592132568,
657
+ "step": 81
658
+ },
659
+ {
660
+ "epoch": 2.6451612903225805,
661
+ "grad_norm": 107614896128.0,
662
+ "learning_rate": 1.0790223760099548e-07,
663
+ "loss": 0.2926,
664
+ "mean_token_accuracy": 0.92240309715271,
665
+ "step": 82
666
+ },
667
+ {
668
+ "epoch": 2.6774193548387095,
669
+ "grad_norm": 0.12381122261285782,
670
+ "learning_rate": 1.0564734039066698e-07,
671
+ "loss": 0.2736,
672
+ "mean_token_accuracy": 0.9273778796195984,
673
+ "step": 83
674
+ },
675
+ {
676
+ "epoch": 2.709677419354839,
677
+ "grad_norm": 0.17293043434619904,
678
+ "learning_rate": 1.0338955851375961e-07,
679
+ "loss": 0.3302,
680
+ "mean_token_accuracy": 0.9148573875427246,
681
+ "step": 84
682
+ },
683
+ {
684
+ "epoch": 2.741935483870968,
685
+ "grad_norm": 318356848640.0,
686
+ "learning_rate": 1.0113004524729798e-07,
687
+ "loss": 0.2887,
688
+ "mean_token_accuracy": 0.9243521094322205,
689
+ "step": 85
690
+ },
691
+ {
692
+ "epoch": 2.774193548387097,
693
+ "grad_norm": 0.13202118873596191,
694
+ "learning_rate": 9.886995475270203e-08,
695
+ "loss": 0.2765,
696
+ "mean_token_accuracy": 0.9281702637672424,
697
+ "step": 86
698
+ },
699
+ {
700
+ "epoch": 2.806451612903226,
701
+ "grad_norm": 0.13786709308624268,
702
+ "learning_rate": 9.661044148624036e-08,
703
+ "loss": 0.2712,
704
+ "mean_token_accuracy": 0.9306640625,
705
+ "step": 87
706
+ },
707
+ {
708
+ "epoch": 2.838709677419355,
709
+ "grad_norm": 0.1563229262828827,
710
+ "learning_rate": 9.435265960933302e-08,
711
+ "loss": 0.2965,
712
+ "mean_token_accuracy": 0.9226992130279541,
713
+ "step": 88
714
+ },
715
+ {
716
+ "epoch": 2.870967741935484,
717
+ "grad_norm": 0.16793645918369293,
718
+ "learning_rate": 9.209776239900452e-08,
719
+ "loss": 0.3205,
720
+ "mean_token_accuracy": 0.9166249632835388,
721
+ "step": 89
722
+ },
723
+ {
724
+ "epoch": 2.903225806451613,
725
+ "grad_norm": 0.19805242121219635,
726
+ "learning_rate": 8.98469016587892e-08,
727
+ "loss": 0.3144,
728
+ "mean_token_accuracy": 0.9230661392211914,
729
+ "step": 90
730
+ },
731
+ {
732
+ "epoch": 2.935483870967742,
733
+ "grad_norm": 0.17880718410015106,
734
+ "learning_rate": 8.76012271303888e-08,
735
+ "loss": 0.3426,
736
+ "mean_token_accuracy": 0.9152123332023621,
737
+ "step": 91
738
+ },
739
+ {
740
+ "epoch": 2.967741935483871,
741
+ "grad_norm": 0.13014861941337585,
742
+ "learning_rate": 8.536188590638333e-08,
743
+ "loss": 0.2849,
744
+ "mean_token_accuracy": 0.9278050065040588,
745
+ "step": 92
746
+ },
747
+ {
748
+ "epoch": 3.0,
749
+ "grad_norm": 134254736.0,
750
+ "learning_rate": 8.313002184429528e-08,
751
+ "loss": 0.3084,
752
+ "mean_token_accuracy": 0.922960638999939,
753
+ "step": 93
754
+ },
755
+ {
756
+ "epoch": 3.032258064516129,
757
+ "grad_norm": 2311489060864.0,
758
+ "learning_rate": 8.090677498230596e-08,
759
+ "loss": 0.3011,
760
+ "mean_token_accuracy": 0.919518232345581,
761
+ "step": 94
762
+ },
763
+ {
764
+ "epoch": 3.064516129032258,
765
+ "grad_norm": 0.1483820080757141,
766
+ "learning_rate": 7.869328095692311e-08,
767
+ "loss": 0.3036,
768
+ "mean_token_accuracy": 0.9224164485931396,
769
+ "step": 95
770
+ },
771
+ {
772
+ "epoch": 3.096774193548387,
773
+ "grad_norm": 0.1287539154291153,
774
+ "learning_rate": 7.64906704228968e-08,
775
+ "loss": 0.3109,
776
+ "mean_token_accuracy": 0.9177184104919434,
777
+ "step": 96
778
+ },
779
+ {
780
+ "epoch": 3.129032258064516,
781
+ "grad_norm": 0.14654560387134552,
782
+ "learning_rate": 7.43000684756797e-08,
783
+ "loss": 0.2991,
784
+ "mean_token_accuracy": 0.9215620756149292,
785
+ "step": 97
786
+ },
787
+ {
788
+ "epoch": 3.161290322580645,
789
+ "grad_norm": 0.15880532562732697,
790
+ "learning_rate": 7.21225940767277e-08,
791
+ "loss": 0.2717,
792
+ "mean_token_accuracy": 0.9311442971229553,
793
+ "step": 98
794
+ },
795
+ {
796
+ "epoch": 3.193548387096774,
797
+ "grad_norm": 0.15332385897636414,
798
+ "learning_rate": 6.995935948193294e-08,
799
+ "loss": 0.2906,
800
+ "mean_token_accuracy": 0.9262657761573792,
801
+ "step": 99
802
+ },
803
+ {
804
+ "epoch": 3.225806451612903,
805
+ "grad_norm": 0.16386571526527405,
806
+ "learning_rate": 6.781146967348282e-08,
807
+ "loss": 0.3087,
808
+ "mean_token_accuracy": 0.9201537370681763,
809
+ "step": 100
810
+ },
811
+ {
812
+ "epoch": 3.258064516129032,
813
+ "grad_norm": 0.1456972062587738,
814
+ "learning_rate": 6.568002179543408e-08,
815
+ "loss": 0.2845,
816
+ "mean_token_accuracy": 0.9251313209533691,
817
+ "step": 101
818
+ },
819
+ {
820
+ "epoch": 3.2903225806451615,
821
+ "grad_norm": 0.1516929566860199,
822
+ "learning_rate": 6.356610459329037e-08,
823
+ "loss": 0.312,
824
+ "mean_token_accuracy": 0.9195959568023682,
825
+ "step": 102
826
+ },
827
+ {
828
+ "epoch": 3.3225806451612905,
829
+ "grad_norm": 5110276608.0,
830
+ "learning_rate": 6.147079785787038e-08,
831
+ "loss": 0.3149,
832
+ "mean_token_accuracy": 0.9239295125007629,
833
+ "step": 103
834
+ },
835
+ {
836
+ "epoch": 3.3548387096774195,
837
+ "grad_norm": 0.14113079011440277,
838
+ "learning_rate": 5.939517187374949e-08,
839
+ "loss": 0.31,
840
+ "mean_token_accuracy": 0.9190698862075806,
841
+ "step": 104
842
+ },
843
+ {
844
+ "epoch": 3.3870967741935485,
845
+ "grad_norm": 0.15635834634304047,
846
+ "learning_rate": 5.7340286872557505e-08,
847
+ "loss": 0.3075,
848
+ "mean_token_accuracy": 0.9230503439903259,
849
+ "step": 105
850
+ },
851
+ {
852
+ "epoch": 3.4193548387096775,
853
+ "grad_norm": 0.15426763892173767,
854
+ "learning_rate": 5.530719249141147e-08,
855
+ "loss": 0.3232,
856
+ "mean_token_accuracy": 0.919147253036499,
857
+ "step": 106
858
+ },
859
+ {
860
+ "epoch": 3.4516129032258065,
861
+ "grad_norm": 0.1907190978527069,
862
+ "learning_rate": 5.3296927236759934e-08,
863
+ "loss": 0.3244,
864
+ "mean_token_accuracy": 0.914537787437439,
865
+ "step": 107
866
+ },
867
+ {
868
+ "epoch": 3.4838709677419355,
869
+ "grad_norm": 0.14416781067848206,
870
+ "learning_rate": 5.131051795391301e-08,
871
+ "loss": 0.3002,
872
+ "mean_token_accuracy": 0.924168050289154,
873
+ "step": 108
874
+ },
875
+ {
876
+ "epoch": 3.5161290322580645,
877
+ "grad_norm": 0.16144634783267975,
878
+ "learning_rate": 4.934897930252886e-08,
879
+ "loss": 0.3025,
880
+ "mean_token_accuracy": 0.9218240976333618,
881
+ "step": 109
882
+ },
883
+ {
884
+ "epoch": 3.5483870967741935,
885
+ "grad_norm": 0.15450312197208405,
886
+ "learning_rate": 4.741331323832455e-08,
887
+ "loss": 0.2909,
888
+ "mean_token_accuracy": 0.924649178981781,
889
+ "step": 110
890
+ },
891
+ {
892
+ "epoch": 3.5806451612903225,
893
+ "grad_norm": 0.12900131940841675,
894
+ "learning_rate": 4.5504508501276254e-08,
895
+ "loss": 0.2798,
896
+ "mean_token_accuracy": 0.9257345795631409,
897
+ "step": 111
898
+ },
899
+ {
900
+ "epoch": 3.6129032258064515,
901
+ "grad_norm": 0.14584125578403473,
902
+ "learning_rate": 4.3623540110569934e-08,
903
+ "loss": 0.308,
904
+ "mean_token_accuracy": 0.9198503494262695,
905
+ "step": 112
906
+ },
907
+ {
908
+ "epoch": 3.6451612903225805,
909
+ "grad_norm": 0.13434411585330963,
910
+ "learning_rate": 4.1771368866560665e-08,
911
+ "loss": 0.2937,
912
+ "mean_token_accuracy": 0.9217662215232849,
913
+ "step": 113
914
+ },
915
+ {
916
+ "epoch": 3.6774193548387095,
917
+ "grad_norm": 0.15536189079284668,
918
+ "learning_rate": 3.9948940859994963e-08,
919
+ "loss": 0.3124,
920
+ "mean_token_accuracy": 0.9192003011703491,
921
+ "step": 114
922
+ },
923
+ {
924
+ "epoch": 3.709677419354839,
925
+ "grad_norm": 0.16448096930980682,
926
+ "learning_rate": 3.8157186988746716e-08,
927
+ "loss": 0.2935,
928
+ "mean_token_accuracy": 0.927962064743042,
929
+ "step": 115
930
+ },
931
+ {
932
+ "epoch": 3.741935483870968,
933
+ "grad_norm": 0.18926770985126495,
934
+ "learning_rate": 3.63970224823138e-08,
935
+ "loss": 0.3329,
936
+ "mean_token_accuracy": 0.9149773716926575,
937
+ "step": 116
938
+ },
939
+ {
940
+ "epoch": 3.774193548387097,
941
+ "grad_norm": 0.14664557576179504,
942
+ "learning_rate": 3.4669346434317946e-08,
943
+ "loss": 0.2902,
944
+ "mean_token_accuracy": 0.9280288219451904,
945
+ "step": 117
946
+ },
947
+ {
948
+ "epoch": 3.806451612903226,
949
+ "grad_norm": 0.17194537818431854,
950
+ "learning_rate": 3.297504134324693e-08,
951
+ "loss": 0.306,
952
+ "mean_token_accuracy": 0.922054648399353,
953
+ "step": 118
954
+ },
955
+ {
956
+ "epoch": 3.838709677419355,
957
+ "grad_norm": 0.13546685874462128,
958
+ "learning_rate": 3.131497266167357e-08,
959
+ "loss": 0.2662,
960
+ "mean_token_accuracy": 0.933337390422821,
961
+ "step": 119
962
+ },
963
+ {
964
+ "epoch": 3.870967741935484,
965
+ "grad_norm": 0.15774165093898773,
966
+ "learning_rate": 2.9689988354181737e-08,
967
+ "loss": 0.3042,
968
+ "mean_token_accuracy": 0.9215238094329834,
969
+ "step": 120
970
+ },
971
+ {
972
+ "epoch": 3.903225806451613,
973
+ "grad_norm": 11272906080256.0,
974
+ "learning_rate": 2.81009184642253e-08,
975
+ "loss": 0.2931,
976
+ "mean_token_accuracy": 0.9249425530433655,
977
+ "step": 121
978
+ },
979
+ {
980
+ "epoch": 3.935483870967742,
981
+ "grad_norm": 0.13712410628795624,
982
+ "learning_rate": 2.6548574690141122e-08,
983
+ "loss": 0.291,
984
+ "mean_token_accuracy": 0.9214818477630615,
985
+ "step": 122
986
+ },
987
+ {
988
+ "epoch": 3.967741935483871,
989
+ "grad_norm": 0.1492443084716797,
990
+ "learning_rate": 2.5033749970533015e-08,
991
+ "loss": 0.3333,
992
+ "mean_token_accuracy": 0.91297447681427,
993
+ "step": 123
994
+ },
995
+ {
996
+ "epoch": 4.0,
997
+ "grad_norm": 0.14738278090953827,
998
+ "learning_rate": 2.3557218079237607e-08,
999
+ "loss": 0.3008,
1000
+ "mean_token_accuracy": 0.9269067049026489,
1001
+ "step": 124
1002
+ },
1003
+ {
1004
+ "epoch": 4.032258064516129,
1005
+ "grad_norm": 0.14828868210315704,
1006
+ "learning_rate": 2.2119733230080406e-08,
1007
+ "loss": 0.3025,
1008
+ "mean_token_accuracy": 0.918917179107666,
1009
+ "step": 125
1010
+ },
1011
+ {
1012
+ "epoch": 4.064516129032258,
1013
+ "grad_norm": 0.129131019115448,
1014
+ "learning_rate": 2.0722029691622334e-08,
1015
+ "loss": 0.2981,
1016
+ "mean_token_accuracy": 0.9248138666152954,
1017
+ "step": 126
1018
+ },
1019
+ {
1020
+ "epoch": 4.096774193548387,
1021
+ "grad_norm": 0.16759882867336273,
1022
+ "learning_rate": 1.9364821412094857e-08,
1023
+ "loss": 0.3119,
1024
+ "mean_token_accuracy": 0.9213570952415466,
1025
+ "step": 127
1026
+ },
1027
+ {
1028
+ "epoch": 4.129032258064516,
1029
+ "grad_norm": 0.16001583635807037,
1030
+ "learning_rate": 1.8048801654714683e-08,
1031
+ "loss": 0.3155,
1032
+ "mean_token_accuracy": 0.920055627822876,
1033
+ "step": 128
1034
+ },
1035
+ {
1036
+ "epoch": 4.161290322580645,
1037
+ "grad_norm": 0.14847075939178467,
1038
+ "learning_rate": 1.677464264356395e-08,
1039
+ "loss": 0.3099,
1040
+ "mean_token_accuracy": 0.9192079305648804,
1041
+ "step": 129
1042
+ },
1043
+ {
1044
+ "epoch": 4.193548387096774,
1045
+ "grad_norm": 0.16150416433811188,
1046
+ "learning_rate": 1.554299522021796e-08,
1047
+ "loss": 0.3107,
1048
+ "mean_token_accuracy": 0.9221417903900146,
1049
+ "step": 130
1050
+ },
1051
+ {
1052
+ "epoch": 4.225806451612903,
1053
+ "grad_norm": 0.159551203250885,
1054
+ "learning_rate": 1.4354488511294416e-08,
1055
+ "loss": 0.3024,
1056
+ "mean_token_accuracy": 0.9230970144271851,
1057
+ "step": 131
1058
+ },
1059
+ {
1060
+ "epoch": 4.258064516129032,
1061
+ "grad_norm": 0.13861845433712006,
1062
+ "learning_rate": 1.3209729607095021e-08,
1063
+ "loss": 0.2794,
1064
+ "mean_token_accuracy": 0.9251081347465515,
1065
+ "step": 132
1066
+ },
1067
+ {
1068
+ "epoch": 4.290322580645161,
1069
+ "grad_norm": 0.1300177425146103,
1070
+ "learning_rate": 1.2109303251503433e-08,
1071
+ "loss": 0.2767,
1072
+ "mean_token_accuracy": 0.9269058108329773,
1073
+ "step": 133
1074
+ },
1075
+ {
1076
+ "epoch": 4.32258064516129,
1077
+ "grad_norm": 0.15271711349487305,
1078
+ "learning_rate": 1.1053771543297197e-08,
1079
+ "loss": 0.2833,
1080
+ "mean_token_accuracy": 0.9278181791305542,
1081
+ "step": 134
1082
+ },
1083
+ {
1084
+ "epoch": 4.354838709677419,
1085
+ "grad_norm": 0.15096405148506165,
1086
+ "learning_rate": 1.0043673649027518e-08,
1087
+ "loss": 0.3032,
1088
+ "mean_token_accuracy": 0.9209933876991272,
1089
+ "step": 135
1090
+ },
1091
+ {
1092
+ "epoch": 4.387096774193548,
1093
+ "grad_norm": 0.1546061486005783,
1094
+ "learning_rate": 9.07952552761232e-09,
1095
+ "loss": 0.2895,
1096
+ "mean_token_accuracy": 0.923534095287323,
1097
+ "step": 136
1098
+ },
1099
+ {
1100
+ "epoch": 4.419354838709677,
1101
+ "grad_norm": 0.15047194063663483,
1102
+ "learning_rate": 8.161819666783887e-09,
1103
+ "loss": 0.302,
1104
+ "mean_token_accuracy": 0.9207534790039062,
1105
+ "step": 137
1106
+ },
1107
+ {
1108
+ "epoch": 4.451612903225806,
1109
+ "grad_norm": 0.13835148513317108,
1110
+ "learning_rate": 7.29102483152596e-09,
1111
+ "loss": 0.2829,
1112
+ "mean_token_accuracy": 0.9299312829971313,
1113
+ "step": 138
1114
+ },
1115
+ {
1116
+ "epoch": 4.483870967741936,
1117
+ "grad_norm": 0.14567099511623383,
1118
+ "learning_rate": 6.467585824627886e-09,
1119
+ "loss": 0.3009,
1120
+ "mean_token_accuracy": 0.92234206199646,
1121
+ "step": 139
1122
+ },
1123
+ {
1124
+ "epoch": 4.516129032258064,
1125
+ "grad_norm": 0.14242888987064362,
1126
+ "learning_rate": 5.691923259479092e-09,
1127
+ "loss": 0.2946,
1128
+ "mean_token_accuracy": 0.9222550988197327,
1129
+ "step": 140
1130
+ },
1131
+ {
1132
+ "epoch": 4.548387096774194,
1133
+ "grad_norm": 0.14662319421768188,
1134
+ "learning_rate": 4.964433345219354e-09,
1135
+ "loss": 0.3089,
1136
+ "mean_token_accuracy": 0.9191495180130005,
1137
+ "step": 141
1138
+ },
1139
+ {
1140
+ "epoch": 4.580645161290323,
1141
+ "grad_norm": 0.16040538251399994,
1142
+ "learning_rate": 4.285487684354771e-09,
1143
+ "loss": 0.3086,
1144
+ "mean_token_accuracy": 0.9262273907661438,
1145
+ "step": 142
1146
+ },
1147
+ {
1148
+ "epoch": 4.612903225806452,
1149
+ "grad_norm": 0.17762604355812073,
1150
+ "learning_rate": 3.6554330829429714e-09,
1151
+ "loss": 0.3128,
1152
+ "mean_token_accuracy": 0.9169263243675232,
1153
+ "step": 143
1154
+ },
1155
+ {
1156
+ "epoch": 4.645161290322581,
1157
+ "grad_norm": 0.15534666180610657,
1158
+ "learning_rate": 3.074591373444135e-09,
1159
+ "loss": 0.3014,
1160
+ "mean_token_accuracy": 0.9234717488288879,
1161
+ "step": 144
1162
+ },
1163
+ {
1164
+ "epoch": 4.67741935483871,
1165
+ "grad_norm": 0.13566385209560394,
1166
+ "learning_rate": 2.5432592503287997e-09,
1167
+ "loss": 0.2719,
1168
+ "mean_token_accuracy": 0.9335603713989258,
1169
+ "step": 145
1170
+ },
1171
+ {
1172
+ "epoch": 4.709677419354839,
1173
+ "grad_norm": 0.1675826907157898,
1174
+ "learning_rate": 2.061708118525951e-09,
1175
+ "loss": 0.3175,
1176
+ "mean_token_accuracy": 0.925308108329773,
1177
+ "step": 146
1178
+ },
1179
+ {
1180
+ "epoch": 4.741935483870968,
1181
+ "grad_norm": 0.1770351529121399,
1182
+ "learning_rate": 1.6301839547892327e-09,
1183
+ "loss": 0.3083,
1184
+ "mean_token_accuracy": 0.9258779883384705,
1185
+ "step": 147
1186
+ },
1187
+ {
1188
+ "epoch": 4.774193548387097,
1189
+ "grad_norm": 0.1430787742137909,
1190
+ "learning_rate": 1.2489071820517394e-09,
1191
+ "loss": 0.2772,
1192
+ "mean_token_accuracy": 0.9270695447921753,
1193
+ "step": 148
1194
+ },
1195
+ {
1196
+ "epoch": 4.806451612903226,
1197
+ "grad_norm": 0.1444924771785736,
1198
+ "learning_rate": 9.180725568338043e-10,
1199
+ "loss": 0.3102,
1200
+ "mean_token_accuracy": 0.9180653691291809,
1201
+ "step": 149
1202
+ },
1203
+ {
1204
+ "epoch": 4.838709677419355,
1205
+ "grad_norm": 0.15837052464485168,
1206
+ "learning_rate": 6.37849069761176e-10,
1207
+ "loss": 0.328,
1208
+ "mean_token_accuracy": 0.9148820638656616,
1209
+ "step": 150
1210
+ },
1211
+ {
1212
+ "epoch": 4.870967741935484,
1213
+ "grad_norm": 0.14807510375976562,
1214
+ "learning_rate": 4.083798592444898e-10,
1215
+ "loss": 0.3087,
1216
+ "mean_token_accuracy": 0.9184255003929138,
1217
+ "step": 151
1218
+ },
1219
+ {
1220
+ "epoch": 4.903225806451613,
1221
+ "grad_norm": 0.15544399619102478,
1222
+ "learning_rate": 2.2978213836400973e-10,
1223
+ "loss": 0.2967,
1224
+ "mean_token_accuracy": 0.9193904995918274,
1225
+ "step": 152
1226
+ },
1227
+ {
1228
+ "epoch": 4.935483870967742,
1229
+ "grad_norm": 0.18428154289722443,
1230
+ "learning_rate": 1.0214713499706595e-10,
1231
+ "loss": 0.3394,
1232
+ "mean_token_accuracy": 0.9154376983642578,
1233
+ "step": 153
1234
+ },
1235
+ {
1236
+ "epoch": 4.967741935483871,
1237
+ "grad_norm": 0.13660480082035065,
1238
+ "learning_rate": 2.554004521881925e-11,
1239
+ "loss": 0.2831,
1240
+ "mean_token_accuracy": 0.9276332855224609,
1241
+ "step": 154
1242
+ },
1243
+ {
1244
+ "epoch": 5.0,
1245
+ "grad_norm": 420961255424.0,
1246
+ "learning_rate": 0.0,
1247
+ "loss": 0.3237,
1248
+ "mean_token_accuracy": 0.9185903668403625,
1249
+ "step": 155
1250
+ },
1251
+ {
1252
+ "epoch": 5.0,
1253
+ "step": 155,
1254
+ "total_flos": 1.846643377789993e+17,
1255
+ "train_loss": 0.30200490220900506,
1256
+ "train_runtime": 1464.9032,
1257
+ "train_samples_per_second": 6.741,
1258
+ "train_steps_per_second": 0.106
1259
+ }
1260
+ ],
1261
+ "logging_steps": 1,
1262
+ "max_steps": 155,
1263
+ "num_input_tokens_seen": 0,
1264
+ "num_train_epochs": 5,
1265
+ "save_steps": 1,
1266
+ "stateful_callbacks": {
1267
+ "TrainerControl": {
1268
+ "args": {
1269
+ "should_epoch_stop": false,
1270
+ "should_evaluate": false,
1271
+ "should_log": false,
1272
+ "should_save": true,
1273
+ "should_training_stop": true
1274
+ },
1275
+ "attributes": {}
1276
+ }
1277
+ },
1278
+ "total_flos": 1.846643377789993e+17,
1279
+ "train_batch_size": 8,
1280
+ "trial_name": null,
1281
+ "trial_params": null
1282
+ }
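Each entry in the log_history above shares the same per-step schema (epoch, grad_norm, learning_rate, loss, mean_token_accuracy, step), followed by a final summary entry. A small sketch for summarizing it, again assuming a local copy of trainer_state.json:

```python
import json

# Sketch: summarize the per-step log_history from trainer_state.json above.
with open("trainer_state.json") as f:
    state = json.load(f)

# Keep only per-step entries (the final summary entry has no "loss" key).
steps = [e for e in state["log_history"] if "loss" in e]
losses = [e["loss"] for e in steps]
accs = [e["mean_token_accuracy"] for e in steps]

print(f"{len(steps)} logged steps over {state['epoch']} epochs")
print(f"first/last logged loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
print(f"mean token accuracy range: {min(accs):.4f} - {max(accs):.4f}")
```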