utakumi committed on
Commit 62c9221 · verified · 1 Parent(s): 5acb45d

End of training

Files changed (5)
  1. README.md +6 -3
  2. all_results.json +16 -0
  3. eval_results.json +10 -0
  4. train_results.json +9 -0
  5. trainer_state.json +932 -0
README.md CHANGED
@@ -1,7 +1,10 @@
  ---
  library_name: transformers
+ license: apache-2.0
  base_model: rinna/japanese-hubert-base
  tags:
+ - automatic-speech-recognition
+ - original_kakeiken_W_closed_add_ver2
  - generated_from_trainer
  metrics:
  - wer
@@ -15,11 +18,11 @@ should probably proofread and complete it, then remove this comment. -->

  # Hubert-kakeiken-W-closed_add_ver2

- This model is a fine-tuned version of [rinna/japanese-hubert-base](https://huggingface.co/rinna/japanese-hubert-base) on the None dataset.
+ This model is a fine-tuned version of [rinna/japanese-hubert-base](https://huggingface.co/rinna/japanese-hubert-base) on the ORIGINAL_KAKEIKEN_W_CLOSED_ADD_VER2 - JA dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.0581
+ - Loss: 0.0617
  - Wer: 0.9988
- - Cer: 1.0133
+ - Cer: 1.0129

  ## Model description
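Note on the metrics above: a CER of 1.0129 can look like a reporting error, but a character error rate is the edit distance divided by the reference length, so it exceeds 1 whenever the hypothesis needs more edits than the reference has characters (for instance when the output is much longer than the reference). The sketch below illustrates this with a plain Levenshtein distance. The function names and toy strings are illustrative assumptions; the source does not show how the training script computed the metric.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between ref and hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edits needed divided by reference length."""
    return edit_distance(ref, hyp) / len(ref)


# Toy strings (hypothetical, not taken from the dataset).
print(cer("abc", "abc"))     # identical strings
print(cer("abc", "xyzwvu"))  # hypothesis twice as long, no characters in common
```

The second call returns 2.0: three substitutions plus three insertions against a three-character reference. A WER this close to 1 combined with a CER slightly above 1 therefore suggests the predictions share almost nothing with the references, although the diff alone does not say why.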
all_results.json ADDED
@@ -0,0 +1,16 @@
+ {
+   "epoch": 39.955088118249,
+   "eval_cer": 1.0129208155523945,
+   "eval_loss": 0.061716873198747635,
+   "eval_runtime": 59.3704,
+   "eval_samples": 6840,
+   "eval_samples_per_second": 115.209,
+   "eval_steps_per_second": 14.401,
+   "eval_wer": 0.9988304093567252,
+   "total_flos": 1.8269796433195942e+19,
+   "train_loss": 1.1798896302406563,
+   "train_runtime": 28544.574,
+   "train_samples": 56280,
+   "train_samples_per_second": 78.866,
+   "train_steps_per_second": 1.232
+ }
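The throughput figures in this file can be cross-checked against the sample counts and runtimes. The sketch below assumes the rates are samples (or steps) divided by wall-clock runtime, with training samples counted once per epoch. That formula is an assumption that happens to reproduce the reported numbers, not something the commit states. `NUM_TRAIN_EPOCHS` and `MAX_STEPS` are copied from `trainer_state.json` in the same commit.

```python
# Values copied from all_results.json.
results = {
    "eval_runtime": 59.3704,
    "eval_samples": 6840,
    "eval_samples_per_second": 115.209,
    "train_runtime": 28544.574,
    "train_samples": 56280,
    "train_samples_per_second": 78.866,
    "train_steps_per_second": 1.232,
}
NUM_TRAIN_EPOCHS = 40  # "num_train_epochs" in trainer_state.json
MAX_STEPS = 35160      # "max_steps" in trainer_state.json


def derived_rates(r, epochs, max_steps):
    """Recompute the throughput figures from counts and runtimes."""
    return {
        "eval_samples_per_second": r["eval_samples"] / r["eval_runtime"],
        "train_samples_per_second": r["train_samples"] * epochs / r["train_runtime"],
        "train_steps_per_second": max_steps / r["train_runtime"],
    }


rates = derived_rates(results, NUM_TRAIN_EPOCHS, MAX_STEPS)
for key, value in rates.items():
    print(f"{key}: derived {value:.3f}, reported {results[key]}")
```

All three derived values agree with the reported ones to three decimal places, so the file is internally consistent.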
eval_results.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "epoch": 39.955088118249,
+   "eval_cer": 1.0129208155523945,
+   "eval_loss": 0.061716873198747635,
+   "eval_runtime": 59.3704,
+   "eval_samples": 6840,
+   "eval_samples_per_second": 115.209,
+   "eval_steps_per_second": 14.401,
+   "eval_wer": 0.9988304093567252
+ }
train_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "epoch": 39.955088118249,
+   "total_flos": 1.8269796433195942e+19,
+   "train_loss": 1.1798896302406563,
+   "train_runtime": 28544.574,
+   "train_samples": 56280,
+   "train_samples_per_second": 78.866,
+   "train_steps_per_second": 1.232
+ }
trainer_state.json ADDED
@@ -0,0 +1,932 @@
1
+ {
2
+ "best_metric": null,
3
+ "best_model_checkpoint": null,
4
+ "epoch": 39.955088118249,
5
+ "eval_steps": 100.0,
6
+ "global_step": 35160,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.5685048322910745,
13
+ "grad_norm": 58.098880767822266,
14
+ "learning_rate": 1.1904e-06,
15
+ "loss": 28.4059,
16
+ "step": 500
17
+ },
18
+ {
19
+ "epoch": 1.0,
20
+ "eval_cer": 1.1284080132764343,
21
+ "eval_loss": 10.672073364257812,
22
+ "eval_runtime": 62.3734,
23
+ "eval_samples_per_second": 109.662,
24
+ "eval_steps_per_second": 13.708,
25
+ "eval_wer": 1.0,
26
+ "step": 880
27
+ },
28
+ {
29
+ "epoch": 1.1364411597498578,
30
+ "grad_norm": 39.93675231933594,
31
+ "learning_rate": 2.3880000000000003e-06,
32
+ "loss": 11.3029,
33
+ "step": 1000
34
+ },
35
+ {
36
+ "epoch": 1.7049459920409324,
37
+ "grad_norm": 36.53426742553711,
38
+ "learning_rate": 3.588e-06,
39
+ "loss": 9.1792,
40
+ "step": 1500
41
+ },
42
+ {
43
+ "epoch": 2.0,
44
+ "eval_cer": 1.1283783783783783,
45
+ "eval_loss": 6.992434978485107,
46
+ "eval_runtime": 59.9626,
47
+ "eval_samples_per_second": 114.071,
48
+ "eval_steps_per_second": 14.259,
49
+ "eval_wer": 1.0,
50
+ "step": 1760
51
+ },
52
+ {
53
+ "epoch": 2.2728823194997156,
54
+ "grad_norm": 24.559185028076172,
55
+ "learning_rate": 4.788e-06,
56
+ "loss": 7.001,
57
+ "step": 2000
58
+ },
59
+ {
60
+ "epoch": 2.8413871517907903,
61
+ "grad_norm": 10.105766296386719,
62
+ "learning_rate": 5.988e-06,
63
+ "loss": 4.9143,
64
+ "step": 2500
65
+ },
66
+ {
67
+ "epoch": 3.0,
68
+ "eval_cer": 1.1283783783783783,
69
+ "eval_loss": 3.8166255950927734,
70
+ "eval_runtime": 59.1681,
71
+ "eval_samples_per_second": 115.603,
72
+ "eval_steps_per_second": 14.45,
73
+ "eval_wer": 1.0,
74
+ "step": 2640
75
+ },
76
+ {
77
+ "epoch": 3.4093234792495735,
78
+ "grad_norm": 1.8796437978744507,
79
+ "learning_rate": 7.1880000000000005e-06,
80
+ "loss": 3.6813,
81
+ "step": 3000
82
+ },
83
+ {
84
+ "epoch": 3.977828311540648,
85
+ "grad_norm": 2.5441930294036865,
86
+ "learning_rate": 8.388e-06,
87
+ "loss": 3.1394,
88
+ "step": 3500
89
+ },
90
+ {
91
+ "epoch": 4.0,
92
+ "eval_cer": 1.1283191085822666,
93
+ "eval_loss": 2.882925033569336,
94
+ "eval_runtime": 60.4525,
95
+ "eval_samples_per_second": 113.147,
96
+ "eval_steps_per_second": 14.143,
97
+ "eval_wer": 1.0,
98
+ "step": 3520
99
+ },
100
+ {
101
+ "epoch": 4.545764638999431,
102
+ "grad_norm": 1.8130204677581787,
103
+ "learning_rate": 9.588e-06,
104
+ "loss": 2.7266,
105
+ "step": 4000
106
+ },
107
+ {
108
+ "epoch": 5.0,
109
+ "eval_cer": 1.1444108582266477,
110
+ "eval_loss": 1.9608492851257324,
111
+ "eval_runtime": 61.7902,
112
+ "eval_samples_per_second": 110.697,
113
+ "eval_steps_per_second": 13.837,
114
+ "eval_wer": 1.0,
115
+ "step": 4400
116
+ },
117
+ {
118
+ "epoch": 5.1137009664582145,
119
+ "grad_norm": 5.103806972503662,
120
+ "learning_rate": 1.0787999999999999e-05,
121
+ "loss": 2.1714,
122
+ "step": 4500
123
+ },
124
+ {
125
+ "epoch": 5.6822057987492895,
126
+ "grad_norm": 6.647071838378906,
127
+ "learning_rate": 1.1988000000000001e-05,
128
+ "loss": 1.4314,
129
+ "step": 5000
130
+ },
131
+ {
132
+ "epoch": 6.0,
133
+ "eval_cer": 1.0662043622569939,
134
+ "eval_loss": 0.8433689475059509,
135
+ "eval_runtime": 62.0284,
136
+ "eval_samples_per_second": 110.272,
137
+ "eval_steps_per_second": 13.784,
138
+ "eval_wer": 0.9998538011695907,
139
+ "step": 5280
140
+ },
141
+ {
142
+ "epoch": 6.250142126208073,
143
+ "grad_norm": 5.186183929443359,
144
+ "learning_rate": 1.3188e-05,
145
+ "loss": 0.9139,
146
+ "step": 5500
147
+ },
148
+ {
149
+ "epoch": 6.818646958499147,
150
+ "grad_norm": 5.24096155166626,
151
+ "learning_rate": 1.4388000000000002e-05,
152
+ "loss": 0.6837,
153
+ "step": 6000
154
+ },
155
+ {
156
+ "epoch": 7.0,
157
+ "eval_cer": 1.0329540066382172,
158
+ "eval_loss": 0.4582904875278473,
159
+ "eval_runtime": 60.9484,
160
+ "eval_samples_per_second": 112.226,
161
+ "eval_steps_per_second": 14.028,
162
+ "eval_wer": 0.9997076023391813,
163
+ "step": 6160
164
+ },
165
+ {
166
+ "epoch": 7.386583285957931,
167
+ "grad_norm": 5.6131510734558105,
168
+ "learning_rate": 1.5588e-05,
169
+ "loss": 0.5497,
170
+ "step": 6500
171
+ },
172
+ {
173
+ "epoch": 7.955088118249005,
174
+ "grad_norm": 7.207092761993408,
175
+ "learning_rate": 1.6788e-05,
176
+ "loss": 0.403,
177
+ "step": 7000
178
+ },
179
+ {
180
+ "epoch": 8.0,
181
+ "eval_cer": 1.0478603603603605,
182
+ "eval_loss": 0.25122731924057007,
183
+ "eval_runtime": 62.9796,
184
+ "eval_samples_per_second": 108.607,
185
+ "eval_steps_per_second": 13.576,
186
+ "eval_wer": 0.9991228070175439,
187
+ "step": 7040
188
+ },
189
+ {
190
+ "epoch": 8.523024445707788,
191
+ "grad_norm": 1.6712586879730225,
192
+ "learning_rate": 1.7988e-05,
193
+ "loss": 0.3035,
194
+ "step": 7500
195
+ },
196
+ {
197
+ "epoch": 9.0,
198
+ "eval_cer": 1.0364805595068753,
199
+ "eval_loss": 0.19720108807086945,
200
+ "eval_runtime": 61.791,
201
+ "eval_samples_per_second": 110.696,
202
+ "eval_steps_per_second": 13.837,
203
+ "eval_wer": 0.9992690058479532,
204
+ "step": 7920
205
+ },
206
+ {
207
+ "epoch": 9.090960773166572,
208
+ "grad_norm": 5.482681751251221,
209
+ "learning_rate": 1.9188e-05,
210
+ "loss": 0.2585,
211
+ "step": 8000
212
+ },
213
+ {
214
+ "epoch": 9.659465605457646,
215
+ "grad_norm": 2.9059221744537354,
216
+ "learning_rate": 2.0388e-05,
217
+ "loss": 0.229,
218
+ "step": 8500
219
+ },
220
+ {
221
+ "epoch": 10.0,
222
+ "eval_cer": 1.026434329065908,
223
+ "eval_loss": 0.08719063550233841,
224
+ "eval_runtime": 64.0548,
225
+ "eval_samples_per_second": 106.784,
226
+ "eval_steps_per_second": 13.348,
227
+ "eval_wer": 0.9991228070175439,
228
+ "step": 8800
229
+ },
230
+ {
231
+ "epoch": 10.227401932916429,
232
+ "grad_norm": 4.925757884979248,
233
+ "learning_rate": 2.1588e-05,
234
+ "loss": 0.2205,
235
+ "step": 9000
236
+ },
237
+ {
238
+ "epoch": 10.795906765207505,
239
+ "grad_norm": 4.320206642150879,
240
+ "learning_rate": 2.2788000000000003e-05,
241
+ "loss": 0.1995,
242
+ "step": 9500
243
+ },
244
+ {
245
+ "epoch": 11.0,
246
+ "eval_cer": 1.0261972498814604,
247
+ "eval_loss": 0.09591592103242874,
248
+ "eval_runtime": 63.0889,
249
+ "eval_samples_per_second": 108.418,
250
+ "eval_steps_per_second": 13.552,
251
+ "eval_wer": 0.9988304093567252,
252
+ "step": 9680
253
+ },
254
+ {
255
+ "epoch": 11.363843092666288,
256
+ "grad_norm": 4.490440368652344,
257
+ "learning_rate": 2.3988e-05,
258
+ "loss": 0.1909,
259
+ "step": 10000
260
+ },
261
+ {
262
+ "epoch": 11.932347924957362,
263
+ "grad_norm": 4.226001739501953,
264
+ "learning_rate": 2.5188e-05,
265
+ "loss": 0.1824,
266
+ "step": 10500
267
+ },
268
+ {
269
+ "epoch": 12.0,
270
+ "eval_cer": 1.0316500711237553,
271
+ "eval_loss": 0.1011653020977974,
272
+ "eval_runtime": 60.7879,
273
+ "eval_samples_per_second": 112.522,
274
+ "eval_steps_per_second": 14.065,
275
+ "eval_wer": 0.9988304093567252,
276
+ "step": 10560
277
+ },
278
+ {
279
+ "epoch": 12.500284252416145,
280
+ "grad_norm": 4.8370466232299805,
281
+ "learning_rate": 2.6388000000000002e-05,
282
+ "loss": 0.1774,
283
+ "step": 11000
284
+ },
285
+ {
286
+ "epoch": 13.0,
287
+ "eval_cer": 1.0220483641536273,
288
+ "eval_loss": 0.05406388267874718,
289
+ "eval_runtime": 60.5368,
290
+ "eval_samples_per_second": 112.989,
291
+ "eval_steps_per_second": 14.124,
292
+ "eval_wer": 0.9991228070175439,
293
+ "step": 11440
294
+ },
295
+ {
296
+ "epoch": 13.068220579874929,
297
+ "grad_norm": 4.82689094543457,
298
+ "learning_rate": 2.7585600000000002e-05,
299
+ "loss": 0.1761,
300
+ "step": 11500
301
+ },
302
+ {
303
+ "epoch": 13.636725412166003,
304
+ "grad_norm": 0.785372793674469,
305
+ "learning_rate": 2.87856e-05,
306
+ "loss": 0.1739,
307
+ "step": 12000
308
+ },
309
+ {
310
+ "epoch": 14.0,
311
+ "eval_cer": 1.026997392128971,
312
+ "eval_loss": 0.0703384280204773,
313
+ "eval_runtime": 60.3222,
314
+ "eval_samples_per_second": 113.391,
315
+ "eval_steps_per_second": 14.174,
316
+ "eval_wer": 0.9989766081871345,
317
+ "step": 12320
318
+ },
319
+ {
320
+ "epoch": 14.204661739624786,
321
+ "grad_norm": 2.4106199741363525,
322
+ "learning_rate": 2.99856e-05,
323
+ "loss": 0.1642,
324
+ "step": 12500
325
+ },
326
+ {
327
+ "epoch": 14.773166571915862,
328
+ "grad_norm": 2.8759591579437256,
329
+ "learning_rate": 2.996483380918142e-05,
330
+ "loss": 0.1609,
331
+ "step": 13000
332
+ },
333
+ {
334
+ "epoch": 15.0,
335
+ "eval_cer": 1.0202702702702702,
336
+ "eval_loss": 0.048034194856882095,
337
+ "eval_runtime": 61.6445,
338
+ "eval_samples_per_second": 110.959,
339
+ "eval_steps_per_second": 13.87,
340
+ "eval_wer": 0.9988304093567252,
341
+ "step": 13200
342
+ },
343
+ {
344
+ "epoch": 15.341102899374645,
345
+ "grad_norm": 5.875328540802002,
346
+ "learning_rate": 2.9857791176729968e-05,
347
+ "loss": 0.1583,
348
+ "step": 13500
349
+ },
350
+ {
351
+ "epoch": 15.90960773166572,
352
+ "grad_norm": 3.2317707538604736,
353
+ "learning_rate": 2.9679381078280773e-05,
354
+ "loss": 0.1512,
355
+ "step": 14000
356
+ },
357
+ {
358
+ "epoch": 16.0,
359
+ "eval_cer": 1.016239924134661,
360
+ "eval_loss": 0.053960736840963364,
361
+ "eval_runtime": 61.5915,
362
+ "eval_samples_per_second": 111.054,
363
+ "eval_steps_per_second": 13.882,
364
+ "eval_wer": 0.9988304093567252,
365
+ "step": 14080
366
+ },
367
+ {
368
+ "epoch": 16.4775440591245,
369
+ "grad_norm": 3.0755059719085693,
370
+ "learning_rate": 2.9430460483519525e-05,
371
+ "loss": 0.1412,
372
+ "step": 14500
373
+ },
374
+ {
375
+ "epoch": 17.0,
376
+ "eval_cer": 1.0187885253674727,
377
+ "eval_loss": 0.03960481286048889,
378
+ "eval_runtime": 61.6801,
379
+ "eval_samples_per_second": 110.895,
380
+ "eval_steps_per_second": 13.862,
381
+ "eval_wer": 0.9988304093567252,
382
+ "step": 14960
383
+ },
384
+ {
385
+ "epoch": 17.045480386583286,
386
+ "grad_norm": 1.7888600826263428,
387
+ "learning_rate": 2.911222505012316e-05,
388
+ "loss": 0.1411,
389
+ "step": 15000
390
+ },
391
+ {
392
+ "epoch": 17.61398521887436,
393
+ "grad_norm": 5.4633941650390625,
394
+ "learning_rate": 2.872704189552075e-05,
395
+ "loss": 0.1391,
396
+ "step": 15500
397
+ },
398
+ {
399
+ "epoch": 18.0,
400
+ "eval_cer": 1.0194997629208156,
401
+ "eval_loss": 0.04934508726000786,
402
+ "eval_runtime": 60.3205,
403
+ "eval_samples_per_second": 113.394,
404
+ "eval_steps_per_second": 14.174,
405
+ "eval_wer": 0.9988304093567252,
406
+ "step": 15840
407
+ },
408
+ {
409
+ "epoch": 18.181921546333143,
410
+ "grad_norm": 4.8292951583862305,
411
+ "learning_rate": 2.8275217996094984e-05,
412
+ "loss": 0.1363,
413
+ "step": 16000
414
+ },
415
+ {
416
+ "epoch": 18.75042637862422,
417
+ "grad_norm": 1.6074540615081787,
418
+ "learning_rate": 2.775962831495378e-05,
419
+ "loss": 0.1325,
420
+ "step": 16500
421
+ },
422
+ {
423
+ "epoch": 19.0,
424
+ "eval_cer": 1.0185810810810811,
425
+ "eval_loss": 0.03655907139182091,
426
+ "eval_runtime": 60.6228,
427
+ "eval_samples_per_second": 112.829,
428
+ "eval_steps_per_second": 14.104,
429
+ "eval_wer": 0.9988304093567252,
430
+ "step": 16720
431
+ },
432
+ {
433
+ "epoch": 19.318362706083,
434
+ "grad_norm": 2.9544260501861572,
435
+ "learning_rate": 2.7182749420020325e-05,
436
+ "loss": 0.1243,
437
+ "step": 17000
438
+ },
439
+ {
440
+ "epoch": 19.886867538374077,
441
+ "grad_norm": 1.7803765535354614,
442
+ "learning_rate": 2.6547352273978724e-05,
443
+ "loss": 0.1242,
444
+ "step": 17500
445
+ },
446
+ {
447
+ "epoch": 20.0,
448
+ "eval_cer": 1.0178105737316263,
449
+ "eval_loss": 0.03915562480688095,
450
+ "eval_runtime": 63.37,
451
+ "eval_samples_per_second": 107.937,
452
+ "eval_steps_per_second": 13.492,
453
+ "eval_wer": 0.9988304093567252,
454
+ "step": 17600
455
+ },
456
+ {
457
+ "epoch": 20.454803865832858,
458
+ "grad_norm": 7.015552520751953,
459
+ "learning_rate": 2.5857923843413123e-05,
460
+ "loss": 0.122,
461
+ "step": 18000
462
+ },
463
+ {
464
+ "epoch": 21.0,
465
+ "eval_cer": 1.0192923186344238,
466
+ "eval_loss": 0.05453035235404968,
467
+ "eval_runtime": 62.1534,
468
+ "eval_samples_per_second": 110.05,
469
+ "eval_steps_per_second": 13.756,
470
+ "eval_wer": 0.9988304093567252,
471
+ "step": 18480
472
+ },
473
+ {
474
+ "epoch": 21.022740193291643,
475
+ "grad_norm": 0.5277901887893677,
476
+ "learning_rate": 2.511654911570264e-05,
477
+ "loss": 0.1154,
478
+ "step": 18500
479
+ },
480
+ {
481
+ "epoch": 21.59124502558272,
482
+ "grad_norm": 2.2439563274383545,
483
+ "learning_rate": 2.432514615070941e-05,
484
+ "loss": 0.1143,
485
+ "step": 19000
486
+ },
487
+ {
488
+ "epoch": 22.0,
489
+ "eval_cer": 1.0184625414888573,
490
+ "eval_loss": 0.04077836126089096,
491
+ "eval_runtime": 62.3438,
492
+ "eval_samples_per_second": 109.714,
493
+ "eval_steps_per_second": 13.714,
494
+ "eval_wer": 0.9988304093567252,
495
+ "step": 19360
496
+ },
497
+ {
498
+ "epoch": 22.1591813530415,
499
+ "grad_norm": 3.2656147480010986,
500
+ "learning_rate": 2.3488951059960833e-05,
501
+ "loss": 0.108,
502
+ "step": 19500
503
+ },
504
+ {
505
+ "epoch": 22.727686185332576,
506
+ "grad_norm": 1.6540584564208984,
507
+ "learning_rate": 2.261198039773451e-05,
508
+ "loss": 0.1087,
509
+ "step": 20000
510
+ },
511
+ {
512
+ "epoch": 23.0,
513
+ "eval_cer": 1.0176031294452348,
514
+ "eval_loss": 0.03096814453601837,
515
+ "eval_runtime": 61.14,
516
+ "eval_samples_per_second": 111.874,
517
+ "eval_steps_per_second": 13.984,
518
+ "eval_wer": 0.9988304093567252,
519
+ "step": 20240
520
+ },
521
+ {
522
+ "epoch": 23.295622512791358,
523
+ "grad_norm": 0.029715025797486305,
524
+ "learning_rate": 2.1698446578458188e-05,
525
+ "loss": 0.1011,
526
+ "step": 20500
527
+ },
528
+ {
529
+ "epoch": 23.864127345082434,
530
+ "grad_norm": 1.3068634271621704,
531
+ "learning_rate": 2.0752737642925386e-05,
532
+ "loss": 0.1013,
533
+ "step": 21000
534
+ },
535
+ {
536
+ "epoch": 24.0,
537
+ "eval_cer": 1.0165955429113325,
538
+ "eval_loss": 0.02615249529480934,
539
+ "eval_runtime": 60.7558,
540
+ "eval_samples_per_second": 112.582,
541
+ "eval_steps_per_second": 14.073,
542
+ "eval_wer": 0.9988304093567252,
543
+ "step": 21120
544
+ },
545
+ {
546
+ "epoch": 24.432063672541215,
547
+ "grad_norm": 2.296964168548584,
548
+ "learning_rate": 1.9779396180912585e-05,
549
+ "loss": 0.0952,
550
+ "step": 21500
551
+ },
552
+ {
553
+ "epoch": 25.0,
554
+ "grad_norm": 0.6059958934783936,
555
+ "learning_rate": 1.8783097511440484e-05,
556
+ "loss": 0.0998,
557
+ "step": 22000
558
+ },
559
+ {
560
+ "epoch": 25.0,
561
+ "eval_cer": 1.0199442863916548,
562
+ "eval_loss": 0.0387667752802372,
563
+ "eval_runtime": 62.9295,
564
+ "eval_samples_per_second": 108.693,
565
+ "eval_steps_per_second": 13.587,
566
+ "eval_wer": 0.9988304093567252,
567
+ "step": 22000
568
+ },
569
+ {
570
+ "epoch": 25.568504832291076,
571
+ "grad_norm": 3.430413007736206,
572
+ "learning_rate": 1.777067107469613e-05,
573
+ "loss": 0.0903,
574
+ "step": 22500
575
+ },
576
+ {
577
+ "epoch": 26.0,
578
+ "eval_cer": 1.0166251778093884,
579
+ "eval_loss": 0.027963772416114807,
580
+ "eval_runtime": 62.2122,
581
+ "eval_samples_per_second": 109.946,
582
+ "eval_steps_per_second": 13.743,
583
+ "eval_wer": 0.9988304093567252,
584
+ "step": 22880
585
+ },
586
+ {
587
+ "epoch": 26.136441159749857,
588
+ "grad_norm": 3.037529706954956,
589
+ "learning_rate": 1.6742923736196817e-05,
590
+ "loss": 0.0867,
591
+ "step": 23000
592
+ },
593
+ {
594
+ "epoch": 26.704945992040933,
595
+ "grad_norm": 2.661895751953125,
596
+ "learning_rate": 1.5706804490393117e-05,
597
+ "loss": 0.088,
598
+ "step": 23500
599
+ },
600
+ {
601
+ "epoch": 27.0,
602
+ "eval_cer": 1.0197072072072073,
603
+ "eval_loss": 0.04922711104154587,
604
+ "eval_runtime": 68.8335,
605
+ "eval_samples_per_second": 99.37,
606
+ "eval_steps_per_second": 12.421,
607
+ "eval_wer": 0.9988304093567252,
608
+ "step": 23760
609
+ },
610
+ {
611
+ "epoch": 27.272882319499715,
612
+ "grad_norm": 2.580177068710327,
613
+ "learning_rate": 1.4667290201218887e-05,
614
+ "loss": 0.0874,
615
+ "step": 24000
616
+ },
617
+ {
618
+ "epoch": 27.84138715179079,
619
+ "grad_norm": 6.204455375671387,
620
+ "learning_rate": 1.3629374040256936e-05,
621
+ "loss": 0.0838,
622
+ "step": 24500
623
+ },
624
+ {
625
+ "epoch": 28.0,
626
+ "eval_cer": 1.0162991939307728,
627
+ "eval_loss": 0.02296304889023304,
628
+ "eval_runtime": 65.6917,
629
+ "eval_samples_per_second": 104.123,
630
+ "eval_steps_per_second": 13.015,
631
+ "eval_wer": 0.9988304093567252,
632
+ "step": 24640
633
+ },
634
+ {
635
+ "epoch": 28.409323479249572,
636
+ "grad_norm": 0.0685189813375473,
637
+ "learning_rate": 1.2600094296980161e-05,
638
+ "loss": 0.0826,
639
+ "step": 25000
640
+ },
641
+ {
642
+ "epoch": 28.977828311540648,
643
+ "grad_norm": 6.252466678619385,
644
+ "learning_rate": 1.1580271268352735e-05,
645
+ "loss": 0.079,
646
+ "step": 25500
647
+ },
648
+ {
649
+ "epoch": 29.0,
650
+ "eval_cer": 1.0169511616880038,
651
+ "eval_loss": 0.0281895250082016,
652
+ "eval_runtime": 61.0,
653
+ "eval_samples_per_second": 112.131,
654
+ "eval_steps_per_second": 14.016,
655
+ "eval_wer": 0.9988304093567252,
656
+ "step": 25520
657
+ },
658
+ {
659
+ "epoch": 29.545764638999433,
660
+ "grad_norm": 3.142706871032715,
661
+ "learning_rate": 1.0576874461569077e-05,
662
+ "loss": 0.0747,
663
+ "step": 26000
664
+ },
665
+ {
666
+ "epoch": 30.0,
667
+ "eval_cer": 1.016239924134661,
668
+ "eval_loss": 0.027105851098895073,
669
+ "eval_runtime": 59.5418,
670
+ "eval_samples_per_second": 114.877,
671
+ "eval_steps_per_second": 14.36,
672
+ "eval_wer": 0.9988304093567252,
673
+ "step": 26400
674
+ },
675
+ {
676
+ "epoch": 30.113700966458214,
677
+ "grad_norm": 3.5694313049316406,
678
+ "learning_rate": 9.594723562586447e-06,
679
+ "loss": 0.0774,
680
+ "step": 26500
681
+ },
682
+ {
683
+ "epoch": 30.68220579874929,
684
+ "grad_norm": 5.079456329345703,
685
+ "learning_rate": 8.640419592752059e-06,
686
+ "loss": 0.0692,
687
+ "step": 27000
688
+ },
689
+ {
690
+ "epoch": 31.0,
691
+ "eval_cer": 1.0166844476055001,
692
+ "eval_loss": 0.027171434834599495,
693
+ "eval_runtime": 59.3002,
694
+ "eval_samples_per_second": 115.345,
695
+ "eval_steps_per_second": 14.418,
696
+ "eval_wer": 0.9988304093567252,
697
+ "step": 27280
698
+ },
699
+ {
700
+ "epoch": 31.250142126208072,
701
+ "grad_norm": 2.114372730255127,
702
+ "learning_rate": 7.714723096178886e-06,
703
+ "loss": 0.0718,
704
+ "step": 27500
705
+ },
706
+ {
707
+ "epoch": 31.818646958499148,
708
+ "grad_norm": 3.835890531539917,
709
+ "learning_rate": 6.824020478947078e-06,
710
+ "loss": 0.0699,
711
+ "step": 28000
712
+ },
713
+ {
714
+ "epoch": 32.0,
715
+ "eval_cer": 1.0143432906590801,
716
+ "eval_loss": 0.0426531545817852,
717
+ "eval_runtime": 58.5893,
718
+ "eval_samples_per_second": 116.745,
719
+ "eval_steps_per_second": 14.593,
720
+ "eval_wer": 0.9988304093567252,
721
+ "step": 28160
722
+ },
723
+ {
724
+ "epoch": 32.38658328595793,
725
+ "grad_norm": 3.8743183612823486,
726
+ "learning_rate": 5.97259011514287e-06,
727
+ "loss": 0.0668,
728
+ "step": 28500
729
+ },
730
+ {
731
+ "epoch": 32.955088118249,
732
+ "grad_norm": 0.82483971118927,
733
+ "learning_rate": 5.164521739694928e-06,
734
+ "loss": 0.0652,
735
+ "step": 29000
736
+ },
737
+ {
738
+ "epoch": 33.0,
739
+ "eval_cer": 1.016151019440493,
740
+ "eval_loss": 0.032351747155189514,
741
+ "eval_runtime": 60.0164,
742
+ "eval_samples_per_second": 113.969,
743
+ "eval_steps_per_second": 14.246,
744
+ "eval_wer": 0.9988304093567252,
745
+ "step": 29040
746
+ },
747
+ {
748
+ "epoch": 33.52302444570779,
749
+ "grad_norm": 0.08226080983877182,
750
+ "learning_rate": 4.403696803864931e-06,
751
+ "loss": 0.0624,
752
+ "step": 29500
753
+ },
754
+ {
755
+ "epoch": 34.0,
756
+ "eval_cer": 1.0162991939307728,
757
+ "eval_loss": 0.03149048984050751,
758
+ "eval_runtime": 58.4368,
759
+ "eval_samples_per_second": 117.05,
760
+ "eval_steps_per_second": 14.631,
761
+ "eval_wer": 0.9988304093567252,
762
+ "step": 29920
763
+ },
764
+ {
765
+ "epoch": 34.09096077316657,
766
+ "grad_norm": 0.9844266176223755,
767
+ "learning_rate": 3.6951365800521325e-06,
768
+ "loss": 0.0618,
769
+ "step": 30000
770
+ },
771
+ {
772
+ "epoch": 34.65946560545765,
773
+ "grad_norm": 2.4211599826812744,
774
+ "learning_rate": 3.039405763913186e-06,
775
+ "loss": 0.0588,
776
+ "step": 30500
777
+ },
778
+ {
779
+ "epoch": 35.0,
780
+ "eval_cer": 1.0136913229018492,
781
+ "eval_loss": 0.054938483983278275,
782
+ "eval_runtime": 59.6299,
783
+ "eval_samples_per_second": 114.708,
784
+ "eval_steps_per_second": 14.338,
785
+ "eval_wer": 0.9988304093567252,
786
+ "step": 30800
787
+ },
788
+ {
789
+ "epoch": 35.22740193291643,
790
+ "grad_norm": 3.788679599761963,
791
+ "learning_rate": 2.4411261053725335e-06,
792
+ "loss": 0.059,
793
+ "step": 31000
794
+ },
795
+ {
796
+ "epoch": 35.795906765207505,
797
+ "grad_norm": 2.701200008392334,
798
+ "learning_rate": 1.904185301084242e-06,
799
+ "loss": 0.0594,
800
+ "step": 31500
801
+ },
802
+ {
803
+ "epoch": 36.0,
804
+ "eval_cer": 1.0141654812707444,
805
+ "eval_loss": 0.045684415847063065,
806
+ "eval_runtime": 60.0287,
807
+ "eval_samples_per_second": 113.945,
808
+ "eval_steps_per_second": 14.243,
809
+ "eval_wer": 0.9988304093567252,
810
+ "step": 31680
811
+ },
812
+ {
813
+ "epoch": 36.363843092666286,
814
+ "grad_norm": 0.22563552856445312,
815
+ "learning_rate": 1.4290112725289179e-06,
816
+ "loss": 0.0557,
817
+ "step": 32000
818
+ },
819
+ {
820
+ "epoch": 36.93234792495736,
821
+ "grad_norm": 3.3882081508636475,
822
+ "learning_rate": 1.0190237218990893e-06,
823
+ "loss": 0.0619,
824
+ "step": 32500
825
+ },
826
+ {
827
+ "epoch": 37.0,
828
+ "eval_cer": 1.014402560455192,
829
+ "eval_loss": 0.046258434653282166,
830
+ "eval_runtime": 58.5233,
831
+ "eval_samples_per_second": 116.876,
832
+ "eval_steps_per_second": 14.61,
833
+ "eval_wer": 0.9988304093567252,
834
+ "step": 32560
835
+ },
836
+ {
837
+ "epoch": 37.500284252416144,
838
+ "grad_norm": 4.180235385894775,
839
+ "learning_rate": 6.761919710294118e-07,
840
+ "loss": 0.058,
841
+ "step": 33000
842
+ },
843
+ {
844
+ "epoch": 38.0,
845
+ "eval_cer": 1.0126837363679468,
846
+ "eval_loss": 0.06653982400894165,
847
+ "eval_runtime": 59.3473,
848
+ "eval_samples_per_second": 115.254,
849
+ "eval_steps_per_second": 14.407,
850
+ "eval_wer": 0.9988304093567252,
851
+ "step": 33440
852
+ },
853
+ {
854
+ "epoch": 38.06822057987493,
855
+ "grad_norm": 0.7227972745895386,
856
+ "learning_rate": 4.021627676115197e-07,
857
+ "loss": 0.0579,
858
+ "step": 33500
859
+ },
860
+ {
861
+ "epoch": 38.636725412166,
862
+ "grad_norm": 0.30214107036590576,
863
+ "learning_rate": 1.9825237525585017e-07,
864
+ "loss": 0.059,
865
+ "step": 34000
866
+ },
867
+ {
868
+ "epoch": 39.0,
869
+ "eval_cer": 1.01309862494073,
870
+ "eval_loss": 0.05947383493185043,
871
+ "eval_runtime": 69.1876,
872
+ "eval_samples_per_second": 98.862,
873
+ "eval_steps_per_second": 12.358,
874
+ "eval_wer": 0.9988304093567252,
875
+ "step": 34320
876
+ },
877
+ {
878
+ "epoch": 39.20466173962479,
879
+ "grad_norm": 4.248073577880859,
880
+ "learning_rate": 6.544025099069761e-08,
881
+ "loss": 0.0539,
882
+ "step": 34500
883
+ },
884
+ {
885
+ "epoch": 39.77316657191586,
886
+ "grad_norm": 1.873940348625183,
887
+ "learning_rate": 4.364340567880043e-09,
888
+ "loss": 0.0563,
889
+ "step": 35000
890
+ },
891
+ {
892
+ "epoch": 39.955088118249,
893
+ "eval_cer": 1.013276434329066,
894
+ "eval_loss": 0.058061111718416214,
895
+ "eval_runtime": 61.3482,
896
+ "eval_samples_per_second": 111.495,
897
+ "eval_steps_per_second": 13.937,
898
+ "eval_wer": 0.9988304093567252,
899
+ "step": 35160
900
+ },
901
+ {
902
+ "epoch": 39.955088118249,
903
+ "step": 35160,
904
+ "total_flos": 1.8269796433195942e+19,
905
+ "train_loss": 1.1798896302406563,
906
+ "train_runtime": 28544.574,
907
+ "train_samples_per_second": 78.866,
908
+ "train_steps_per_second": 1.232
909
+ }
910
+ ],
911
+ "logging_steps": 500,
912
+ "max_steps": 35160,
913
+ "num_input_tokens_seen": 0,
914
+ "num_train_epochs": 40,
915
+ "save_steps": 400,
916
+ "stateful_callbacks": {
917
+ "TrainerControl": {
918
+ "args": {
919
+ "should_epoch_stop": false,
920
+ "should_evaluate": false,
921
+ "should_log": false,
922
+ "should_save": true,
923
+ "should_training_stop": true
924
+ },
925
+ "attributes": {}
926
+ }
927
+ },
928
+ "total_flos": 1.8269796433195942e+19,
929
+ "train_batch_size": 32,
930
+ "trial_name": null,
931
+ "trial_params": null
932
+ }
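Because `best_metric` and `best_model_checkpoint` are both `null`, the state file does not record which evaluation was best. In the log above, the lowest `eval_loss` is 0.0230 at step 24640 (epoch 28), well below the 0.0581 logged at the final step. A minimal sketch for finding that entry programmatically follows. The helper name `lowest_eval_loss` is hypothetical, and `sample_log` holds only four entries copied from `log_history`; for the real file one would pass `json.load(f)["log_history"]` instead.

```python
def lowest_eval_loss(log_history):
    """Return the evaluation entry with the smallest eval_loss.

    Training-only entries (those with "loss" but no "eval_loss") are skipped.
    """
    evals = [entry for entry in log_history if "eval_loss" in entry]
    return min(evals, key=lambda entry: entry["eval_loss"])


# A few entries copied from log_history above (most fields omitted).
sample_log = [
    {"epoch": 24.0, "eval_loss": 0.02615249529480934, "step": 21120},
    {"epoch": 24.432063672541215, "loss": 0.0952, "step": 21500},
    {"epoch": 28.0, "eval_loss": 0.02296304889023304, "step": 24640},
    {"epoch": 39.955088118249, "eval_loss": 0.058061111718416214, "step": 35160},
]

best = lowest_eval_loss(sample_log)
print(best["step"], best["eval_loss"])
```

Note that `eval_wer` is identical (0.9988304093567252) at every evaluation from epoch 15 onward, so loss is the only logged signal that distinguishes these checkpoints.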