2023-10-13 22:32:35,467 ----------------------------------------------------------------------------------------------------
2023-10-13 22:32:35,468 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(32001, 768)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0-11): 12 x BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-13 22:32:35,468 ----------------------------------------------------------------------------------------------------
2023-10-13 22:32:35,468 MultiCorpus: 7936 train + 992 dev + 992 test sentences
 - NER_ICDAR_EUROPEANA Corpus: 7936 train + 992 dev + 992 test sentences - /root/.flair/datasets/ner_icdar_europeana/fr
2023-10-13 22:32:35,469 ----------------------------------------------------------------------------------------------------
2023-10-13 22:32:35,469 Train:  7936 sentences
2023-10-13 22:32:35,469         (train_with_dev=False, train_with_test=False)
2023-10-13 22:32:35,469 ----------------------------------------------------------------------------------------------------
2023-10-13 22:32:35,469 Training Params:
2023-10-13 22:32:35,469  - learning_rate: "5e-05" 
2023-10-13 22:32:35,469  - mini_batch_size: "8"
2023-10-13 22:32:35,469  - max_epochs: "10"
2023-10-13 22:32:35,469  - shuffle: "True"
2023-10-13 22:32:35,469 ----------------------------------------------------------------------------------------------------
2023-10-13 22:32:35,469 Plugins:
2023-10-13 22:32:35,469  - LinearScheduler | warmup_fraction: '0.1'
2023-10-13 22:32:35,469 ----------------------------------------------------------------------------------------------------
2023-10-13 22:32:35,469 Final evaluation on model from best epoch (best-model.pt)
2023-10-13 22:32:35,469  - metric: "('micro avg', 'f1-score')"
2023-10-13 22:32:35,469 ----------------------------------------------------------------------------------------------------
2023-10-13 22:32:35,469 Computation:
2023-10-13 22:32:35,469  - compute on device: cuda:0
2023-10-13 22:32:35,469  - embedding storage: none
2023-10-13 22:32:35,469 ----------------------------------------------------------------------------------------------------
2023-10-13 22:32:35,469 Model training base path: "hmbench-icdar/fr-dbmdz/bert-base-historic-multilingual-cased-bs8-wsFalse-e10-lr5e-05-poolingfirst-layers-1-crfFalse-2"
2023-10-13 22:32:35,469 ----------------------------------------------------------------------------------------------------
2023-10-13 22:32:35,469 ----------------------------------------------------------------------------------------------------
2023-10-13 22:32:42,105 epoch 1 - iter 99/992 - loss 1.97625335 - time (sec): 6.63 - samples/sec: 2619.78 - lr: 0.000005 - momentum: 0.000000
2023-10-13 22:32:48,119 epoch 1 - iter 198/992 - loss 1.20445619 - time (sec): 12.65 - samples/sec: 2622.63 - lr: 0.000010 - momentum: 0.000000
2023-10-13 22:32:53,990 epoch 1 - iter 297/992 - loss 0.90127542 - time (sec): 18.52 - samples/sec: 2645.29 - lr: 0.000015 - momentum: 0.000000
2023-10-13 22:32:59,842 epoch 1 - iter 396/992 - loss 0.73031528 - time (sec): 24.37 - samples/sec: 2664.09 - lr: 0.000020 - momentum: 0.000000
2023-10-13 22:33:05,553 epoch 1 - iter 495/992 - loss 0.61732090 - time (sec): 30.08 - samples/sec: 2695.40 - lr: 0.000025 - momentum: 0.000000
2023-10-13 22:33:11,400 epoch 1 - iter 594/992 - loss 0.53727279 - time (sec): 35.93 - samples/sec: 2719.61 - lr: 0.000030 - momentum: 0.000000
2023-10-13 22:33:17,199 epoch 1 - iter 693/992 - loss 0.48213515 - time (sec): 41.73 - samples/sec: 2724.53 - lr: 0.000035 - momentum: 0.000000
2023-10-13 22:33:23,140 epoch 1 - iter 792/992 - loss 0.43768861 - time (sec): 47.67 - samples/sec: 2731.72 - lr: 0.000040 - momentum: 0.000000
2023-10-13 22:33:29,370 epoch 1 - iter 891/992 - loss 0.40269124 - time (sec): 53.90 - samples/sec: 2727.47 - lr: 0.000045 - momentum: 0.000000
2023-10-13 22:33:35,471 epoch 1 - iter 990/992 - loss 0.37457303 - time (sec): 60.00 - samples/sec: 2724.88 - lr: 0.000050 - momentum: 0.000000
2023-10-13 22:33:35,623 ----------------------------------------------------------------------------------------------------
2023-10-13 22:33:35,624 EPOCH 1 done: loss 0.3735 - lr: 0.000050
2023-10-13 22:33:38,899 DEV : loss 0.10837739706039429 - f1-score (micro avg)  0.6822
2023-10-13 22:33:38,924 saving best model
2023-10-13 22:33:39,350 ----------------------------------------------------------------------------------------------------
2023-10-13 22:33:45,003 epoch 2 - iter 99/992 - loss 0.10295012 - time (sec): 5.65 - samples/sec: 2750.21 - lr: 0.000049 - momentum: 0.000000
2023-10-13 22:33:50,901 epoch 2 - iter 198/992 - loss 0.10973798 - time (sec): 11.55 - samples/sec: 2718.58 - lr: 0.000049 - momentum: 0.000000
2023-10-13 22:33:56,904 epoch 2 - iter 297/992 - loss 0.10543489 - time (sec): 17.55 - samples/sec: 2761.93 - lr: 0.000048 - momentum: 0.000000
2023-10-13 22:34:02,862 epoch 2 - iter 396/992 - loss 0.10381763 - time (sec): 23.51 - samples/sec: 2769.50 - lr: 0.000048 - momentum: 0.000000
2023-10-13 22:34:08,919 epoch 2 - iter 495/992 - loss 0.10311459 - time (sec): 29.57 - samples/sec: 2751.80 - lr: 0.000047 - momentum: 0.000000
2023-10-13 22:34:14,729 epoch 2 - iter 594/992 - loss 0.10171878 - time (sec): 35.38 - samples/sec: 2751.64 - lr: 0.000047 - momentum: 0.000000
2023-10-13 22:34:20,492 epoch 2 - iter 693/992 - loss 0.09979812 - time (sec): 41.14 - samples/sec: 2754.11 - lr: 0.000046 - momentum: 0.000000
2023-10-13 22:34:26,580 epoch 2 - iter 792/992 - loss 0.10186744 - time (sec): 47.23 - samples/sec: 2753.33 - lr: 0.000046 - momentum: 0.000000
2023-10-13 22:34:32,670 epoch 2 - iter 891/992 - loss 0.10186196 - time (sec): 53.32 - samples/sec: 2741.16 - lr: 0.000045 - momentum: 0.000000
2023-10-13 22:34:38,791 epoch 2 - iter 990/992 - loss 0.10164595 - time (sec): 59.44 - samples/sec: 2752.43 - lr: 0.000044 - momentum: 0.000000
2023-10-13 22:34:38,920 ----------------------------------------------------------------------------------------------------
2023-10-13 22:34:38,920 EPOCH 2 done: loss 0.1015 - lr: 0.000044
2023-10-13 22:34:43,263 DEV : loss 0.08593912422657013 - f1-score (micro avg)  0.7422
2023-10-13 22:34:43,287 saving best model
2023-10-13 22:34:43,770 ----------------------------------------------------------------------------------------------------
2023-10-13 22:34:49,664 epoch 3 - iter 99/992 - loss 0.08176039 - time (sec): 5.89 - samples/sec: 2717.35 - lr: 0.000044 - momentum: 0.000000
2023-10-13 22:34:55,620 epoch 3 - iter 198/992 - loss 0.07620139 - time (sec): 11.85 - samples/sec: 2720.74 - lr: 0.000043 - momentum: 0.000000
2023-10-13 22:35:01,359 epoch 3 - iter 297/992 - loss 0.07242783 - time (sec): 17.59 - samples/sec: 2743.33 - lr: 0.000043 - momentum: 0.000000
2023-10-13 22:35:07,179 epoch 3 - iter 396/992 - loss 0.07362963 - time (sec): 23.41 - samples/sec: 2759.74 - lr: 0.000042 - momentum: 0.000000
2023-10-13 22:35:13,060 epoch 3 - iter 495/992 - loss 0.07299823 - time (sec): 29.29 - samples/sec: 2761.91 - lr: 0.000042 - momentum: 0.000000
2023-10-13 22:35:19,048 epoch 3 - iter 594/992 - loss 0.07330654 - time (sec): 35.27 - samples/sec: 2761.44 - lr: 0.000041 - momentum: 0.000000
2023-10-13 22:35:25,090 epoch 3 - iter 693/992 - loss 0.07489880 - time (sec): 41.32 - samples/sec: 2747.94 - lr: 0.000041 - momentum: 0.000000
2023-10-13 22:35:31,036 epoch 3 - iter 792/992 - loss 0.07562313 - time (sec): 47.26 - samples/sec: 2760.10 - lr: 0.000040 - momentum: 0.000000
2023-10-13 22:35:37,046 epoch 3 - iter 891/992 - loss 0.07372307 - time (sec): 53.27 - samples/sec: 2766.49 - lr: 0.000039 - momentum: 0.000000
2023-10-13 22:35:42,895 epoch 3 - iter 990/992 - loss 0.07373060 - time (sec): 59.12 - samples/sec: 2769.61 - lr: 0.000039 - momentum: 0.000000
2023-10-13 22:35:42,996 ----------------------------------------------------------------------------------------------------
2023-10-13 22:35:42,996 EPOCH 3 done: loss 0.0737 - lr: 0.000039
2023-10-13 22:35:46,388 DEV : loss 0.10231217741966248 - f1-score (micro avg)  0.7589
2023-10-13 22:35:46,412 saving best model
2023-10-13 22:35:46,925 ----------------------------------------------------------------------------------------------------
2023-10-13 22:35:52,809 epoch 4 - iter 99/992 - loss 0.05134981 - time (sec): 5.88 - samples/sec: 2723.80 - lr: 0.000038 - momentum: 0.000000
2023-10-13 22:35:59,141 epoch 4 - iter 198/992 - loss 0.05017639 - time (sec): 12.21 - samples/sec: 2716.64 - lr: 0.000038 - momentum: 0.000000
2023-10-13 22:36:05,090 epoch 4 - iter 297/992 - loss 0.04727321 - time (sec): 18.16 - samples/sec: 2718.31 - lr: 0.000037 - momentum: 0.000000
2023-10-13 22:36:10,869 epoch 4 - iter 396/992 - loss 0.04808307 - time (sec): 23.94 - samples/sec: 2749.13 - lr: 0.000037 - momentum: 0.000000
2023-10-13 22:36:16,706 epoch 4 - iter 495/992 - loss 0.04713806 - time (sec): 29.78 - samples/sec: 2768.47 - lr: 0.000036 - momentum: 0.000000
2023-10-13 22:36:22,555 epoch 4 - iter 594/992 - loss 0.04822839 - time (sec): 35.63 - samples/sec: 2769.58 - lr: 0.000036 - momentum: 0.000000
2023-10-13 22:36:28,395 epoch 4 - iter 693/992 - loss 0.04982517 - time (sec): 41.47 - samples/sec: 2764.17 - lr: 0.000035 - momentum: 0.000000
2023-10-13 22:36:34,340 epoch 4 - iter 792/992 - loss 0.05129636 - time (sec): 47.41 - samples/sec: 2765.07 - lr: 0.000034 - momentum: 0.000000
2023-10-13 22:36:39,871 epoch 4 - iter 891/992 - loss 0.05132015 - time (sec): 52.94 - samples/sec: 2782.54 - lr: 0.000034 - momentum: 0.000000
2023-10-13 22:36:46,060 epoch 4 - iter 990/992 - loss 0.05099074 - time (sec): 59.13 - samples/sec: 2765.99 - lr: 0.000033 - momentum: 0.000000
2023-10-13 22:36:46,190 ----------------------------------------------------------------------------------------------------
2023-10-13 22:36:46,190 EPOCH 4 done: loss 0.0510 - lr: 0.000033
2023-10-13 22:36:49,798 DEV : loss 0.13146057724952698 - f1-score (micro avg)  0.7587
2023-10-13 22:36:49,822 ----------------------------------------------------------------------------------------------------
2023-10-13 22:36:55,595 epoch 5 - iter 99/992 - loss 0.03914982 - time (sec): 5.77 - samples/sec: 2761.13 - lr: 0.000033 - momentum: 0.000000
2023-10-13 22:37:01,451 epoch 5 - iter 198/992 - loss 0.03993438 - time (sec): 11.63 - samples/sec: 2786.06 - lr: 0.000032 - momentum: 0.000000
2023-10-13 22:37:07,215 epoch 5 - iter 297/992 - loss 0.03848388 - time (sec): 17.39 - samples/sec: 2811.92 - lr: 0.000032 - momentum: 0.000000
2023-10-13 22:37:13,113 epoch 5 - iter 396/992 - loss 0.03928207 - time (sec): 23.29 - samples/sec: 2821.97 - lr: 0.000031 - momentum: 0.000000
2023-10-13 22:37:19,287 epoch 5 - iter 495/992 - loss 0.04014864 - time (sec): 29.46 - samples/sec: 2803.85 - lr: 0.000031 - momentum: 0.000000
2023-10-13 22:37:25,177 epoch 5 - iter 594/992 - loss 0.04251426 - time (sec): 35.35 - samples/sec: 2787.71 - lr: 0.000030 - momentum: 0.000000
2023-10-13 22:37:30,988 epoch 5 - iter 693/992 - loss 0.04162004 - time (sec): 41.17 - samples/sec: 2781.73 - lr: 0.000029 - momentum: 0.000000
2023-10-13 22:37:36,743 epoch 5 - iter 792/992 - loss 0.04073969 - time (sec): 46.92 - samples/sec: 2780.60 - lr: 0.000029 - momentum: 0.000000
2023-10-13 22:37:43,244 epoch 5 - iter 891/992 - loss 0.04252912 - time (sec): 53.42 - samples/sec: 2758.70 - lr: 0.000028 - momentum: 0.000000
2023-10-13 22:37:48,877 epoch 5 - iter 990/992 - loss 0.04211590 - time (sec): 59.05 - samples/sec: 2768.82 - lr: 0.000028 - momentum: 0.000000
2023-10-13 22:37:49,038 ----------------------------------------------------------------------------------------------------
2023-10-13 22:37:49,038 EPOCH 5 done: loss 0.0421 - lr: 0.000028
2023-10-13 22:37:52,595 DEV : loss 0.15390822291374207 - f1-score (micro avg)  0.7563
2023-10-13 22:37:52,618 ----------------------------------------------------------------------------------------------------
2023-10-13 22:37:58,397 epoch 6 - iter 99/992 - loss 0.02447859 - time (sec): 5.78 - samples/sec: 2857.09 - lr: 0.000027 - momentum: 0.000000
2023-10-13 22:38:04,483 epoch 6 - iter 198/992 - loss 0.02281979 - time (sec): 11.86 - samples/sec: 2814.99 - lr: 0.000027 - momentum: 0.000000
2023-10-13 22:38:10,375 epoch 6 - iter 297/992 - loss 0.02449871 - time (sec): 17.76 - samples/sec: 2836.72 - lr: 0.000026 - momentum: 0.000000
2023-10-13 22:38:16,330 epoch 6 - iter 396/992 - loss 0.02562274 - time (sec): 23.71 - samples/sec: 2816.53 - lr: 0.000026 - momentum: 0.000000
2023-10-13 22:38:22,242 epoch 6 - iter 495/992 - loss 0.02613520 - time (sec): 29.62 - samples/sec: 2780.83 - lr: 0.000025 - momentum: 0.000000
2023-10-13 22:38:28,192 epoch 6 - iter 594/992 - loss 0.02619509 - time (sec): 35.57 - samples/sec: 2768.90 - lr: 0.000024 - momentum: 0.000000
2023-10-13 22:38:33,861 epoch 6 - iter 693/992 - loss 0.02665766 - time (sec): 41.24 - samples/sec: 2754.02 - lr: 0.000024 - momentum: 0.000000
2023-10-13 22:38:39,952 epoch 6 - iter 792/992 - loss 0.02787193 - time (sec): 47.33 - samples/sec: 2770.66 - lr: 0.000023 - momentum: 0.000000
2023-10-13 22:38:45,783 epoch 6 - iter 891/992 - loss 0.02982103 - time (sec): 53.16 - samples/sec: 2766.54 - lr: 0.000023 - momentum: 0.000000
2023-10-13 22:38:51,599 epoch 6 - iter 990/992 - loss 0.02967950 - time (sec): 58.98 - samples/sec: 2772.64 - lr: 0.000022 - momentum: 0.000000
2023-10-13 22:38:51,724 ----------------------------------------------------------------------------------------------------
2023-10-13 22:38:51,724 EPOCH 6 done: loss 0.0296 - lr: 0.000022
2023-10-13 22:38:55,149 DEV : loss 0.19599080085754395 - f1-score (micro avg)  0.7572
2023-10-13 22:38:55,170 ----------------------------------------------------------------------------------------------------
2023-10-13 22:39:00,912 epoch 7 - iter 99/992 - loss 0.02376152 - time (sec): 5.74 - samples/sec: 2826.61 - lr: 0.000022 - momentum: 0.000000
2023-10-13 22:39:06,655 epoch 7 - iter 198/992 - loss 0.01958953 - time (sec): 11.48 - samples/sec: 2810.22 - lr: 0.000021 - momentum: 0.000000
2023-10-13 22:39:12,687 epoch 7 - iter 297/992 - loss 0.01962767 - time (sec): 17.52 - samples/sec: 2831.96 - lr: 0.000021 - momentum: 0.000000
2023-10-13 22:39:18,688 epoch 7 - iter 396/992 - loss 0.01948507 - time (sec): 23.52 - samples/sec: 2822.98 - lr: 0.000020 - momentum: 0.000000
2023-10-13 22:39:24,654 epoch 7 - iter 495/992 - loss 0.01985791 - time (sec): 29.48 - samples/sec: 2810.77 - lr: 0.000019 - momentum: 0.000000
2023-10-13 22:39:30,455 epoch 7 - iter 594/992 - loss 0.02068738 - time (sec): 35.28 - samples/sec: 2793.89 - lr: 0.000019 - momentum: 0.000000
2023-10-13 22:39:36,232 epoch 7 - iter 693/992 - loss 0.02241871 - time (sec): 41.06 - samples/sec: 2796.58 - lr: 0.000018 - momentum: 0.000000
2023-10-13 22:39:41,978 epoch 7 - iter 792/992 - loss 0.02181208 - time (sec): 46.81 - samples/sec: 2804.38 - lr: 0.000018 - momentum: 0.000000
2023-10-13 22:39:47,996 epoch 7 - iter 891/992 - loss 0.02217841 - time (sec): 52.83 - samples/sec: 2783.90 - lr: 0.000017 - momentum: 0.000000
2023-10-13 22:39:54,016 epoch 7 - iter 990/992 - loss 0.02172978 - time (sec): 58.84 - samples/sec: 2783.19 - lr: 0.000017 - momentum: 0.000000
2023-10-13 22:39:54,115 ----------------------------------------------------------------------------------------------------
2023-10-13 22:39:54,115 EPOCH 7 done: loss 0.0217 - lr: 0.000017
2023-10-13 22:39:57,955 DEV : loss 0.18885478377342224 - f1-score (micro avg)  0.765
2023-10-13 22:39:57,976 saving best model
2023-10-13 22:39:58,428 ----------------------------------------------------------------------------------------------------
2023-10-13 22:40:04,318 epoch 8 - iter 99/992 - loss 0.02286923 - time (sec): 5.88 - samples/sec: 2805.77 - lr: 0.000016 - momentum: 0.000000
2023-10-13 22:40:10,386 epoch 8 - iter 198/992 - loss 0.01729121 - time (sec): 11.95 - samples/sec: 2800.86 - lr: 0.000016 - momentum: 0.000000
2023-10-13 22:40:16,196 epoch 8 - iter 297/992 - loss 0.01585346 - time (sec): 17.76 - samples/sec: 2818.05 - lr: 0.000015 - momentum: 0.000000
2023-10-13 22:40:22,131 epoch 8 - iter 396/992 - loss 0.01479878 - time (sec): 23.70 - samples/sec: 2838.92 - lr: 0.000014 - momentum: 0.000000
2023-10-13 22:40:27,927 epoch 8 - iter 495/992 - loss 0.01559774 - time (sec): 29.49 - samples/sec: 2832.36 - lr: 0.000014 - momentum: 0.000000
2023-10-13 22:40:33,638 epoch 8 - iter 594/992 - loss 0.01479709 - time (sec): 35.20 - samples/sec: 2821.11 - lr: 0.000013 - momentum: 0.000000
2023-10-13 22:40:39,365 epoch 8 - iter 693/992 - loss 0.01447454 - time (sec): 40.93 - samples/sec: 2808.13 - lr: 0.000013 - momentum: 0.000000
2023-10-13 22:40:45,325 epoch 8 - iter 792/992 - loss 0.01516714 - time (sec): 46.89 - samples/sec: 2803.50 - lr: 0.000012 - momentum: 0.000000
2023-10-13 22:40:51,152 epoch 8 - iter 891/992 - loss 0.01551720 - time (sec): 52.72 - samples/sec: 2801.05 - lr: 0.000012 - momentum: 0.000000
2023-10-13 22:40:56,972 epoch 8 - iter 990/992 - loss 0.01557435 - time (sec): 58.54 - samples/sec: 2798.83 - lr: 0.000011 - momentum: 0.000000
2023-10-13 22:40:57,071 ----------------------------------------------------------------------------------------------------
2023-10-13 22:40:57,072 EPOCH 8 done: loss 0.0159 - lr: 0.000011
2023-10-13 22:41:00,599 DEV : loss 0.19485069811344147 - f1-score (micro avg)  0.7664
2023-10-13 22:41:00,622 saving best model
2023-10-13 22:41:01,122 ----------------------------------------------------------------------------------------------------
2023-10-13 22:41:06,817 epoch 9 - iter 99/992 - loss 0.00725184 - time (sec): 5.69 - samples/sec: 2598.12 - lr: 0.000011 - momentum: 0.000000
2023-10-13 22:41:12,615 epoch 9 - iter 198/992 - loss 0.01324789 - time (sec): 11.49 - samples/sec: 2671.94 - lr: 0.000010 - momentum: 0.000000
2023-10-13 22:41:18,618 epoch 9 - iter 297/992 - loss 0.01262387 - time (sec): 17.49 - samples/sec: 2676.85 - lr: 0.000009 - momentum: 0.000000
2023-10-13 22:41:24,527 epoch 9 - iter 396/992 - loss 0.01078576 - time (sec): 23.40 - samples/sec: 2700.15 - lr: 0.000009 - momentum: 0.000000
2023-10-13 22:41:30,687 epoch 9 - iter 495/992 - loss 0.01086785 - time (sec): 29.56 - samples/sec: 2708.13 - lr: 0.000008 - momentum: 0.000000
2023-10-13 22:41:36,631 epoch 9 - iter 594/992 - loss 0.01119434 - time (sec): 35.50 - samples/sec: 2722.13 - lr: 0.000008 - momentum: 0.000000
2023-10-13 22:41:42,383 epoch 9 - iter 693/992 - loss 0.01095970 - time (sec): 41.26 - samples/sec: 2736.96 - lr: 0.000007 - momentum: 0.000000
2023-10-13 22:41:48,299 epoch 9 - iter 792/992 - loss 0.01070482 - time (sec): 47.17 - samples/sec: 2753.00 - lr: 0.000007 - momentum: 0.000000
2023-10-13 22:41:54,240 epoch 9 - iter 891/992 - loss 0.01101838 - time (sec): 53.11 - samples/sec: 2759.58 - lr: 0.000006 - momentum: 0.000000
2023-10-13 22:42:00,495 epoch 9 - iter 990/992 - loss 0.01099219 - time (sec): 59.37 - samples/sec: 2755.51 - lr: 0.000006 - momentum: 0.000000
2023-10-13 22:42:00,600 ----------------------------------------------------------------------------------------------------
2023-10-13 22:42:00,600 EPOCH 9 done: loss 0.0110 - lr: 0.000006
2023-10-13 22:42:04,597 DEV : loss 0.20399770140647888 - f1-score (micro avg)  0.767
2023-10-13 22:42:04,619 saving best model
2023-10-13 22:42:05,118 ----------------------------------------------------------------------------------------------------
2023-10-13 22:42:11,036 epoch 10 - iter 99/992 - loss 0.00694056 - time (sec): 5.92 - samples/sec: 2777.84 - lr: 0.000005 - momentum: 0.000000
2023-10-13 22:42:17,345 epoch 10 - iter 198/992 - loss 0.00836868 - time (sec): 12.23 - samples/sec: 2802.73 - lr: 0.000004 - momentum: 0.000000
2023-10-13 22:42:22,986 epoch 10 - iter 297/992 - loss 0.00886989 - time (sec): 17.87 - samples/sec: 2785.15 - lr: 0.000004 - momentum: 0.000000
2023-10-13 22:42:28,741 epoch 10 - iter 396/992 - loss 0.00862895 - time (sec): 23.62 - samples/sec: 2804.19 - lr: 0.000003 - momentum: 0.000000
2023-10-13 22:42:35,030 epoch 10 - iter 495/992 - loss 0.00902308 - time (sec): 29.91 - samples/sec: 2785.89 - lr: 0.000003 - momentum: 0.000000
2023-10-13 22:42:40,907 epoch 10 - iter 594/992 - loss 0.00870118 - time (sec): 35.79 - samples/sec: 2776.25 - lr: 0.000002 - momentum: 0.000000
2023-10-13 22:42:46,703 epoch 10 - iter 693/992 - loss 0.00845402 - time (sec): 41.58 - samples/sec: 2756.78 - lr: 0.000002 - momentum: 0.000000
2023-10-13 22:42:52,747 epoch 10 - iter 792/992 - loss 0.00817124 - time (sec): 47.63 - samples/sec: 2751.49 - lr: 0.000001 - momentum: 0.000000
2023-10-13 22:42:58,806 epoch 10 - iter 891/992 - loss 0.00784789 - time (sec): 53.69 - samples/sec: 2751.57 - lr: 0.000001 - momentum: 0.000000
2023-10-13 22:43:04,469 epoch 10 - iter 990/992 - loss 0.00753226 - time (sec): 59.35 - samples/sec: 2757.97 - lr: 0.000000 - momentum: 0.000000
2023-10-13 22:43:04,580 ----------------------------------------------------------------------------------------------------
2023-10-13 22:43:04,580 EPOCH 10 done: loss 0.0075 - lr: 0.000000
2023-10-13 22:43:07,991 DEV : loss 0.20772945880889893 - f1-score (micro avg)  0.7698
2023-10-13 22:43:08,013 saving best model
2023-10-13 22:43:08,849 ----------------------------------------------------------------------------------------------------
2023-10-13 22:43:08,850 Loading model from best epoch ...
2023-10-13 22:43:10,240 SequenceTagger predicts: Dictionary with 13 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-13 22:43:13,645 
Results:
- F-score (micro) 0.7805
- F-score (macro) 0.6917
- Accuracy 0.6599

By class:
              precision    recall  f1-score   support

         LOC     0.8191    0.8641    0.8410       655
         PER     0.7171    0.8296    0.7692       223
         ORG     0.4912    0.4409    0.4647       127

   micro avg     0.7592    0.8030    0.7805      1005
   macro avg     0.6758    0.7116    0.6917      1005
weighted avg     0.7550    0.8030    0.7775      1005

2023-10-13 22:43:13,646 ----------------------------------------------------------------------------------------------------