2023-10-13 20:53:18,545 ----------------------------------------------------------------------------------------------------
2023-10-13 20:53:18,546 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(32001, 768)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0-11): 12 x BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-13 20:53:18,546 ----------------------------------------------------------------------------------------------------
2023-10-13 20:53:18,547 MultiCorpus: 7936 train + 992 dev + 992 test sentences
 - NER_ICDAR_EUROPEANA Corpus: 7936 train + 992 dev + 992 test sentences - /root/.flair/datasets/ner_icdar_europeana/fr
2023-10-13 20:53:18,547 ----------------------------------------------------------------------------------------------------
2023-10-13 20:53:18,547 Train:  7936 sentences
2023-10-13 20:53:18,547         (train_with_dev=False, train_with_test=False)
2023-10-13 20:53:18,547 ----------------------------------------------------------------------------------------------------
2023-10-13 20:53:18,547 Training Params:
2023-10-13 20:53:18,547  - learning_rate: "3e-05" 
2023-10-13 20:53:18,547  - mini_batch_size: "4"
2023-10-13 20:53:18,547  - max_epochs: "10"
2023-10-13 20:53:18,547  - shuffle: "True"
2023-10-13 20:53:18,547 ----------------------------------------------------------------------------------------------------
2023-10-13 20:53:18,547 Plugins:
2023-10-13 20:53:18,547  - LinearScheduler | warmup_fraction: '0.1'
2023-10-13 20:53:18,547 ----------------------------------------------------------------------------------------------------
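Note on the schedule: the `lr` column in the iteration lines below is consistent with a linear warmup over the first 10% of steps followed by a linear decay to zero. The following is a minimal sketch of that shape, not Flair's own scheduler code. The function name `linear_lr` and the constants are hypothetical, and the step count assumes 1984 iterations per epoch over 10 epochs, as logged.

```python
# Sketch of a linear warmup + linear decay schedule (assumed shape, inferred
# from the logged lr values; not taken from Flair's implementation).

ITERS_PER_EPOCH = 1984          # from "iter 198/1984" in the log
MAX_EPOCHS = 10                 # from "max_epochs" in the log
TOTAL_STEPS = ITERS_PER_EPOCH * MAX_EPOCHS


def linear_lr(step, total_steps=TOTAL_STEPS, peak_lr=3e-5, warmup_fraction=0.1):
    """Return the learning rate after `step` optimizer steps."""
    warmup_steps = round(total_steps * warmup_fraction)
    if step < warmup_steps:
        # Warmup: rise linearly from 0 to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from peak_lr to 0 at the final step.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    # Print the rate at the end of each epoch for comparison with the
    # "EPOCH n done" lines.
    for epoch in range(1, MAX_EPOCHS + 1):
        print(epoch, f"{linear_lr(epoch * ITERS_PER_EPOCH):.6f}")
```

Under these assumptions the rate peaks at the end of epoch 1 and reaches zero at the end of epoch 10, which matches the rounded values in the epoch summary lines.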
2023-10-13 20:53:18,547 Final evaluation on model from best epoch (best-model.pt)
2023-10-13 20:53:18,547  - metric: "('micro avg', 'f1-score')"
2023-10-13 20:53:18,547 ----------------------------------------------------------------------------------------------------
2023-10-13 20:53:18,547 Computation:
2023-10-13 20:53:18,547  - compute on device: cuda:0
2023-10-13 20:53:18,547  - embedding storage: none
2023-10-13 20:53:18,547 ----------------------------------------------------------------------------------------------------
2023-10-13 20:53:18,547 Model training base path: "hmbench-icdar/fr-dbmdz/bert-base-historic-multilingual-cased-bs4-wsFalse-e10-lr3e-05-poolingfirst-layers-1-crfFalse-1"
2023-10-13 20:53:18,547 ----------------------------------------------------------------------------------------------------
2023-10-13 20:53:18,547 ----------------------------------------------------------------------------------------------------
2023-10-13 20:53:28,676 epoch 1 - iter 198/1984 - loss 1.88770718 - time (sec): 10.13 - samples/sec: 1587.68 - lr: 0.000003 - momentum: 0.000000
2023-10-13 20:53:37,508 epoch 1 - iter 396/1984 - loss 1.12484439 - time (sec): 18.96 - samples/sec: 1717.72 - lr: 0.000006 - momentum: 0.000000
2023-10-13 20:53:46,295 epoch 1 - iter 594/1984 - loss 0.83522090 - time (sec): 27.75 - samples/sec: 1772.35 - lr: 0.000009 - momentum: 0.000000
2023-10-13 20:53:54,959 epoch 1 - iter 792/1984 - loss 0.67679270 - time (sec): 36.41 - samples/sec: 1793.37 - lr: 0.000012 - momentum: 0.000000
2023-10-13 20:54:03,579 epoch 1 - iter 990/1984 - loss 0.57703210 - time (sec): 45.03 - samples/sec: 1807.87 - lr: 0.000015 - momentum: 0.000000
2023-10-13 20:54:12,160 epoch 1 - iter 1188/1984 - loss 0.50510842 - time (sec): 53.61 - samples/sec: 1819.73 - lr: 0.000018 - momentum: 0.000000
2023-10-13 20:54:20,861 epoch 1 - iter 1386/1984 - loss 0.45213988 - time (sec): 62.31 - samples/sec: 1840.43 - lr: 0.000021 - momentum: 0.000000
2023-10-13 20:54:30,522 epoch 1 - iter 1584/1984 - loss 0.41118920 - time (sec): 71.97 - samples/sec: 1836.31 - lr: 0.000024 - momentum: 0.000000
2023-10-13 20:54:41,080 epoch 1 - iter 1782/1984 - loss 0.38328017 - time (sec): 82.53 - samples/sec: 1794.37 - lr: 0.000027 - momentum: 0.000000
2023-10-13 20:54:50,731 epoch 1 - iter 1980/1984 - loss 0.35941678 - time (sec): 92.18 - samples/sec: 1776.70 - lr: 0.000030 - momentum: 0.000000
2023-10-13 20:54:50,895 ----------------------------------------------------------------------------------------------------
2023-10-13 20:54:50,895 EPOCH 1 done: loss 0.3590 - lr: 0.000030
2023-10-13 20:54:53,891 DEV : loss 0.10573934763669968 - f1-score (micro avg)  0.6591
2023-10-13 20:54:53,910 saving best model
2023-10-13 20:54:54,647 ----------------------------------------------------------------------------------------------------
2023-10-13 20:55:03,748 epoch 2 - iter 198/1984 - loss 0.12106577 - time (sec): 9.10 - samples/sec: 1896.84 - lr: 0.000030 - momentum: 0.000000
2023-10-13 20:55:12,515 epoch 2 - iter 396/1984 - loss 0.11964675 - time (sec): 17.87 - samples/sec: 1814.78 - lr: 0.000029 - momentum: 0.000000
2023-10-13 20:55:21,655 epoch 2 - iter 594/1984 - loss 0.11534460 - time (sec): 27.01 - samples/sec: 1845.31 - lr: 0.000029 - momentum: 0.000000
2023-10-13 20:55:30,327 epoch 2 - iter 792/1984 - loss 0.11408993 - time (sec): 35.68 - samples/sec: 1814.80 - lr: 0.000029 - momentum: 0.000000
2023-10-13 20:55:39,134 epoch 2 - iter 990/1984 - loss 0.11433753 - time (sec): 44.49 - samples/sec: 1846.50 - lr: 0.000028 - momentum: 0.000000
2023-10-13 20:55:47,726 epoch 2 - iter 1188/1984 - loss 0.11332009 - time (sec): 53.08 - samples/sec: 1848.33 - lr: 0.000028 - momentum: 0.000000
2023-10-13 20:55:57,506 epoch 2 - iter 1386/1984 - loss 0.11366621 - time (sec): 62.86 - samples/sec: 1823.09 - lr: 0.000028 - momentum: 0.000000
2023-10-13 20:56:08,043 epoch 2 - iter 1584/1984 - loss 0.11325580 - time (sec): 73.39 - samples/sec: 1778.70 - lr: 0.000027 - momentum: 0.000000
2023-10-13 20:56:18,565 epoch 2 - iter 1782/1984 - loss 0.11283397 - time (sec): 83.92 - samples/sec: 1755.40 - lr: 0.000027 - momentum: 0.000000
2023-10-13 20:56:28,780 epoch 2 - iter 1980/1984 - loss 0.11160017 - time (sec): 94.13 - samples/sec: 1739.67 - lr: 0.000027 - momentum: 0.000000
2023-10-13 20:56:28,992 ----------------------------------------------------------------------------------------------------
2023-10-13 20:56:28,992 EPOCH 2 done: loss 0.1115 - lr: 0.000027
2023-10-13 20:56:32,481 DEV : loss 0.1008807048201561 - f1-score (micro avg)  0.7066
2023-10-13 20:56:32,506 saving best model
2023-10-13 20:56:33,183 ----------------------------------------------------------------------------------------------------
2023-10-13 20:56:43,466 epoch 3 - iter 198/1984 - loss 0.07917549 - time (sec): 10.28 - samples/sec: 1593.09 - lr: 0.000026 - momentum: 0.000000
2023-10-13 20:56:52,745 epoch 3 - iter 396/1984 - loss 0.07784288 - time (sec): 19.56 - samples/sec: 1629.87 - lr: 0.000026 - momentum: 0.000000
2023-10-13 20:57:02,097 epoch 3 - iter 594/1984 - loss 0.07362250 - time (sec): 28.91 - samples/sec: 1709.82 - lr: 0.000026 - momentum: 0.000000
2023-10-13 20:57:11,220 epoch 3 - iter 792/1984 - loss 0.07809181 - time (sec): 38.04 - samples/sec: 1752.17 - lr: 0.000025 - momentum: 0.000000
2023-10-13 20:57:20,434 epoch 3 - iter 990/1984 - loss 0.08051703 - time (sec): 47.25 - samples/sec: 1753.43 - lr: 0.000025 - momentum: 0.000000
2023-10-13 20:57:29,676 epoch 3 - iter 1188/1984 - loss 0.08164550 - time (sec): 56.49 - samples/sec: 1752.99 - lr: 0.000025 - momentum: 0.000000
2023-10-13 20:57:38,750 epoch 3 - iter 1386/1984 - loss 0.08050737 - time (sec): 65.57 - samples/sec: 1753.84 - lr: 0.000024 - momentum: 0.000000
2023-10-13 20:57:47,724 epoch 3 - iter 1584/1984 - loss 0.08301541 - time (sec): 74.54 - samples/sec: 1764.39 - lr: 0.000024 - momentum: 0.000000
2023-10-13 20:57:56,733 epoch 3 - iter 1782/1984 - loss 0.08278631 - time (sec): 83.55 - samples/sec: 1768.46 - lr: 0.000024 - momentum: 0.000000
2023-10-13 20:58:05,774 epoch 3 - iter 1980/1984 - loss 0.08178188 - time (sec): 92.59 - samples/sec: 1766.44 - lr: 0.000023 - momentum: 0.000000
2023-10-13 20:58:05,956 ----------------------------------------------------------------------------------------------------
2023-10-13 20:58:05,956 EPOCH 3 done: loss 0.0817 - lr: 0.000023
2023-10-13 20:58:09,397 DEV : loss 0.14147229492664337 - f1-score (micro avg)  0.7464
2023-10-13 20:58:09,417 saving best model
2023-10-13 20:58:09,979 ----------------------------------------------------------------------------------------------------
2023-10-13 20:58:18,977 epoch 4 - iter 198/1984 - loss 0.05757599 - time (sec): 8.99 - samples/sec: 1838.23 - lr: 0.000023 - momentum: 0.000000
2023-10-13 20:58:28,176 epoch 4 - iter 396/1984 - loss 0.06403415 - time (sec): 18.19 - samples/sec: 1827.43 - lr: 0.000023 - momentum: 0.000000
2023-10-13 20:58:37,156 epoch 4 - iter 594/1984 - loss 0.06415865 - time (sec): 27.17 - samples/sec: 1805.48 - lr: 0.000022 - momentum: 0.000000
2023-10-13 20:58:46,425 epoch 4 - iter 792/1984 - loss 0.06303083 - time (sec): 36.44 - samples/sec: 1785.64 - lr: 0.000022 - momentum: 0.000000
2023-10-13 20:58:55,437 epoch 4 - iter 990/1984 - loss 0.06259595 - time (sec): 45.45 - samples/sec: 1791.39 - lr: 0.000022 - momentum: 0.000000
2023-10-13 20:59:04,506 epoch 4 - iter 1188/1984 - loss 0.06147075 - time (sec): 54.52 - samples/sec: 1791.86 - lr: 0.000021 - momentum: 0.000000
2023-10-13 20:59:13,860 epoch 4 - iter 1386/1984 - loss 0.06547581 - time (sec): 63.88 - samples/sec: 1784.41 - lr: 0.000021 - momentum: 0.000000
2023-10-13 20:59:23,431 epoch 4 - iter 1584/1984 - loss 0.06402360 - time (sec): 73.45 - samples/sec: 1769.42 - lr: 0.000021 - momentum: 0.000000
2023-10-13 20:59:32,955 epoch 4 - iter 1782/1984 - loss 0.06275327 - time (sec): 82.97 - samples/sec: 1766.30 - lr: 0.000020 - momentum: 0.000000
2023-10-13 20:59:42,166 epoch 4 - iter 1980/1984 - loss 0.06390400 - time (sec): 92.18 - samples/sec: 1776.30 - lr: 0.000020 - momentum: 0.000000
2023-10-13 20:59:42,347 ----------------------------------------------------------------------------------------------------
2023-10-13 20:59:42,347 EPOCH 4 done: loss 0.0639 - lr: 0.000020
2023-10-13 20:59:46,341 DEV : loss 0.14689981937408447 - f1-score (micro avg)  0.7489
2023-10-13 20:59:46,360 saving best model
2023-10-13 20:59:46,969 ----------------------------------------------------------------------------------------------------
2023-10-13 20:59:56,191 epoch 5 - iter 198/1984 - loss 0.04771070 - time (sec): 9.22 - samples/sec: 1804.52 - lr: 0.000020 - momentum: 0.000000
2023-10-13 21:00:05,681 epoch 5 - iter 396/1984 - loss 0.04649118 - time (sec): 18.71 - samples/sec: 1761.13 - lr: 0.000019 - momentum: 0.000000
2023-10-13 21:00:15,005 epoch 5 - iter 594/1984 - loss 0.04834649 - time (sec): 28.03 - samples/sec: 1757.30 - lr: 0.000019 - momentum: 0.000000
2023-10-13 21:00:24,352 epoch 5 - iter 792/1984 - loss 0.04615015 - time (sec): 37.38 - samples/sec: 1783.56 - lr: 0.000019 - momentum: 0.000000
2023-10-13 21:00:33,340 epoch 5 - iter 990/1984 - loss 0.04648506 - time (sec): 46.37 - samples/sec: 1794.36 - lr: 0.000018 - momentum: 0.000000
2023-10-13 21:00:42,270 epoch 5 - iter 1188/1984 - loss 0.04468332 - time (sec): 55.30 - samples/sec: 1794.29 - lr: 0.000018 - momentum: 0.000000
2023-10-13 21:00:51,141 epoch 5 - iter 1386/1984 - loss 0.04510172 - time (sec): 64.17 - samples/sec: 1793.83 - lr: 0.000018 - momentum: 0.000000
2023-10-13 21:01:00,022 epoch 5 - iter 1584/1984 - loss 0.04472638 - time (sec): 73.05 - samples/sec: 1801.23 - lr: 0.000017 - momentum: 0.000000
2023-10-13 21:01:09,020 epoch 5 - iter 1782/1984 - loss 0.04570004 - time (sec): 82.05 - samples/sec: 1804.48 - lr: 0.000017 - momentum: 0.000000
2023-10-13 21:01:17,633 epoch 5 - iter 1980/1984 - loss 0.04591442 - time (sec): 90.66 - samples/sec: 1804.42 - lr: 0.000017 - momentum: 0.000000
2023-10-13 21:01:17,818 ----------------------------------------------------------------------------------------------------
2023-10-13 21:01:17,818 EPOCH 5 done: loss 0.0460 - lr: 0.000017
2023-10-13 21:01:21,249 DEV : loss 0.16444256901741028 - f1-score (micro avg)  0.7731
2023-10-13 21:01:21,269 saving best model
2023-10-13 21:01:21,874 ----------------------------------------------------------------------------------------------------
2023-10-13 21:01:30,982 epoch 6 - iter 198/1984 - loss 0.03290051 - time (sec): 9.11 - samples/sec: 1821.35 - lr: 0.000016 - momentum: 0.000000
2023-10-13 21:01:40,070 epoch 6 - iter 396/1984 - loss 0.03321950 - time (sec): 18.19 - samples/sec: 1858.59 - lr: 0.000016 - momentum: 0.000000
2023-10-13 21:01:49,239 epoch 6 - iter 594/1984 - loss 0.03311731 - time (sec): 27.36 - samples/sec: 1815.85 - lr: 0.000016 - momentum: 0.000000
2023-10-13 21:01:58,266 epoch 6 - iter 792/1984 - loss 0.03061596 - time (sec): 36.39 - samples/sec: 1801.60 - lr: 0.000015 - momentum: 0.000000
2023-10-13 21:02:07,399 epoch 6 - iter 990/1984 - loss 0.03289182 - time (sec): 45.52 - samples/sec: 1799.81 - lr: 0.000015 - momentum: 0.000000
2023-10-13 21:02:16,317 epoch 6 - iter 1188/1984 - loss 0.03218297 - time (sec): 54.44 - samples/sec: 1807.03 - lr: 0.000015 - momentum: 0.000000
2023-10-13 21:02:25,287 epoch 6 - iter 1386/1984 - loss 0.03348283 - time (sec): 63.41 - samples/sec: 1815.90 - lr: 0.000014 - momentum: 0.000000
2023-10-13 21:02:34,604 epoch 6 - iter 1584/1984 - loss 0.03361440 - time (sec): 72.73 - samples/sec: 1807.15 - lr: 0.000014 - momentum: 0.000000
2023-10-13 21:02:43,949 epoch 6 - iter 1782/1984 - loss 0.03357184 - time (sec): 82.07 - samples/sec: 1796.19 - lr: 0.000014 - momentum: 0.000000
2023-10-13 21:02:53,471 epoch 6 - iter 1980/1984 - loss 0.03361402 - time (sec): 91.59 - samples/sec: 1786.91 - lr: 0.000013 - momentum: 0.000000
2023-10-13 21:02:53,648 ----------------------------------------------------------------------------------------------------
2023-10-13 21:02:53,648 EPOCH 6 done: loss 0.0336 - lr: 0.000013
2023-10-13 21:02:57,584 DEV : loss 0.20081490278244019 - f1-score (micro avg)  0.7536
2023-10-13 21:02:57,604 ----------------------------------------------------------------------------------------------------
2023-10-13 21:03:06,697 epoch 7 - iter 198/1984 - loss 0.02416311 - time (sec): 9.09 - samples/sec: 1689.96 - lr: 0.000013 - momentum: 0.000000
2023-10-13 21:03:15,884 epoch 7 - iter 396/1984 - loss 0.02612239 - time (sec): 18.28 - samples/sec: 1761.83 - lr: 0.000013 - momentum: 0.000000
2023-10-13 21:03:24,877 epoch 7 - iter 594/1984 - loss 0.02503909 - time (sec): 27.27 - samples/sec: 1752.06 - lr: 0.000012 - momentum: 0.000000
2023-10-13 21:03:33,834 epoch 7 - iter 792/1984 - loss 0.02544103 - time (sec): 36.23 - samples/sec: 1765.44 - lr: 0.000012 - momentum: 0.000000
2023-10-13 21:03:42,811 epoch 7 - iter 990/1984 - loss 0.02481477 - time (sec): 45.21 - samples/sec: 1795.09 - lr: 0.000012 - momentum: 0.000000
2023-10-13 21:03:52,055 epoch 7 - iter 1188/1984 - loss 0.02586965 - time (sec): 54.45 - samples/sec: 1808.49 - lr: 0.000011 - momentum: 0.000000
2023-10-13 21:04:01,424 epoch 7 - iter 1386/1984 - loss 0.02624069 - time (sec): 63.82 - samples/sec: 1781.30 - lr: 0.000011 - momentum: 0.000000
2023-10-13 21:04:10,658 epoch 7 - iter 1584/1984 - loss 0.02564693 - time (sec): 73.05 - samples/sec: 1788.89 - lr: 0.000011 - momentum: 0.000000
2023-10-13 21:04:19,928 epoch 7 - iter 1782/1984 - loss 0.02628502 - time (sec): 82.32 - samples/sec: 1786.06 - lr: 0.000010 - momentum: 0.000000
2023-10-13 21:04:29,051 epoch 7 - iter 1980/1984 - loss 0.02588685 - time (sec): 91.45 - samples/sec: 1788.99 - lr: 0.000010 - momentum: 0.000000
2023-10-13 21:04:29,240 ----------------------------------------------------------------------------------------------------
2023-10-13 21:04:29,240 EPOCH 7 done: loss 0.0258 - lr: 0.000010
2023-10-13 21:04:32,591 DEV : loss 0.19746360182762146 - f1-score (micro avg)  0.7682
2023-10-13 21:04:32,611 ----------------------------------------------------------------------------------------------------
2023-10-13 21:04:42,128 epoch 8 - iter 198/1984 - loss 0.01260002 - time (sec): 9.52 - samples/sec: 1716.88 - lr: 0.000010 - momentum: 0.000000
2023-10-13 21:04:51,190 epoch 8 - iter 396/1984 - loss 0.01660345 - time (sec): 18.58 - samples/sec: 1773.92 - lr: 0.000009 - momentum: 0.000000
2023-10-13 21:05:00,514 epoch 8 - iter 594/1984 - loss 0.01395853 - time (sec): 27.90 - samples/sec: 1750.34 - lr: 0.000009 - momentum: 0.000000
2023-10-13 21:05:09,660 epoch 8 - iter 792/1984 - loss 0.01619325 - time (sec): 37.05 - samples/sec: 1776.28 - lr: 0.000009 - momentum: 0.000000
2023-10-13 21:05:18,644 epoch 8 - iter 990/1984 - loss 0.01676123 - time (sec): 46.03 - samples/sec: 1778.63 - lr: 0.000008 - momentum: 0.000000
2023-10-13 21:05:27,517 epoch 8 - iter 1188/1984 - loss 0.01715802 - time (sec): 54.91 - samples/sec: 1771.93 - lr: 0.000008 - momentum: 0.000000
2023-10-13 21:05:36,488 epoch 8 - iter 1386/1984 - loss 0.01772604 - time (sec): 63.88 - samples/sec: 1784.87 - lr: 0.000008 - momentum: 0.000000
2023-10-13 21:05:45,810 epoch 8 - iter 1584/1984 - loss 0.01778083 - time (sec): 73.20 - samples/sec: 1797.68 - lr: 0.000007 - momentum: 0.000000
2023-10-13 21:05:54,770 epoch 8 - iter 1782/1984 - loss 0.01759045 - time (sec): 82.16 - samples/sec: 1794.86 - lr: 0.000007 - momentum: 0.000000
2023-10-13 21:06:04,075 epoch 8 - iter 1980/1984 - loss 0.01768193 - time (sec): 91.46 - samples/sec: 1790.18 - lr: 0.000007 - momentum: 0.000000
2023-10-13 21:06:04,255 ----------------------------------------------------------------------------------------------------
2023-10-13 21:06:04,255 EPOCH 8 done: loss 0.0177 - lr: 0.000007
2023-10-13 21:06:07,624 DEV : loss 0.2036585807800293 - f1-score (micro avg)  0.7669
2023-10-13 21:06:07,644 ----------------------------------------------------------------------------------------------------
2023-10-13 21:06:16,928 epoch 9 - iter 198/1984 - loss 0.01121945 - time (sec): 9.28 - samples/sec: 1803.90 - lr: 0.000006 - momentum: 0.000000
2023-10-13 21:06:25,941 epoch 9 - iter 396/1984 - loss 0.00942507 - time (sec): 18.30 - samples/sec: 1791.88 - lr: 0.000006 - momentum: 0.000000
2023-10-13 21:06:34,795 epoch 9 - iter 594/1984 - loss 0.01115411 - time (sec): 27.15 - samples/sec: 1791.55 - lr: 0.000006 - momentum: 0.000000
2023-10-13 21:06:43,917 epoch 9 - iter 792/1984 - loss 0.01115229 - time (sec): 36.27 - samples/sec: 1813.41 - lr: 0.000005 - momentum: 0.000000
2023-10-13 21:06:53,144 epoch 9 - iter 990/1984 - loss 0.01139535 - time (sec): 45.50 - samples/sec: 1813.79 - lr: 0.000005 - momentum: 0.000000
2023-10-13 21:07:02,159 epoch 9 - iter 1188/1984 - loss 0.01109025 - time (sec): 54.51 - samples/sec: 1799.43 - lr: 0.000005 - momentum: 0.000000
2023-10-13 21:07:11,471 epoch 9 - iter 1386/1984 - loss 0.01120806 - time (sec): 63.83 - samples/sec: 1788.63 - lr: 0.000004 - momentum: 0.000000
2023-10-13 21:07:20,727 epoch 9 - iter 1584/1984 - loss 0.01155852 - time (sec): 73.08 - samples/sec: 1792.50 - lr: 0.000004 - momentum: 0.000000
2023-10-13 21:07:29,843 epoch 9 - iter 1782/1984 - loss 0.01241535 - time (sec): 82.20 - samples/sec: 1798.57 - lr: 0.000004 - momentum: 0.000000
2023-10-13 21:07:38,723 epoch 9 - iter 1980/1984 - loss 0.01230984 - time (sec): 91.08 - samples/sec: 1796.01 - lr: 0.000003 - momentum: 0.000000
2023-10-13 21:07:38,909 ----------------------------------------------------------------------------------------------------
2023-10-13 21:07:38,909 EPOCH 9 done: loss 0.0123 - lr: 0.000003
2023-10-13 21:07:42,702 DEV : loss 0.21578332781791687 - f1-score (micro avg)  0.7644
2023-10-13 21:07:42,722 ----------------------------------------------------------------------------------------------------
2023-10-13 21:07:52,175 epoch 10 - iter 198/1984 - loss 0.01257622 - time (sec): 9.45 - samples/sec: 1803.26 - lr: 0.000003 - momentum: 0.000000
2023-10-13 21:08:01,164 epoch 10 - iter 396/1984 - loss 0.01088765 - time (sec): 18.44 - samples/sec: 1777.14 - lr: 0.000003 - momentum: 0.000000
2023-10-13 21:08:10,229 epoch 10 - iter 594/1984 - loss 0.00982544 - time (sec): 27.51 - samples/sec: 1774.23 - lr: 0.000002 - momentum: 0.000000
2023-10-13 21:08:19,951 epoch 10 - iter 792/1984 - loss 0.00945824 - time (sec): 37.23 - samples/sec: 1758.02 - lr: 0.000002 - momentum: 0.000000
2023-10-13 21:08:29,388 epoch 10 - iter 990/1984 - loss 0.00969098 - time (sec): 46.66 - samples/sec: 1746.07 - lr: 0.000002 - momentum: 0.000000
2023-10-13 21:08:38,684 epoch 10 - iter 1188/1984 - loss 0.00953138 - time (sec): 55.96 - samples/sec: 1750.67 - lr: 0.000001 - momentum: 0.000000
2023-10-13 21:08:47,952 epoch 10 - iter 1386/1984 - loss 0.00928403 - time (sec): 65.23 - samples/sec: 1752.64 - lr: 0.000001 - momentum: 0.000000
2023-10-13 21:08:56,924 epoch 10 - iter 1584/1984 - loss 0.00941987 - time (sec): 74.20 - samples/sec: 1764.57 - lr: 0.000001 - momentum: 0.000000
2023-10-13 21:09:06,116 epoch 10 - iter 1782/1984 - loss 0.00912027 - time (sec): 83.39 - samples/sec: 1774.72 - lr: 0.000000 - momentum: 0.000000
2023-10-13 21:09:15,364 epoch 10 - iter 1980/1984 - loss 0.00877729 - time (sec): 92.64 - samples/sec: 1766.45 - lr: 0.000000 - momentum: 0.000000
2023-10-13 21:09:15,565 ----------------------------------------------------------------------------------------------------
2023-10-13 21:09:15,565 EPOCH 10 done: loss 0.0088 - lr: 0.000000
2023-10-13 21:09:18,982 DEV : loss 0.22608202695846558 - f1-score (micro avg)  0.767
2023-10-13 21:09:19,478 ----------------------------------------------------------------------------------------------------
2023-10-13 21:09:19,479 Loading model from best epoch ...
2023-10-13 21:09:21,398 SequenceTagger predicts: Dictionary with 13 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-13 21:09:24,698 
Results:
- F-score (micro) 0.7846
- F-score (macro) 0.6793
- Accuracy 0.6641

By class:
              precision    recall  f1-score   support

         LOC     0.8554    0.8489    0.8521       655
         PER     0.7449    0.8117    0.7768       223
         ORG     0.4694    0.3622    0.4089       127

   micro avg     0.7901    0.7791    0.7846      1005
   macro avg     0.6899    0.6742    0.6793      1005
weighted avg     0.7821    0.7791    0.7794      1005

2023-10-13 21:09:24,699 ----------------------------------------------------------------------------------------------------
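Note on the averages: the macro and weighted rows of the "By class" table can be rechecked from the per-class rows, and the micro F-score from the micro precision and recall. The sketch below does this with values copied from the table above. The helper names are hypothetical, and small differences in the fourth decimal are expected because the inputs are already rounded.

```python
# Recompute the summary rows of the "By class" table from the per-class rows.
# Values are copied from the log: (precision, recall, f1, support).
BY_CLASS = {
    "LOC": (0.8554, 0.8489, 0.8521, 655),
    "PER": (0.7449, 0.8117, 0.7768, 223),
    "ORG": (0.4694, 0.3622, 0.4089, 127),
}


def macro_avg(rows, col):
    """Unweighted mean of one column (0=precision, 1=recall, 2=f1)."""
    return sum(r[col] for r in rows.values()) / len(rows)


def weighted_avg(rows, col):
    """Support-weighted mean of one column."""
    total = sum(r[3] for r in rows.values())
    return sum(r[col] * r[3] for r in rows.values()) / total


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


macro_f1 = macro_avg(BY_CLASS, 2)
weighted_f1 = weighted_avg(BY_CLASS, 2)
micro_f1 = f1(0.7901, 0.7791)   # micro avg precision and recall from the table

if __name__ == "__main__":
    print(f"macro F1    {macro_f1:.4f}")
    print(f"weighted F1 {weighted_f1:.4f}")
    print(f"micro F1    {micro_f1:.4f}")
```

The gap between the micro and macro F-scores comes from the ORG class, which has the lowest F1 and the smallest support, so it pulls the unweighted mean down more than the pooled score.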