stefan-it committed
Commit 35e202b · 1 Parent(s): 3baac7f

Upload folder using huggingface_hub

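The commit message indicates the folder was pushed with the huggingface_hub client. A minimal sketch of such an upload, assuming a local Flair output directory and a placeholder repository id (neither is given in this commit):

    from huggingface_hub import HfApi

    api = HfApi()
    api.upload_folder(
        folder_path="path/to/flair-training-output",  # placeholder local directory
        repo_id="stefan-it/<model-repo>",             # placeholder repository id
        repo_type="model",
        commit_message="Upload folder using huggingface_hub",
    )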
best-model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1f55a50a185754f70852d874145cdc5f660d11d1c76fbdf64691bffe2ce4f0e8
+ size 870793839
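best-model.pt is stored as a Git LFS pointer; the ~871 MB checkpoint itself lives in LFS storage. A minimal sketch for fetching it programmatically, assuming a placeholder repository id:

    from huggingface_hub import hf_hub_download

    # Resolves the LFS pointer and downloads the actual weights into the local HF cache.
    checkpoint_path = hf_hub_download(
        repo_id="stefan-it/<model-repo>",  # placeholder repository id
        filename="best-model.pt",
    )
    print(checkpoint_path)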
dev.tsv ADDED
The diff for this file is too large to render. See raw diff
 
final-model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:676d28b8f4620bae348c185612a396ef58cf3b1933a994c24d2d111e9a43d98e
+ size 870793956
loss.tsv ADDED
@@ -0,0 +1,11 @@
+ EPOCH TIMESTAMP LEARNING_RATE TRAIN_LOSS DEV_LOSS DEV_PRECISION DEV_RECALL DEV_F1 DEV_ACCURACY
+ 1 14:58:53 0.0001 0.6239 0.1172 0.5034 0.6831 0.5796 0.4207
+ 2 15:16:51 0.0001 0.0909 0.1110 0.5634 0.6911 0.6208 0.4593
+ 3 15:34:42 0.0001 0.0648 0.1631 0.5482 0.7025 0.6158 0.4538
+ 4 15:52:18 0.0001 0.0458 0.2121 0.5599 0.7963 0.6575 0.5014
+ 5 16:09:49 0.0001 0.0306 0.2789 0.5407 0.7071 0.6128 0.4504
+ 6 16:27:20 0.0001 0.0212 0.3139 0.5503 0.7323 0.6284 0.4672
+ 7 16:45:30 0.0001 0.0158 0.3258 0.5606 0.7094 0.6263 0.4665
+ 8 17:03:08 0.0000 0.0097 0.3519 0.5533 0.7540 0.6383 0.4779
+ 9 17:21:13 0.0000 0.0072 0.3742 0.5560 0.7609 0.6425 0.4826
+ 10 17:38:48 0.0000 0.0045 0.3830 0.5539 0.7586 0.6403 0.4808
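loss.tsv holds the per-epoch training and development metrics shown above. A short sketch for inspecting it, assuming the file is tab-separated as written by the Flair trainer:

    import pandas as pd

    df = pd.read_csv("loss.tsv", sep="\t")
    best = df.loc[df["DEV_F1"].idxmax()]
    # In the table above, epoch 4 has the best dev F1 (0.6575).
    print(f"Best dev F1 {best['DEV_F1']:.4f} at epoch {int(best['EPOCH'])}")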
runs/events.out.tfevents.1697294459.c8b2203b18a8.2923.14 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:324afaf79fbee2310838ac9fd03f1e505ce05565c49b95b407b63a64364ab787
+ size 2030580
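The events file under runs/ contains the TensorBoard scalars logged during training. A minimal sketch for reading them without launching TensorBoard; the scalar tag names are not listed in this commit, so they are looked up first:

    from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

    ea = EventAccumulator("runs/events.out.tfevents.1697294459.c8b2203b18a8.2923.14")
    ea.Reload()
    print(ea.Tags()["scalars"])          # list the available scalar tags
    # for event in ea.Scalars("<tag>"):  # pick a tag from the list printed above
    #     print(event.step, event.value)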
test.tsv ADDED
The diff for this file is too large to render. See raw diff
 
training.log ADDED
@@ -0,0 +1,260 @@
+ 2023-10-14 14:40:59,478 ----------------------------------------------------------------------------------------------------
+ 2023-10-14 14:40:59,480 Model: "SequenceTagger(
+ (embeddings): ByT5Embeddings(
+ (model): T5EncoderModel(
+ (shared): Embedding(384, 1472)
+ (encoder): T5Stack(
+ (embed_tokens): Embedding(384, 1472)
+ (block): ModuleList(
+ (0): T5Block(
+ (layer): ModuleList(
+ (0): T5LayerSelfAttention(
+ (SelfAttention): T5Attention(
+ (q): Linear(in_features=1472, out_features=384, bias=False)
+ (k): Linear(in_features=1472, out_features=384, bias=False)
+ (v): Linear(in_features=1472, out_features=384, bias=False)
+ (o): Linear(in_features=384, out_features=1472, bias=False)
+ (relative_attention_bias): Embedding(32, 6)
+ )
+ (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+ (dropout): Dropout(p=0.1, inplace=False)
+ )
+ (1): T5LayerFF(
+ (DenseReluDense): T5DenseGatedActDense(
+ (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
+ (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
+ (wo): Linear(in_features=3584, out_features=1472, bias=False)
+ (dropout): Dropout(p=0.1, inplace=False)
+ (act): NewGELUActivation()
+ )
+ (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+ (dropout): Dropout(p=0.1, inplace=False)
+ )
+ )
+ )
+ (1-11): 11 x T5Block(
+ (layer): ModuleList(
+ (0): T5LayerSelfAttention(
+ (SelfAttention): T5Attention(
+ (q): Linear(in_features=1472, out_features=384, bias=False)
+ (k): Linear(in_features=1472, out_features=384, bias=False)
+ (v): Linear(in_features=1472, out_features=384, bias=False)
+ (o): Linear(in_features=384, out_features=1472, bias=False)
+ )
+ (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+ (dropout): Dropout(p=0.1, inplace=False)
+ )
+ (1): T5LayerFF(
+ (DenseReluDense): T5DenseGatedActDense(
+ (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
+ (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
+ (wo): Linear(in_features=3584, out_features=1472, bias=False)
+ (dropout): Dropout(p=0.1, inplace=False)
+ (act): NewGELUActivation()
+ )
+ (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+ (dropout): Dropout(p=0.1, inplace=False)
+ )
+ )
+ )
+ )
+ (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
+ (dropout): Dropout(p=0.1, inplace=False)
+ )
+ )
+ )
+ (locked_dropout): LockedDropout(p=0.5)
+ (linear): Linear(in_features=1472, out_features=13, bias=True)
+ (loss_function): CrossEntropyLoss()
+ )"
+ 2023-10-14 14:40:59,481 ----------------------------------------------------------------------------------------------------
+ 2023-10-14 14:40:59,481 MultiCorpus: 14465 train + 1392 dev + 2432 test sentences
+ - NER_HIPE_2022 Corpus: 14465 train + 1392 dev + 2432 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/letemps/fr/with_doc_seperator
+ 2023-10-14 14:40:59,481 ----------------------------------------------------------------------------------------------------
+ 2023-10-14 14:40:59,481 Train: 14465 sentences
+ 2023-10-14 14:40:59,481 (train_with_dev=False, train_with_test=False)
+ 2023-10-14 14:40:59,481 ----------------------------------------------------------------------------------------------------
+ 2023-10-14 14:40:59,481 Training Params:
+ 2023-10-14 14:40:59,481 - learning_rate: "0.00015"
+ 2023-10-14 14:40:59,481 - mini_batch_size: "4"
+ 2023-10-14 14:40:59,481 - max_epochs: "10"
+ 2023-10-14 14:40:59,481 - shuffle: "True"
+ 2023-10-14 14:40:59,481 ----------------------------------------------------------------------------------------------------
+ 2023-10-14 14:40:59,481 Plugins:
+ 2023-10-14 14:40:59,481 - TensorboardLogger
+ 2023-10-14 14:40:59,482 - LinearScheduler | warmup_fraction: '0.1'
+ 2023-10-14 14:40:59,482 ----------------------------------------------------------------------------------------------------
+ 2023-10-14 14:40:59,482 Final evaluation on model from best epoch (best-model.pt)
+ 2023-10-14 14:40:59,482 - metric: "('micro avg', 'f1-score')"
+ 2023-10-14 14:40:59,482 ----------------------------------------------------------------------------------------------------
+ 2023-10-14 14:40:59,482 Computation:
+ 2023-10-14 14:40:59,482 - compute on device: cuda:0
+ 2023-10-14 14:40:59,482 - embedding storage: none
+ 2023-10-14 14:40:59,482 ----------------------------------------------------------------------------------------------------
+ 2023-10-14 14:40:59,482 Model training base path: "hmbench-letemps/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-4"
+ 2023-10-14 14:40:59,482 ----------------------------------------------------------------------------------------------------
+ 2023-10-14 14:40:59,482 ----------------------------------------------------------------------------------------------------
+ 2023-10-14 14:40:59,482 Logging anything other than scalars to TensorBoard is currently not supported.
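A minimal Flair sketch that roughly mirrors the hyperparameters logged above (learning_rate 0.00015, mini_batch_size 4, max_epochs 10, linear warmup fraction 0.1, no CRF). The corpus and embedding constructor arguments are assumptions inferred from the log, and the actual run used a ByT5 embedding wrapper rather than plain TransformerWordEmbeddings:

    from flair.datasets import NER_HIPE_2022
    from flair.embeddings import TransformerWordEmbeddings
    from flair.models import SequenceTagger
    from flair.trainers import ModelTrainer

    # Constructor arguments are assumptions based on the corpus path in the log.
    corpus = NER_HIPE_2022(dataset_name="letemps", language="fr")
    label_dict = corpus.make_label_dictionary(label_type="ner")

    tagger = SequenceTagger(
        hidden_size=256,
        embeddings=TransformerWordEmbeddings("google/byt5-small"),  # stand-in; the run used a historic ByT5 checkpoint
        tag_dictionary=label_dict,
        tag_type="ner",
        use_crf=False,
        use_rnn=False,
    )

    trainer = ModelTrainer(tagger, corpus)
    # warmup_fraction defaults to 0.1 in fine_tune, matching the LinearScheduler plugin above.
    trainer.fine_tune(
        "resources/taggers/letemps-fr",
        learning_rate=0.00015,
        mini_batch_size=4,
        max_epochs=10,
    )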
+ 2023-10-14 14:42:41,066 epoch 1 - iter 361/3617 - loss 2.50686900 - time (sec): 101.58 - samples/sec: 360.10 - lr: 0.000015 - momentum: 0.000000
+ 2023-10-14 14:44:22,406 epoch 1 - iter 722/3617 - loss 2.10752883 - time (sec): 202.92 - samples/sec: 369.39 - lr: 0.000030 - momentum: 0.000000
+ 2023-10-14 14:46:12,979 epoch 1 - iter 1083/3617 - loss 1.65683609 - time (sec): 313.49 - samples/sec: 363.76 - lr: 0.000045 - momentum: 0.000000
+ 2023-10-14 14:47:57,110 epoch 1 - iter 1444/3617 - loss 1.31501097 - time (sec): 417.63 - samples/sec: 364.72 - lr: 0.000060 - momentum: 0.000000
+ 2023-10-14 14:49:44,051 epoch 1 - iter 1805/3617 - loss 1.09620008 - time (sec): 524.57 - samples/sec: 361.95 - lr: 0.000075 - momentum: 0.000000
+ 2023-10-14 14:51:28,614 epoch 1 - iter 2166/3617 - loss 0.93978591 - time (sec): 629.13 - samples/sec: 364.30 - lr: 0.000090 - momentum: 0.000000
+ 2023-10-14 14:53:09,495 epoch 1 - iter 2527/3617 - loss 0.82889421 - time (sec): 730.01 - samples/sec: 366.34 - lr: 0.000105 - momentum: 0.000000
+ 2023-10-14 14:54:50,772 epoch 1 - iter 2888/3617 - loss 0.74534435 - time (sec): 831.29 - samples/sec: 367.53 - lr: 0.000120 - momentum: 0.000000
+ 2023-10-14 14:56:30,263 epoch 1 - iter 3249/3617 - loss 0.67952585 - time (sec): 930.78 - samples/sec: 367.24 - lr: 0.000135 - momentum: 0.000000
+ 2023-10-14 14:58:13,235 epoch 1 - iter 3610/3617 - loss 0.62441070 - time (sec): 1033.75 - samples/sec: 366.94 - lr: 0.000150 - momentum: 0.000000
+ 2023-10-14 14:58:14,910 ----------------------------------------------------------------------------------------------------
+ 2023-10-14 14:58:14,910 EPOCH 1 done: loss 0.6239 - lr: 0.000150
+ 2023-10-14 14:58:53,380 DEV : loss 0.11717528849840164 - f1-score (micro avg) 0.5796
+ 2023-10-14 14:58:53,438 saving best model
+ 2023-10-14 14:58:54,356 ----------------------------------------------------------------------------------------------------
+ 2023-10-14 15:00:36,235 epoch 2 - iter 361/3617 - loss 0.10878939 - time (sec): 101.88 - samples/sec: 370.35 - lr: 0.000148 - momentum: 0.000000
+ 2023-10-14 15:02:20,856 epoch 2 - iter 722/3617 - loss 0.10496323 - time (sec): 206.50 - samples/sec: 369.37 - lr: 0.000147 - momentum: 0.000000
+ 2023-10-14 15:04:10,294 epoch 2 - iter 1083/3617 - loss 0.10512196 - time (sec): 315.94 - samples/sec: 358.08 - lr: 0.000145 - momentum: 0.000000
+ 2023-10-14 15:05:52,218 epoch 2 - iter 1444/3617 - loss 0.10247324 - time (sec): 417.86 - samples/sec: 360.21 - lr: 0.000143 - momentum: 0.000000
+ 2023-10-14 15:07:32,739 epoch 2 - iter 1805/3617 - loss 0.10015258 - time (sec): 518.38 - samples/sec: 362.60 - lr: 0.000142 - momentum: 0.000000
+ 2023-10-14 15:09:12,203 epoch 2 - iter 2166/3617 - loss 0.09832530 - time (sec): 617.84 - samples/sec: 366.60 - lr: 0.000140 - momentum: 0.000000
+ 2023-10-14 15:10:55,993 epoch 2 - iter 2527/3617 - loss 0.09600180 - time (sec): 721.63 - samples/sec: 368.59 - lr: 0.000138 - momentum: 0.000000
+ 2023-10-14 15:12:42,577 epoch 2 - iter 2888/3617 - loss 0.09489153 - time (sec): 828.22 - samples/sec: 367.36 - lr: 0.000137 - momentum: 0.000000
+ 2023-10-14 15:14:25,871 epoch 2 - iter 3249/3617 - loss 0.09188276 - time (sec): 931.51 - samples/sec: 367.52 - lr: 0.000135 - momentum: 0.000000
+ 2023-10-14 15:16:09,934 epoch 2 - iter 3610/3617 - loss 0.09054350 - time (sec): 1035.58 - samples/sec: 366.37 - lr: 0.000133 - momentum: 0.000000
+ 2023-10-14 15:16:11,806 ----------------------------------------------------------------------------------------------------
+ 2023-10-14 15:16:11,807 EPOCH 2 done: loss 0.0909 - lr: 0.000133
+ 2023-10-14 15:16:51,726 DEV : loss 0.11103517562150955 - f1-score (micro avg) 0.6208
+ 2023-10-14 15:16:51,794 saving best model
+ 2023-10-14 15:16:57,345 ----------------------------------------------------------------------------------------------------
+ 2023-10-14 15:18:47,930 epoch 3 - iter 361/3617 - loss 0.06378214 - time (sec): 110.58 - samples/sec: 346.95 - lr: 0.000132 - momentum: 0.000000
+ 2023-10-14 15:20:27,753 epoch 3 - iter 722/3617 - loss 0.06540151 - time (sec): 210.40 - samples/sec: 356.29 - lr: 0.000130 - momentum: 0.000000
+ 2023-10-14 15:22:11,280 epoch 3 - iter 1083/3617 - loss 0.06698159 - time (sec): 313.93 - samples/sec: 359.05 - lr: 0.000128 - momentum: 0.000000
+ 2023-10-14 15:23:52,034 epoch 3 - iter 1444/3617 - loss 0.06526504 - time (sec): 414.68 - samples/sec: 366.94 - lr: 0.000127 - momentum: 0.000000
+ 2023-10-14 15:25:35,189 epoch 3 - iter 1805/3617 - loss 0.06525753 - time (sec): 517.84 - samples/sec: 368.07 - lr: 0.000125 - momentum: 0.000000
+ 2023-10-14 15:27:21,741 epoch 3 - iter 2166/3617 - loss 0.06493734 - time (sec): 624.39 - samples/sec: 367.06 - lr: 0.000123 - momentum: 0.000000
+ 2023-10-14 15:29:02,837 epoch 3 - iter 2527/3617 - loss 0.06477906 - time (sec): 725.49 - samples/sec: 368.41 - lr: 0.000122 - momentum: 0.000000
+ 2023-10-14 15:30:43,338 epoch 3 - iter 2888/3617 - loss 0.06486032 - time (sec): 825.99 - samples/sec: 367.73 - lr: 0.000120 - momentum: 0.000000
+ 2023-10-14 15:32:21,513 epoch 3 - iter 3249/3617 - loss 0.06429360 - time (sec): 924.16 - samples/sec: 369.65 - lr: 0.000118 - momentum: 0.000000
+ 2023-10-14 15:34:00,097 epoch 3 - iter 3610/3617 - loss 0.06473516 - time (sec): 1022.75 - samples/sec: 370.86 - lr: 0.000117 - momentum: 0.000000
+ 2023-10-14 15:34:01,771 ----------------------------------------------------------------------------------------------------
+ 2023-10-14 15:34:01,772 EPOCH 3 done: loss 0.0648 - lr: 0.000117
+ 2023-10-14 15:34:42,878 DEV : loss 0.1630961298942566 - f1-score (micro avg) 0.6158
+ 2023-10-14 15:34:42,948 ----------------------------------------------------------------------------------------------------
+ 2023-10-14 15:36:23,601 epoch 4 - iter 361/3617 - loss 0.04594123 - time (sec): 100.65 - samples/sec: 363.48 - lr: 0.000115 - momentum: 0.000000
+ 2023-10-14 15:38:03,903 epoch 4 - iter 722/3617 - loss 0.04874699 - time (sec): 200.95 - samples/sec: 370.90 - lr: 0.000113 - momentum: 0.000000
+ 2023-10-14 15:39:47,633 epoch 4 - iter 1083/3617 - loss 0.04891474 - time (sec): 304.68 - samples/sec: 372.99 - lr: 0.000112 - momentum: 0.000000
+ 2023-10-14 15:41:28,223 epoch 4 - iter 1444/3617 - loss 0.04690533 - time (sec): 405.27 - samples/sec: 370.54 - lr: 0.000110 - momentum: 0.000000
+ 2023-10-14 15:43:08,244 epoch 4 - iter 1805/3617 - loss 0.04706884 - time (sec): 505.29 - samples/sec: 371.67 - lr: 0.000108 - momentum: 0.000000
+ 2023-10-14 15:44:53,122 epoch 4 - iter 2166/3617 - loss 0.04732382 - time (sec): 610.17 - samples/sec: 371.22 - lr: 0.000107 - momentum: 0.000000
+ 2023-10-14 15:46:34,153 epoch 4 - iter 2527/3617 - loss 0.04755205 - time (sec): 711.20 - samples/sec: 372.43 - lr: 0.000105 - momentum: 0.000000
+ 2023-10-14 15:48:16,775 epoch 4 - iter 2888/3617 - loss 0.04692890 - time (sec): 813.82 - samples/sec: 373.54 - lr: 0.000103 - momentum: 0.000000
+ 2023-10-14 15:49:58,502 epoch 4 - iter 3249/3617 - loss 0.04641577 - time (sec): 915.55 - samples/sec: 372.42 - lr: 0.000102 - momentum: 0.000000
+ 2023-10-14 15:51:37,654 epoch 4 - iter 3610/3617 - loss 0.04580765 - time (sec): 1014.70 - samples/sec: 373.81 - lr: 0.000100 - momentum: 0.000000
+ 2023-10-14 15:51:39,377 ----------------------------------------------------------------------------------------------------
+ 2023-10-14 15:51:39,378 EPOCH 4 done: loss 0.0458 - lr: 0.000100
+ 2023-10-14 15:52:18,475 DEV : loss 0.21207064390182495 - f1-score (micro avg) 0.6575
+ 2023-10-14 15:52:18,532 saving best model
+ 2023-10-14 15:52:21,265 ----------------------------------------------------------------------------------------------------
+ 2023-10-14 15:53:59,089 epoch 5 - iter 361/3617 - loss 0.02339736 - time (sec): 97.82 - samples/sec: 384.30 - lr: 0.000098 - momentum: 0.000000
+ 2023-10-14 15:55:43,726 epoch 5 - iter 722/3617 - loss 0.02553085 - time (sec): 202.46 - samples/sec: 387.78 - lr: 0.000097 - momentum: 0.000000
+ 2023-10-14 15:57:28,602 epoch 5 - iter 1083/3617 - loss 0.02625521 - time (sec): 307.33 - samples/sec: 381.45 - lr: 0.000095 - momentum: 0.000000
+ 2023-10-14 15:59:07,613 epoch 5 - iter 1444/3617 - loss 0.02682998 - time (sec): 406.34 - samples/sec: 378.67 - lr: 0.000093 - momentum: 0.000000
+ 2023-10-14 16:00:54,191 epoch 5 - iter 1805/3617 - loss 0.02777093 - time (sec): 512.92 - samples/sec: 371.65 - lr: 0.000092 - momentum: 0.000000
+ 2023-10-14 16:02:34,896 epoch 5 - iter 2166/3617 - loss 0.02927279 - time (sec): 613.63 - samples/sec: 373.11 - lr: 0.000090 - momentum: 0.000000
+ 2023-10-14 16:04:12,788 epoch 5 - iter 2527/3617 - loss 0.02908364 - time (sec): 711.52 - samples/sec: 374.65 - lr: 0.000088 - momentum: 0.000000
+ 2023-10-14 16:05:50,506 epoch 5 - iter 2888/3617 - loss 0.02954096 - time (sec): 809.24 - samples/sec: 375.84 - lr: 0.000087 - momentum: 0.000000
+ 2023-10-14 16:07:30,269 epoch 5 - iter 3249/3617 - loss 0.03004065 - time (sec): 909.00 - samples/sec: 375.39 - lr: 0.000085 - momentum: 0.000000
+ 2023-10-14 16:09:09,000 epoch 5 - iter 3610/3617 - loss 0.03067994 - time (sec): 1007.73 - samples/sec: 376.30 - lr: 0.000083 - momentum: 0.000000
+ 2023-10-14 16:09:10,704 ----------------------------------------------------------------------------------------------------
+ 2023-10-14 16:09:10,705 EPOCH 5 done: loss 0.0306 - lr: 0.000083
+ 2023-10-14 16:09:49,418 DEV : loss 0.2788721024990082 - f1-score (micro avg) 0.6128
+ 2023-10-14 16:09:49,475 ----------------------------------------------------------------------------------------------------
+ 2023-10-14 16:11:32,473 epoch 6 - iter 361/3617 - loss 0.02077887 - time (sec): 103.00 - samples/sec: 381.48 - lr: 0.000082 - momentum: 0.000000
+ 2023-10-14 16:13:14,265 epoch 6 - iter 722/3617 - loss 0.02070222 - time (sec): 204.79 - samples/sec: 373.48 - lr: 0.000080 - momentum: 0.000000
+ 2023-10-14 16:14:54,315 epoch 6 - iter 1083/3617 - loss 0.01942465 - time (sec): 304.84 - samples/sec: 374.09 - lr: 0.000078 - momentum: 0.000000
+ 2023-10-14 16:16:32,568 epoch 6 - iter 1444/3617 - loss 0.02052837 - time (sec): 403.09 - samples/sec: 375.08 - lr: 0.000077 - momentum: 0.000000
+ 2023-10-14 16:18:15,025 epoch 6 - iter 1805/3617 - loss 0.02128468 - time (sec): 505.55 - samples/sec: 372.79 - lr: 0.000075 - momentum: 0.000000
+ 2023-10-14 16:19:57,182 epoch 6 - iter 2166/3617 - loss 0.02145412 - time (sec): 607.70 - samples/sec: 373.59 - lr: 0.000073 - momentum: 0.000000
+ 2023-10-14 16:21:37,405 epoch 6 - iter 2527/3617 - loss 0.02127559 - time (sec): 707.93 - samples/sec: 373.85 - lr: 0.000072 - momentum: 0.000000
+ 2023-10-14 16:23:17,748 epoch 6 - iter 2888/3617 - loss 0.02174407 - time (sec): 808.27 - samples/sec: 375.50 - lr: 0.000070 - momentum: 0.000000
+ 2023-10-14 16:24:58,379 epoch 6 - iter 3249/3617 - loss 0.02165837 - time (sec): 908.90 - samples/sec: 375.13 - lr: 0.000068 - momentum: 0.000000
+ 2023-10-14 16:26:39,103 epoch 6 - iter 3610/3617 - loss 0.02119253 - time (sec): 1009.63 - samples/sec: 375.80 - lr: 0.000067 - momentum: 0.000000
+ 2023-10-14 16:26:41,074 ----------------------------------------------------------------------------------------------------
+ 2023-10-14 16:26:41,074 EPOCH 6 done: loss 0.0212 - lr: 0.000067
+ 2023-10-14 16:27:20,798 DEV : loss 0.313930869102478 - f1-score (micro avg) 0.6284
+ 2023-10-14 16:27:20,855 ----------------------------------------------------------------------------------------------------
+ 2023-10-14 16:29:06,678 epoch 7 - iter 361/3617 - loss 0.01422485 - time (sec): 105.82 - samples/sec: 383.75 - lr: 0.000065 - momentum: 0.000000
+ 2023-10-14 16:30:54,879 epoch 7 - iter 722/3617 - loss 0.01422789 - time (sec): 214.02 - samples/sec: 363.13 - lr: 0.000063 - momentum: 0.000000
+ 2023-10-14 16:32:43,802 epoch 7 - iter 1083/3617 - loss 0.01393561 - time (sec): 322.94 - samples/sec: 357.57 - lr: 0.000062 - momentum: 0.000000
+ 2023-10-14 16:34:23,506 epoch 7 - iter 1444/3617 - loss 0.01434256 - time (sec): 422.65 - samples/sec: 361.08 - lr: 0.000060 - momentum: 0.000000
+ 2023-10-14 16:36:06,389 epoch 7 - iter 1805/3617 - loss 0.01398391 - time (sec): 525.53 - samples/sec: 361.44 - lr: 0.000058 - momentum: 0.000000
+ 2023-10-14 16:37:52,259 epoch 7 - iter 2166/3617 - loss 0.01422903 - time (sec): 631.40 - samples/sec: 360.88 - lr: 0.000057 - momentum: 0.000000
+ 2023-10-14 16:39:35,485 epoch 7 - iter 2527/3617 - loss 0.01414726 - time (sec): 734.63 - samples/sec: 362.33 - lr: 0.000055 - momentum: 0.000000
+ 2023-10-14 16:41:19,107 epoch 7 - iter 2888/3617 - loss 0.01450497 - time (sec): 838.25 - samples/sec: 364.58 - lr: 0.000053 - momentum: 0.000000
+ 2023-10-14 16:43:02,248 epoch 7 - iter 3249/3617 - loss 0.01550406 - time (sec): 941.39 - samples/sec: 364.18 - lr: 0.000052 - momentum: 0.000000
+ 2023-10-14 16:44:48,987 epoch 7 - iter 3610/3617 - loss 0.01571456 - time (sec): 1048.13 - samples/sec: 361.60 - lr: 0.000050 - momentum: 0.000000
+ 2023-10-14 16:44:51,074 ----------------------------------------------------------------------------------------------------
+ 2023-10-14 16:44:51,074 EPOCH 7 done: loss 0.0158 - lr: 0.000050
+ 2023-10-14 16:45:30,018 DEV : loss 0.3257623016834259 - f1-score (micro avg) 0.6263
+ 2023-10-14 16:45:30,075 ----------------------------------------------------------------------------------------------------
+ 2023-10-14 16:47:09,244 epoch 8 - iter 361/3617 - loss 0.01142318 - time (sec): 99.17 - samples/sec: 387.29 - lr: 0.000048 - momentum: 0.000000
+ 2023-10-14 16:48:57,539 epoch 8 - iter 722/3617 - loss 0.01106006 - time (sec): 207.46 - samples/sec: 374.10 - lr: 0.000047 - momentum: 0.000000
+ 2023-10-14 16:50:47,572 epoch 8 - iter 1083/3617 - loss 0.01148474 - time (sec): 317.49 - samples/sec: 363.91 - lr: 0.000045 - momentum: 0.000000
+ 2023-10-14 16:52:29,277 epoch 8 - iter 1444/3617 - loss 0.01109820 - time (sec): 419.20 - samples/sec: 363.08 - lr: 0.000043 - momentum: 0.000000
+ 2023-10-14 16:54:07,215 epoch 8 - iter 1805/3617 - loss 0.01060742 - time (sec): 517.14 - samples/sec: 368.74 - lr: 0.000042 - momentum: 0.000000
+ 2023-10-14 16:55:45,728 epoch 8 - iter 2166/3617 - loss 0.01013602 - time (sec): 615.65 - samples/sec: 368.66 - lr: 0.000040 - momentum: 0.000000
+ 2023-10-14 16:57:28,600 epoch 8 - iter 2527/3617 - loss 0.01003184 - time (sec): 718.52 - samples/sec: 370.30 - lr: 0.000038 - momentum: 0.000000
+ 2023-10-14 16:59:08,971 epoch 8 - iter 2888/3617 - loss 0.01030870 - time (sec): 818.89 - samples/sec: 370.18 - lr: 0.000037 - momentum: 0.000000
+ 2023-10-14 17:00:48,363 epoch 8 - iter 3249/3617 - loss 0.01013006 - time (sec): 918.29 - samples/sec: 371.85 - lr: 0.000035 - momentum: 0.000000
+ 2023-10-14 17:02:27,074 epoch 8 - iter 3610/3617 - loss 0.00973135 - time (sec): 1017.00 - samples/sec: 372.91 - lr: 0.000033 - momentum: 0.000000
+ 2023-10-14 17:02:28,759 ----------------------------------------------------------------------------------------------------
+ 2023-10-14 17:02:28,759 EPOCH 8 done: loss 0.0097 - lr: 0.000033
+ 2023-10-14 17:03:08,171 DEV : loss 0.3519401252269745 - f1-score (micro avg) 0.6383
+ 2023-10-14 17:03:08,238 ----------------------------------------------------------------------------------------------------
+ 2023-10-14 17:04:56,121 epoch 9 - iter 361/3617 - loss 0.00400351 - time (sec): 107.88 - samples/sec: 337.43 - lr: 0.000032 - momentum: 0.000000
+ 2023-10-14 17:06:44,549 epoch 9 - iter 722/3617 - loss 0.00553294 - time (sec): 216.31 - samples/sec: 337.08 - lr: 0.000030 - momentum: 0.000000
+ 2023-10-14 17:08:26,029 epoch 9 - iter 1083/3617 - loss 0.00718701 - time (sec): 317.79 - samples/sec: 350.01 - lr: 0.000028 - momentum: 0.000000
+ 2023-10-14 17:10:14,915 epoch 9 - iter 1444/3617 - loss 0.00747014 - time (sec): 426.67 - samples/sec: 351.11 - lr: 0.000027 - momentum: 0.000000
+ 2023-10-14 17:12:13,127 epoch 9 - iter 1805/3617 - loss 0.00700743 - time (sec): 544.89 - samples/sec: 346.36 - lr: 0.000025 - momentum: 0.000000
+ 2023-10-14 17:13:56,710 epoch 9 - iter 2166/3617 - loss 0.00753126 - time (sec): 648.47 - samples/sec: 348.52 - lr: 0.000023 - momentum: 0.000000
+ 2023-10-14 17:15:35,725 epoch 9 - iter 2527/3617 - loss 0.00713809 - time (sec): 747.48 - samples/sec: 353.09 - lr: 0.000022 - momentum: 0.000000
+ 2023-10-14 17:17:15,835 epoch 9 - iter 2888/3617 - loss 0.00710439 - time (sec): 847.60 - samples/sec: 358.03 - lr: 0.000020 - momentum: 0.000000
+ 2023-10-14 17:18:53,848 epoch 9 - iter 3249/3617 - loss 0.00691053 - time (sec): 945.61 - samples/sec: 361.20 - lr: 0.000018 - momentum: 0.000000
+ 2023-10-14 17:20:32,410 epoch 9 - iter 3610/3617 - loss 0.00723383 - time (sec): 1044.17 - samples/sec: 363.12 - lr: 0.000017 - momentum: 0.000000
+ 2023-10-14 17:20:34,181 ----------------------------------------------------------------------------------------------------
+ 2023-10-14 17:20:34,181 EPOCH 9 done: loss 0.0072 - lr: 0.000017
+ 2023-10-14 17:21:13,627 DEV : loss 0.37418004870414734 - f1-score (micro avg) 0.6425
+ 2023-10-14 17:21:13,685 ----------------------------------------------------------------------------------------------------
+ 2023-10-14 17:22:51,217 epoch 10 - iter 361/3617 - loss 0.00288085 - time (sec): 97.53 - samples/sec: 383.84 - lr: 0.000015 - momentum: 0.000000
+ 2023-10-14 17:24:32,451 epoch 10 - iter 722/3617 - loss 0.00430136 - time (sec): 198.76 - samples/sec: 381.28 - lr: 0.000013 - momentum: 0.000000
+ 2023-10-14 17:26:18,156 epoch 10 - iter 1083/3617 - loss 0.00516123 - time (sec): 304.47 - samples/sec: 376.22 - lr: 0.000012 - momentum: 0.000000
+ 2023-10-14 17:27:59,531 epoch 10 - iter 1444/3617 - loss 0.00455380 - time (sec): 405.84 - samples/sec: 374.69 - lr: 0.000010 - momentum: 0.000000
+ 2023-10-14 17:29:41,578 epoch 10 - iter 1805/3617 - loss 0.00417121 - time (sec): 507.89 - samples/sec: 373.65 - lr: 0.000008 - momentum: 0.000000
+ 2023-10-14 17:31:24,980 epoch 10 - iter 2166/3617 - loss 0.00427900 - time (sec): 611.29 - samples/sec: 373.25 - lr: 0.000007 - momentum: 0.000000
+ 2023-10-14 17:33:04,407 epoch 10 - iter 2527/3617 - loss 0.00423939 - time (sec): 710.72 - samples/sec: 374.13 - lr: 0.000005 - momentum: 0.000000
+ 2023-10-14 17:34:44,494 epoch 10 - iter 2888/3617 - loss 0.00423096 - time (sec): 810.81 - samples/sec: 376.32 - lr: 0.000003 - momentum: 0.000000
+ 2023-10-14 17:36:23,538 epoch 10 - iter 3249/3617 - loss 0.00456365 - time (sec): 909.85 - samples/sec: 376.25 - lr: 0.000002 - momentum: 0.000000
+ 2023-10-14 17:38:04,514 epoch 10 - iter 3610/3617 - loss 0.00445018 - time (sec): 1010.83 - samples/sec: 375.28 - lr: 0.000000 - momentum: 0.000000
+ 2023-10-14 17:38:06,359 ----------------------------------------------------------------------------------------------------
+ 2023-10-14 17:38:06,359 EPOCH 10 done: loss 0.0045 - lr: 0.000000
+ 2023-10-14 17:38:48,489 DEV : loss 0.383007675409317 - f1-score (micro avg) 0.6403
+ 2023-10-14 17:38:49,482 ----------------------------------------------------------------------------------------------------
+ 2023-10-14 17:38:49,484 Loading model from best epoch ...
+ 2023-10-14 17:38:53,352 SequenceTagger predicts: Dictionary with 13 tags: O, S-loc, B-loc, E-loc, I-loc, S-pers, B-pers, E-pers, I-pers, S-org, B-org, E-org, I-org
+ 2023-10-14 17:39:53,980
+ Results:
+ - F-score (micro) 0.6356
+ - F-score (macro) 0.4981
+ - Accuracy 0.4788
+
+ By class:
+               precision    recall  f1-score   support
+
+          loc     0.6276    0.7699    0.6915       591
+         pers     0.5664    0.7171    0.6329       357
+          org     0.1757    0.1646    0.1699        79
+
+    micro avg     0.5787    0.7050    0.6356      1027
+    macro avg     0.4565    0.5505    0.4981      1027
+ weighted avg     0.5715    0.7050    0.6310      1027
+
+ 2023-10-14 17:39:53,980 ----------------------------------------------------------------------------------------------------
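The saved model predicts the 13-tag dictionary listed above (loc, pers, and org spans in BIOES encoding) and reaches 0.6356 micro F1 in the final evaluation. A short sketch for loading the checkpoint with Flair and tagging a sentence; the "ner" label type and the example text are assumptions:

    from flair.data import Sentence
    from flair.models import SequenceTagger

    tagger = SequenceTagger.load("best-model.pt")  # local path; or a Hub repo id once published

    sentence = Sentence("Le Temps est un journal publié à Genève .")
    tagger.predict(sentence)

    # "ner" is assumed as the label type; adjust if the tagger was trained under another name.
    for entity in sentence.get_spans("ner"):
        print(entity)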