|
2023-10-11 11:07:50,584 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 11:07:50,586 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
|
2023-10-11 11:07:50,587 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 11:07:50,587 MultiCorpus: 1085 train + 148 dev + 364 test sentences |
|
- NER_HIPE_2022 Corpus: 1085 train + 148 dev + 364 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/sv/with_doc_seperator |
|
2023-10-11 11:07:50,587 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 11:07:50,587 Train: 1085 sentences |
|
2023-10-11 11:07:50,587 (train_with_dev=False, train_with_test=False) |
|
2023-10-11 11:07:50,587 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 11:07:50,587 Training Params: |
|
2023-10-11 11:07:50,587 - learning_rate: "0.00016" |
|
2023-10-11 11:07:50,587 - mini_batch_size: "4" |
|
2023-10-11 11:07:50,587 - max_epochs: "10" |
|
2023-10-11 11:07:50,587 - shuffle: "True" |
|
2023-10-11 11:07:50,588 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 11:07:50,588 Plugins: |
|
2023-10-11 11:07:50,588 - TensorboardLogger |
|
2023-10-11 11:07:50,588 - LinearScheduler | warmup_fraction: '0.1' |
|
2023-10-11 11:07:50,588 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 11:07:50,588 Final evaluation on model from best epoch (best-model.pt) |
|
2023-10-11 11:07:50,588 - metric: "('micro avg', 'f1-score')" |
|
2023-10-11 11:07:50,588 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 11:07:50,588 Computation: |
|
2023-10-11 11:07:50,588 - compute on device: cuda:0 |
|
2023-10-11 11:07:50,588 - embedding storage: none |
|
2023-10-11 11:07:50,588 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 11:07:50,588 Model training base path: "hmbench-newseye/sv-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-3" |
|
2023-10-11 11:07:50,588 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 11:07:50,588 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 11:07:50,589 Logging anything other than scalars to TensorBoard is currently not supported. |
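Before the per-epoch output starts, here is a minimal sketch of how a comparable Flair fine-tuning run could be set up from the parameters logged above (lr 0.00016, mini-batch size 4, 10 epochs, shuffling, linear schedule with 10% warmup, no CRF, "first" subtoken pooling, last transformer layer only). This is a reconstruction, not the project's actual training script: the Hub id and the NER_HIPE_2022 constructor arguments are assumptions, and the ByT5Embeddings module in the dump looks project-specific, so Flair's stock TransformerWordEmbeddings stands in for it here.

```python
from flair.datasets import NER_HIPE_2022
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# HIPE-2022 NewsEye corpus, Swedish (1085 train / 148 dev / 364 test sentences above).
corpus = NER_HIPE_2022(dataset_name="newseye", language="sv")
label_dict = corpus.make_label_dictionary(label_type="ner")

# Byte-level historic multilingual ByT5 encoder, fine-tuned, "first" subtoken pooling.
embeddings = TransformerWordEmbeddings(
    model="hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax",
    layers="-1",
    subtoken_pooling="first",
    fine_tune=True,
)

# Linear classification head over the embeddings (crfFalse in the base path).
tagger = SequenceTagger(
    hidden_size=256,              # no RNN is used, so this does not shape the head
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type="ner",
    use_crf=False,
    use_rnn=False,
    reproject_embeddings=False,
)

trainer = ModelTrainer(tagger, corpus)

# fine_tune defaults to AdamW with a linear LR schedule and warmup, which is what
# the LinearScheduler plugin (warmup_fraction 0.1) in the log reflects.
trainer.fine_tune(
    "hmbench-newseye/sv-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-3",
    learning_rate=0.00016,
    mini_batch_size=4,
    max_epochs=10,
)
```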
|
2023-10-11 11:08:00,786 epoch 1 - iter 27/272 - loss 2.82642878 - time (sec): 10.20 - samples/sec: 564.33 - lr: 0.000015 - momentum: 0.000000 |
|
2023-10-11 11:08:10,540 epoch 1 - iter 54/272 - loss 2.81647757 - time (sec): 19.95 - samples/sec: 561.36 - lr: 0.000031 - momentum: 0.000000 |
|
2023-10-11 11:08:20,157 epoch 1 - iter 81/272 - loss 2.79293552 - time (sec): 29.57 - samples/sec: 554.92 - lr: 0.000047 - momentum: 0.000000 |
|
2023-10-11 11:08:29,853 epoch 1 - iter 108/272 - loss 2.74216787 - time (sec): 39.26 - samples/sec: 547.45 - lr: 0.000063 - momentum: 0.000000 |
|
2023-10-11 11:08:39,436 epoch 1 - iter 135/272 - loss 2.65162236 - time (sec): 48.85 - samples/sec: 548.73 - lr: 0.000079 - momentum: 0.000000 |
|
2023-10-11 11:08:48,405 epoch 1 - iter 162/272 - loss 2.56580157 - time (sec): 57.82 - samples/sec: 543.49 - lr: 0.000095 - momentum: 0.000000 |
|
2023-10-11 11:08:57,372 epoch 1 - iter 189/272 - loss 2.46639280 - time (sec): 66.78 - samples/sec: 537.20 - lr: 0.000111 - momentum: 0.000000 |
|
2023-10-11 11:09:07,456 epoch 1 - iter 216/272 - loss 2.33252277 - time (sec): 76.87 - samples/sec: 541.51 - lr: 0.000126 - momentum: 0.000000 |
|
2023-10-11 11:09:16,984 epoch 1 - iter 243/272 - loss 2.21102811 - time (sec): 86.39 - samples/sec: 538.76 - lr: 0.000142 - momentum: 0.000000 |
|
2023-10-11 11:09:26,631 epoch 1 - iter 270/272 - loss 2.08982352 - time (sec): 96.04 - samples/sec: 537.93 - lr: 0.000158 - momentum: 0.000000 |
|
2023-10-11 11:09:27,187 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 11:09:27,187 EPOCH 1 done: loss 2.0814 - lr: 0.000158 |
|
2023-10-11 11:09:32,503 DEV : loss 0.751204788684845 - f1-score (micro avg) 0.0 |
|
2023-10-11 11:09:32,512 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 11:09:42,689 epoch 2 - iter 27/272 - loss 0.70958275 - time (sec): 10.18 - samples/sec: 573.25 - lr: 0.000158 - momentum: 0.000000 |
|
2023-10-11 11:09:52,318 epoch 2 - iter 54/272 - loss 0.68509310 - time (sec): 19.80 - samples/sec: 568.82 - lr: 0.000157 - momentum: 0.000000 |
|
2023-10-11 11:10:01,642 epoch 2 - iter 81/272 - loss 0.66861733 - time (sec): 29.13 - samples/sec: 562.93 - lr: 0.000155 - momentum: 0.000000 |
|
2023-10-11 11:10:10,552 epoch 2 - iter 108/272 - loss 0.63700588 - time (sec): 38.04 - samples/sec: 553.57 - lr: 0.000153 - momentum: 0.000000 |
|
2023-10-11 11:10:19,839 epoch 2 - iter 135/272 - loss 0.59504244 - time (sec): 47.33 - samples/sec: 552.75 - lr: 0.000151 - momentum: 0.000000 |
|
2023-10-11 11:10:28,596 epoch 2 - iter 162/272 - loss 0.57110271 - time (sec): 56.08 - samples/sec: 544.08 - lr: 0.000149 - momentum: 0.000000 |
|
2023-10-11 11:10:37,826 epoch 2 - iter 189/272 - loss 0.53925735 - time (sec): 65.31 - samples/sec: 540.45 - lr: 0.000148 - momentum: 0.000000 |
|
2023-10-11 11:10:47,077 epoch 2 - iter 216/272 - loss 0.52341224 - time (sec): 74.56 - samples/sec: 540.39 - lr: 0.000146 - momentum: 0.000000 |
|
2023-10-11 11:10:56,423 epoch 2 - iter 243/272 - loss 0.50597516 - time (sec): 83.91 - samples/sec: 543.46 - lr: 0.000144 - momentum: 0.000000 |
|
2023-10-11 11:11:06,983 epoch 2 - iter 270/272 - loss 0.48741426 - time (sec): 94.47 - samples/sec: 547.54 - lr: 0.000142 - momentum: 0.000000 |
|
2023-10-11 11:11:07,491 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 11:11:07,492 EPOCH 2 done: loss 0.4853 - lr: 0.000142 |
|
2023-10-11 11:11:13,160 DEV : loss 0.2761920094490051 - f1-score (micro avg) 0.3163 |
|
2023-10-11 11:11:13,169 saving best model |
|
2023-10-11 11:11:14,262 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 11:11:23,631 epoch 3 - iter 27/272 - loss 0.33511650 - time (sec): 9.37 - samples/sec: 545.23 - lr: 0.000141 - momentum: 0.000000 |
|
2023-10-11 11:11:33,244 epoch 3 - iter 54/272 - loss 0.30965757 - time (sec): 18.98 - samples/sec: 557.62 - lr: 0.000139 - momentum: 0.000000 |
|
2023-10-11 11:11:43,271 epoch 3 - iter 81/272 - loss 0.30790871 - time (sec): 29.01 - samples/sec: 574.62 - lr: 0.000137 - momentum: 0.000000 |
|
2023-10-11 11:11:52,030 epoch 3 - iter 108/272 - loss 0.31732225 - time (sec): 37.77 - samples/sec: 556.73 - lr: 0.000135 - momentum: 0.000000 |
|
2023-10-11 11:12:01,014 epoch 3 - iter 135/272 - loss 0.31924782 - time (sec): 46.75 - samples/sec: 551.63 - lr: 0.000133 - momentum: 0.000000 |
|
2023-10-11 11:12:11,223 epoch 3 - iter 162/272 - loss 0.30738277 - time (sec): 56.96 - samples/sec: 557.71 - lr: 0.000132 - momentum: 0.000000 |
|
2023-10-11 11:12:20,613 epoch 3 - iter 189/272 - loss 0.29568838 - time (sec): 66.35 - samples/sec: 556.24 - lr: 0.000130 - momentum: 0.000000 |
|
2023-10-11 11:12:29,885 epoch 3 - iter 216/272 - loss 0.28129595 - time (sec): 75.62 - samples/sec: 554.44 - lr: 0.000128 - momentum: 0.000000 |
|
2023-10-11 11:12:38,783 epoch 3 - iter 243/272 - loss 0.27144926 - time (sec): 84.52 - samples/sec: 552.54 - lr: 0.000126 - momentum: 0.000000 |
|
2023-10-11 11:12:47,953 epoch 3 - iter 270/272 - loss 0.27109563 - time (sec): 93.69 - samples/sec: 552.59 - lr: 0.000125 - momentum: 0.000000 |
|
2023-10-11 11:12:48,398 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 11:12:48,399 EPOCH 3 done: loss 0.2705 - lr: 0.000125 |
|
2023-10-11 11:12:53,851 DEV : loss 0.20764292776584625 - f1-score (micro avg) 0.5838 |
|
2023-10-11 11:12:53,859 saving best model |
|
2023-10-11 11:12:56,387 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 11:13:05,348 epoch 4 - iter 27/272 - loss 0.17940497 - time (sec): 8.96 - samples/sec: 538.37 - lr: 0.000123 - momentum: 0.000000 |
|
2023-10-11 11:13:13,448 epoch 4 - iter 54/272 - loss 0.20547376 - time (sec): 17.06 - samples/sec: 507.08 - lr: 0.000121 - momentum: 0.000000 |
|
2023-10-11 11:13:23,088 epoch 4 - iter 81/272 - loss 0.21204444 - time (sec): 26.70 - samples/sec: 527.23 - lr: 0.000119 - momentum: 0.000000 |
|
2023-10-11 11:13:32,138 epoch 4 - iter 108/272 - loss 0.19861011 - time (sec): 35.75 - samples/sec: 531.39 - lr: 0.000117 - momentum: 0.000000 |
|
2023-10-11 11:13:41,269 epoch 4 - iter 135/272 - loss 0.19649019 - time (sec): 44.88 - samples/sec: 533.18 - lr: 0.000116 - momentum: 0.000000 |
|
2023-10-11 11:13:50,014 epoch 4 - iter 162/272 - loss 0.18956010 - time (sec): 53.62 - samples/sec: 527.21 - lr: 0.000114 - momentum: 0.000000 |
|
2023-10-11 11:14:00,798 epoch 4 - iter 189/272 - loss 0.18263628 - time (sec): 64.41 - samples/sec: 545.12 - lr: 0.000112 - momentum: 0.000000 |
|
2023-10-11 11:14:10,171 epoch 4 - iter 216/272 - loss 0.17817811 - time (sec): 73.78 - samples/sec: 546.98 - lr: 0.000110 - momentum: 0.000000 |
|
2023-10-11 11:14:19,885 epoch 4 - iter 243/272 - loss 0.17345252 - time (sec): 83.49 - samples/sec: 549.18 - lr: 0.000109 - momentum: 0.000000 |
|
2023-10-11 11:14:29,653 epoch 4 - iter 270/272 - loss 0.16873144 - time (sec): 93.26 - samples/sec: 553.83 - lr: 0.000107 - momentum: 0.000000 |
|
2023-10-11 11:14:30,202 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 11:14:30,202 EPOCH 4 done: loss 0.1683 - lr: 0.000107 |
|
2023-10-11 11:14:35,817 DEV : loss 0.16644711792469025 - f1-score (micro avg) 0.6248 |
|
2023-10-11 11:14:35,825 saving best model |
|
2023-10-11 11:14:38,358 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 11:14:48,002 epoch 5 - iter 27/272 - loss 0.10934003 - time (sec): 9.64 - samples/sec: 597.01 - lr: 0.000105 - momentum: 0.000000 |
|
2023-10-11 11:14:57,370 epoch 5 - iter 54/272 - loss 0.11810250 - time (sec): 19.01 - samples/sec: 584.87 - lr: 0.000103 - momentum: 0.000000 |
|
2023-10-11 11:15:06,752 epoch 5 - iter 81/272 - loss 0.11280886 - time (sec): 28.39 - samples/sec: 579.61 - lr: 0.000101 - momentum: 0.000000 |
|
2023-10-11 11:15:15,268 epoch 5 - iter 108/272 - loss 0.11577577 - time (sec): 36.91 - samples/sec: 564.71 - lr: 0.000100 - momentum: 0.000000 |
|
2023-10-11 11:15:23,880 epoch 5 - iter 135/272 - loss 0.11510951 - time (sec): 45.52 - samples/sec: 554.18 - lr: 0.000098 - momentum: 0.000000 |
|
2023-10-11 11:15:33,117 epoch 5 - iter 162/272 - loss 0.11718809 - time (sec): 54.76 - samples/sec: 553.04 - lr: 0.000096 - momentum: 0.000000 |
|
2023-10-11 11:15:42,474 epoch 5 - iter 189/272 - loss 0.11051106 - time (sec): 64.11 - samples/sec: 550.23 - lr: 0.000094 - momentum: 0.000000 |
|
2023-10-11 11:15:51,957 epoch 5 - iter 216/272 - loss 0.11192449 - time (sec): 73.59 - samples/sec: 550.60 - lr: 0.000093 - momentum: 0.000000 |
|
2023-10-11 11:16:02,208 epoch 5 - iter 243/272 - loss 0.11278962 - time (sec): 83.85 - samples/sec: 558.27 - lr: 0.000091 - momentum: 0.000000 |
|
2023-10-11 11:16:11,167 epoch 5 - iter 270/272 - loss 0.11164211 - time (sec): 92.80 - samples/sec: 558.74 - lr: 0.000089 - momentum: 0.000000 |
|
2023-10-11 11:16:11,545 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 11:16:11,545 EPOCH 5 done: loss 0.1115 - lr: 0.000089 |
|
2023-10-11 11:16:17,034 DEV : loss 0.14780402183532715 - f1-score (micro avg) 0.7273 |
|
2023-10-11 11:16:17,042 saving best model |
|
2023-10-11 11:16:19,527 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 11:16:28,224 epoch 6 - iter 27/272 - loss 0.06996347 - time (sec): 8.69 - samples/sec: 537.73 - lr: 0.000087 - momentum: 0.000000 |
|
2023-10-11 11:16:36,811 epoch 6 - iter 54/272 - loss 0.08153458 - time (sec): 17.28 - samples/sec: 534.03 - lr: 0.000085 - momentum: 0.000000 |
|
2023-10-11 11:16:45,704 epoch 6 - iter 81/272 - loss 0.09017221 - time (sec): 26.17 - samples/sec: 539.47 - lr: 0.000084 - momentum: 0.000000 |
|
2023-10-11 11:16:55,332 epoch 6 - iter 108/272 - loss 0.08370356 - time (sec): 35.80 - samples/sec: 551.38 - lr: 0.000082 - momentum: 0.000000 |
|
2023-10-11 11:17:04,982 epoch 6 - iter 135/272 - loss 0.07843134 - time (sec): 45.45 - samples/sec: 566.46 - lr: 0.000080 - momentum: 0.000000 |
|
2023-10-11 11:17:13,606 epoch 6 - iter 162/272 - loss 0.07866597 - time (sec): 54.07 - samples/sec: 558.77 - lr: 0.000078 - momentum: 0.000000 |
|
2023-10-11 11:17:23,040 epoch 6 - iter 189/272 - loss 0.07503492 - time (sec): 63.51 - samples/sec: 561.63 - lr: 0.000077 - momentum: 0.000000 |
|
2023-10-11 11:17:32,129 epoch 6 - iter 216/272 - loss 0.08015716 - time (sec): 72.60 - samples/sec: 558.57 - lr: 0.000075 - momentum: 0.000000 |
|
2023-10-11 11:17:41,521 epoch 6 - iter 243/272 - loss 0.07960023 - time (sec): 81.99 - samples/sec: 561.54 - lr: 0.000073 - momentum: 0.000000 |
|
2023-10-11 11:17:51,226 epoch 6 - iter 270/272 - loss 0.07884413 - time (sec): 91.69 - samples/sec: 561.73 - lr: 0.000071 - momentum: 0.000000 |
|
2023-10-11 11:17:51,887 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 11:17:51,887 EPOCH 6 done: loss 0.0788 - lr: 0.000071 |
|
2023-10-11 11:17:57,376 DEV : loss 0.14096224308013916 - f1-score (micro avg) 0.7518 |
|
2023-10-11 11:17:57,385 saving best model |
|
2023-10-11 11:17:59,894 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 11:18:09,294 epoch 7 - iter 27/272 - loss 0.06341984 - time (sec): 9.40 - samples/sec: 573.61 - lr: 0.000069 - momentum: 0.000000 |
|
2023-10-11 11:18:18,490 epoch 7 - iter 54/272 - loss 0.07180111 - time (sec): 18.59 - samples/sec: 549.22 - lr: 0.000068 - momentum: 0.000000 |
|
2023-10-11 11:18:28,484 epoch 7 - iter 81/272 - loss 0.06364756 - time (sec): 28.59 - samples/sec: 562.06 - lr: 0.000066 - momentum: 0.000000 |
|
2023-10-11 11:18:37,887 epoch 7 - iter 108/272 - loss 0.06134236 - time (sec): 37.99 - samples/sec: 562.23 - lr: 0.000064 - momentum: 0.000000 |
|
2023-10-11 11:18:47,377 epoch 7 - iter 135/272 - loss 0.06432419 - time (sec): 47.48 - samples/sec: 562.86 - lr: 0.000062 - momentum: 0.000000 |
|
2023-10-11 11:18:56,504 epoch 7 - iter 162/272 - loss 0.06152738 - time (sec): 56.61 - samples/sec: 558.29 - lr: 0.000061 - momentum: 0.000000 |
|
2023-10-11 11:19:06,158 epoch 7 - iter 189/272 - loss 0.06438334 - time (sec): 66.26 - samples/sec: 557.18 - lr: 0.000059 - momentum: 0.000000 |
|
2023-10-11 11:19:14,940 epoch 7 - iter 216/272 - loss 0.06377457 - time (sec): 75.04 - samples/sec: 552.50 - lr: 0.000057 - momentum: 0.000000 |
|
2023-10-11 11:19:24,451 epoch 7 - iter 243/272 - loss 0.06205376 - time (sec): 84.55 - samples/sec: 554.17 - lr: 0.000055 - momentum: 0.000000 |
|
2023-10-11 11:19:33,714 epoch 7 - iter 270/272 - loss 0.05922229 - time (sec): 93.82 - samples/sec: 552.13 - lr: 0.000054 - momentum: 0.000000 |
|
2023-10-11 11:19:34,122 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 11:19:34,122 EPOCH 7 done: loss 0.0591 - lr: 0.000054 |
|
2023-10-11 11:19:39,872 DEV : loss 0.15206335484981537 - f1-score (micro avg) 0.7609 |
|
2023-10-11 11:19:39,881 saving best model |
|
2023-10-11 11:19:42,622 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 11:19:51,678 epoch 8 - iter 27/272 - loss 0.05148392 - time (sec): 9.05 - samples/sec: 571.50 - lr: 0.000052 - momentum: 0.000000 |
|
2023-10-11 11:20:00,288 epoch 8 - iter 54/272 - loss 0.04823998 - time (sec): 17.66 - samples/sec: 554.33 - lr: 0.000050 - momentum: 0.000000 |
|
2023-10-11 11:20:10,210 epoch 8 - iter 81/272 - loss 0.05011770 - time (sec): 27.58 - samples/sec: 558.63 - lr: 0.000048 - momentum: 0.000000 |
|
2023-10-11 11:20:19,535 epoch 8 - iter 108/272 - loss 0.05553018 - time (sec): 36.91 - samples/sec: 550.68 - lr: 0.000046 - momentum: 0.000000 |
|
2023-10-11 11:20:29,307 epoch 8 - iter 135/272 - loss 0.05185215 - time (sec): 46.68 - samples/sec: 547.66 - lr: 0.000045 - momentum: 0.000000 |
|
2023-10-11 11:20:39,021 epoch 8 - iter 162/272 - loss 0.05106686 - time (sec): 56.39 - samples/sec: 545.69 - lr: 0.000043 - momentum: 0.000000 |
|
2023-10-11 11:20:48,754 epoch 8 - iter 189/272 - loss 0.04852618 - time (sec): 66.13 - samples/sec: 546.70 - lr: 0.000041 - momentum: 0.000000 |
|
2023-10-11 11:20:58,466 epoch 8 - iter 216/272 - loss 0.04781162 - time (sec): 75.84 - samples/sec: 548.61 - lr: 0.000039 - momentum: 0.000000 |
|
2023-10-11 11:21:08,324 epoch 8 - iter 243/272 - loss 0.04540073 - time (sec): 85.70 - samples/sec: 549.88 - lr: 0.000038 - momentum: 0.000000 |
|
2023-10-11 11:21:17,256 epoch 8 - iter 270/272 - loss 0.04536122 - time (sec): 94.63 - samples/sec: 545.93 - lr: 0.000036 - momentum: 0.000000 |
|
2023-10-11 11:21:17,809 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 11:21:17,809 EPOCH 8 done: loss 0.0452 - lr: 0.000036 |
|
2023-10-11 11:21:23,595 DEV : loss 0.1474316567182541 - f1-score (micro avg) 0.7776 |
|
2023-10-11 11:21:23,604 saving best model |
|
2023-10-11 11:21:26,122 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 11:21:36,109 epoch 9 - iter 27/272 - loss 0.03768928 - time (sec): 9.98 - samples/sec: 584.31 - lr: 0.000034 - momentum: 0.000000 |
|
2023-10-11 11:21:45,675 epoch 9 - iter 54/272 - loss 0.03902434 - time (sec): 19.55 - samples/sec: 569.27 - lr: 0.000032 - momentum: 0.000000 |
|
2023-10-11 11:21:54,404 epoch 9 - iter 81/272 - loss 0.03949006 - time (sec): 28.28 - samples/sec: 549.31 - lr: 0.000030 - momentum: 0.000000 |
|
2023-10-11 11:22:03,982 epoch 9 - iter 108/272 - loss 0.04115931 - time (sec): 37.86 - samples/sec: 544.01 - lr: 0.000029 - momentum: 0.000000 |
|
2023-10-11 11:22:13,725 epoch 9 - iter 135/272 - loss 0.04113215 - time (sec): 47.60 - samples/sec: 546.99 - lr: 0.000027 - momentum: 0.000000 |
|
2023-10-11 11:22:23,425 epoch 9 - iter 162/272 - loss 0.04040816 - time (sec): 57.30 - samples/sec: 547.83 - lr: 0.000025 - momentum: 0.000000 |
|
2023-10-11 11:22:32,701 epoch 9 - iter 189/272 - loss 0.03962695 - time (sec): 66.57 - samples/sec: 545.19 - lr: 0.000023 - momentum: 0.000000 |
|
2023-10-11 11:22:41,925 epoch 9 - iter 216/272 - loss 0.03930060 - time (sec): 75.80 - samples/sec: 542.07 - lr: 0.000022 - momentum: 0.000000 |
|
2023-10-11 11:22:52,271 epoch 9 - iter 243/272 - loss 0.03855005 - time (sec): 86.14 - samples/sec: 545.98 - lr: 0.000020 - momentum: 0.000000 |
|
2023-10-11 11:23:01,596 epoch 9 - iter 270/272 - loss 0.03738276 - time (sec): 95.47 - samples/sec: 543.30 - lr: 0.000018 - momentum: 0.000000 |
|
2023-10-11 11:23:01,958 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 11:23:01,958 EPOCH 9 done: loss 0.0374 - lr: 0.000018 |
|
2023-10-11 11:23:07,784 DEV : loss 0.15458551049232483 - f1-score (micro avg) 0.7717 |
|
2023-10-11 11:23:07,793 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 11:23:17,507 epoch 10 - iter 27/272 - loss 0.03642691 - time (sec): 9.71 - samples/sec: 560.74 - lr: 0.000016 - momentum: 0.000000 |
|
2023-10-11 11:23:26,192 epoch 10 - iter 54/272 - loss 0.03760888 - time (sec): 18.40 - samples/sec: 530.22 - lr: 0.000014 - momentum: 0.000000 |
|
2023-10-11 11:23:36,876 epoch 10 - iter 81/272 - loss 0.03781097 - time (sec): 29.08 - samples/sec: 562.52 - lr: 0.000013 - momentum: 0.000000 |
|
2023-10-11 11:23:47,197 epoch 10 - iter 108/272 - loss 0.03981923 - time (sec): 39.40 - samples/sec: 569.48 - lr: 0.000011 - momentum: 0.000000 |
|
2023-10-11 11:23:56,858 epoch 10 - iter 135/272 - loss 0.03893238 - time (sec): 49.06 - samples/sec: 566.00 - lr: 0.000009 - momentum: 0.000000 |
|
2023-10-11 11:24:05,392 epoch 10 - iter 162/272 - loss 0.03669560 - time (sec): 57.60 - samples/sec: 553.96 - lr: 0.000007 - momentum: 0.000000 |
|
2023-10-11 11:24:15,658 epoch 10 - iter 189/272 - loss 0.03548350 - time (sec): 67.86 - samples/sec: 551.09 - lr: 0.000005 - momentum: 0.000000 |
|
2023-10-11 11:24:24,996 epoch 10 - iter 216/272 - loss 0.03383848 - time (sec): 77.20 - samples/sec: 543.09 - lr: 0.000004 - momentum: 0.000000 |
|
2023-10-11 11:24:34,515 epoch 10 - iter 243/272 - loss 0.03361531 - time (sec): 86.72 - samples/sec: 542.13 - lr: 0.000002 - momentum: 0.000000 |
|
2023-10-11 11:24:43,874 epoch 10 - iter 270/272 - loss 0.03227112 - time (sec): 96.08 - samples/sec: 538.59 - lr: 0.000000 - momentum: 0.000000 |
|
2023-10-11 11:24:44,326 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 11:24:44,326 EPOCH 10 done: loss 0.0323 - lr: 0.000000 |
|
2023-10-11 11:24:50,079 DEV : loss 0.15587206184864044 - f1-score (micro avg) 0.7731 |
|
2023-10-11 11:24:50,931 ---------------------------------------------------------------------------------------------------- |
|
2023-10-11 11:24:50,933 Loading model from best epoch ... |
|
2023-10-11 11:24:54,558 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd, S-ORG, B-ORG, E-ORG, I-ORG |
|
2023-10-11 11:25:06,281
Results:
- F-score (micro) 0.7481
- F-score (macro) 0.6929
- Accuracy 0.6177

By class:
              precision    recall  f1-score   support

         LOC     0.7357    0.8654    0.7953       312
         PER     0.6434    0.8846    0.7449       208
         ORG     0.4615    0.4364    0.4486        55
   HumanProd     0.7500    0.8182    0.7826        22

   micro avg     0.6804    0.8308    0.7481       597
   macro avg     0.6476    0.7511    0.6929       597
weighted avg     0.6788    0.8308    0.7453       597
|
|
|
2023-10-11 11:25:06,282 ---------------------------------------------------------------------------------------------------- |
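As a usage note, a minimal sketch of loading the best checkpoint saved during this run and tagging a sentence follows. The checkpoint location under the logged base path and the example Swedish sentence are assumptions.

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# Assumption: best-model.pt sits directly under the base path logged above.
tagger = SequenceTagger.load(
    "hmbench-newseye/sv-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-3/best-model.pt"
)

sentence = Sentence("Stockholms stadsfullmäktige sammanträdde i går.")  # made-up example text
tagger.predict(sentence)

# Print recognised spans with their decoded labels (LOC, PER, ORG, HumanProd).
for span in sentence.get_spans("ner"):
    print(span.text, span.get_label("ner").value, span.get_label("ner").score)
```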
|
|