2023-10-13 13:41:03,373 ----------------------------------------------------------------------------------------------------
2023-10-13 13:41:03,375 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-13 13:41:03,375 ----------------------------------------------------------------------------------------------------
2023-10-13 13:41:03,375 MultiCorpus: 7936 train + 992 dev + 992 test sentences
 - NER_ICDAR_EUROPEANA Corpus: 7936 train + 992 dev + 992 test sentences - /root/.flair/datasets/ner_icdar_europeana/fr
2023-10-13 13:41:03,375 ----------------------------------------------------------------------------------------------------
2023-10-13 13:41:03,375 Train:  7936 sentences
2023-10-13 13:41:03,375         (train_with_dev=False, train_with_test=False)
2023-10-13 13:41:03,376 ----------------------------------------------------------------------------------------------------
2023-10-13 13:41:03,376 Training Params:
2023-10-13 13:41:03,376  - learning_rate: "0.00016"
2023-10-13 13:41:03,376  - mini_batch_size: "4"
2023-10-13 13:41:03,376  - max_epochs: "10"
2023-10-13 13:41:03,376  - shuffle: "True"
2023-10-13 13:41:03,376 ----------------------------------------------------------------------------------------------------
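The architecture and training parameters above suggest a Flair fine-tuning setup roughly like the following. This is a minimal sketch, not the exact hmBench training script: the checkpoint id is inferred from the base path logged further down, the ByT5Embeddings wrapper shown in the model repr may be a custom class (standard TransformerWordEmbeddings is used here instead), and hidden_size is an illustrative value that is unused when no RNN/CRF is added.

```python
# Minimal reproduction sketch (assumptions noted above and in comments).
from flair.datasets import NER_ICDAR_EUROPEANA
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# French ICDAR-Europeana NER corpus: 7936 train / 992 dev / 992 test sentences.
corpus = NER_ICDAR_EUROPEANA(language="fr")
label_dict = corpus.make_label_dictionary(label_type="ner")

embeddings = TransformerWordEmbeddings(
    model="hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax",  # assumed checkpoint id
    layers="-1",               # "layers-1" in the base path
    subtoken_pooling="first",  # "poolingfirst" in the base path
    fine_tune=True,
)

tagger = SequenceTagger(
    hidden_size=256,           # illustrative; ignored since no RNN is used
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type="ner",
    use_crf=False,             # "crfFalse" in the base path
    use_rnn=False,
    reproject_embeddings=False,
)

trainer = ModelTrainer(tagger, corpus)
trainer.fine_tune(
    "hmbench-icdar/fr-hmbyt5-preliminary/...",  # base path shortened here; full path is logged below
    learning_rate=0.00016,
    mini_batch_size=4,
    max_epochs=10,
)
```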
2023-10-13 13:41:03,376 Plugins:
2023-10-13 13:41:03,376  - TensorboardLogger
2023-10-13 13:41:03,376  - LinearScheduler | warmup_fraction: '0.1'
2023-10-13 13:41:03,376 ----------------------------------------------------------------------------------------------------
2023-10-13 13:41:03,376 Final evaluation on model from best epoch (best-model.pt)
2023-10-13 13:41:03,376  - metric: "('micro avg', 'f1-score')"
2023-10-13 13:41:03,376 ----------------------------------------------------------------------------------------------------
2023-10-13 13:41:03,376 Computation:
2023-10-13 13:41:03,377  - compute on device: cuda:0
2023-10-13 13:41:03,377  - embedding storage: none
2023-10-13 13:41:03,377 ----------------------------------------------------------------------------------------------------
2023-10-13 13:41:03,377 Model training base path: "hmbench-icdar/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-5"
2023-10-13 13:41:03,377 ----------------------------------------------------------------------------------------------------
2023-10-13 13:41:03,377 ----------------------------------------------------------------------------------------------------
2023-10-13 13:41:03,377 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-13 13:41:57,697 epoch 1 - iter 198/1984 - loss 2.53411240 - time (sec): 54.32 - samples/sec: 325.47 - lr: 0.000016 - momentum: 0.000000
2023-10-13 13:42:51,087 epoch 1 - iter 396/1984 - loss 2.34200986 - time (sec): 107.71 - samples/sec: 309.82 - lr: 0.000032 - momentum: 0.000000
2023-10-13 13:43:46,734 epoch 1 - iter 594/1984 - loss 2.00986386 - time (sec): 163.35 - samples/sec: 309.73 - lr: 0.000048 - momentum: 0.000000
2023-10-13 13:44:41,691 epoch 1 - iter 792/1984 - loss 1.71728426 - time (sec): 218.31 - samples/sec: 300.53 - lr: 0.000064 - momentum: 0.000000
2023-10-13 13:45:39,863 epoch 1 - iter 990/1984 - loss 1.47309135 - time (sec): 276.48 - samples/sec: 295.09 - lr: 0.000080 - momentum: 0.000000
2023-10-13 13:46:37,854 epoch 1 - iter 1188/1984 - loss 1.28054026 - time (sec): 334.48 - samples/sec: 291.62 - lr: 0.000096 - momentum: 0.000000
2023-10-13 13:47:32,695 epoch 1 - iter 1386/1984 - loss 1.13053154 - time (sec): 389.32 - samples/sec: 293.66 - lr: 0.000112 - momentum: 0.000000
2023-10-13 13:48:27,496 epoch 1 - iter 1584/1984 - loss 1.01770682 - time (sec): 444.12 - samples/sec: 293.40 - lr: 0.000128 - momentum: 0.000000
2023-10-13 13:49:25,748 epoch 1 - iter 1782/1984 - loss 0.91618266 - time (sec): 502.37 - samples/sec: 294.61 - lr: 0.000144 - momentum: 0.000000
2023-10-13 13:50:22,623 epoch 1 - iter 1980/1984 - loss 0.84428444 - time (sec): 559.24 - samples/sec: 292.81 - lr: 0.000160 - momentum: 0.000000
2023-10-13 13:50:23,697 ----------------------------------------------------------------------------------------------------
2023-10-13 13:50:23,697 EPOCH 1 done: loss 0.8433 - lr: 0.000160
2023-10-13 13:50:48,714 DEV : loss 0.13220053911209106 - f1-score (micro avg) 0.6771
2023-10-13 13:50:48,754 saving best model
2023-10-13 13:50:49,635 ----------------------------------------------------------------------------------------------------
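The lr column follows the LinearScheduler with warmup_fraction '0.1' listed under Plugins: it ramps linearly up to the peak of 0.00016 over the first ~10% of the 1984 x 10 = 19,840 optimizer steps (roughly epoch 1), then decays linearly to zero by the end of epoch 10. The following is a small sketch that approximates the schedule implied by the logged values; it is not the plugin's actual code.

```python
def linear_warmup_decay_lr(step: int,
                           peak_lr: float = 0.00016,
                           total_steps: int = 1984 * 10,
                           warmup_fraction: float = 0.1) -> float:
    """Approximate lr at a given optimizer step: linear warmup, then linear decay to 0."""
    warmup_steps = int(total_steps * warmup_fraction)  # here: 1984 steps, i.e. about one epoch
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * (total_steps - step) / max(1, total_steps - warmup_steps)

# Sanity checks against the log:
#   step   198 -> ~0.000016 (first logged lr of epoch 1)
#   step  1980 -> ~0.000160 (end of epoch 1, the peak)
#   step 19840 ->  0.000000 (end of epoch 10)
```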
2023-10-13 13:51:44,743 epoch 2 - iter 198/1984 - loss 0.15716909 - time (sec): 55.11 - samples/sec: 300.33 - lr: 0.000158 - momentum: 0.000000
2023-10-13 13:52:39,667 epoch 2 - iter 396/1984 - loss 0.14341696 - time (sec): 110.03 - samples/sec: 302.03 - lr: 0.000156 - momentum: 0.000000
2023-10-13 13:53:39,458 epoch 2 - iter 594/1984 - loss 0.13622103 - time (sec): 169.82 - samples/sec: 295.61 - lr: 0.000155 - momentum: 0.000000
2023-10-13 13:54:35,548 epoch 2 - iter 792/1984 - loss 0.13486939 - time (sec): 225.91 - samples/sec: 291.39 - lr: 0.000153 - momentum: 0.000000
2023-10-13 13:55:31,683 epoch 2 - iter 990/1984 - loss 0.13032807 - time (sec): 282.05 - samples/sec: 291.80 - lr: 0.000151 - momentum: 0.000000
2023-10-13 13:56:24,784 epoch 2 - iter 1188/1984 - loss 0.12849675 - time (sec): 335.15 - samples/sec: 294.12 - lr: 0.000149 - momentum: 0.000000
2023-10-13 13:57:19,699 epoch 2 - iter 1386/1984 - loss 0.12609233 - time (sec): 390.06 - samples/sec: 294.46 - lr: 0.000148 - momentum: 0.000000
2023-10-13 13:58:16,450 epoch 2 - iter 1584/1984 - loss 0.12288083 - time (sec): 446.81 - samples/sec: 292.77 - lr: 0.000146 - momentum: 0.000000
2023-10-13 13:59:10,814 epoch 2 - iter 1782/1984 - loss 0.12152366 - time (sec): 501.18 - samples/sec: 291.77 - lr: 0.000144 - momentum: 0.000000
2023-10-13 14:00:08,264 epoch 2 - iter 1980/1984 - loss 0.11885738 - time (sec): 558.63 - samples/sec: 293.08 - lr: 0.000142 - momentum: 0.000000
2023-10-13 14:00:09,548 ----------------------------------------------------------------------------------------------------
2023-10-13 14:00:09,549 EPOCH 2 done: loss 0.1188 - lr: 0.000142
2023-10-13 14:00:35,360 DEV : loss 0.09111367911100388 - f1-score (micro avg) 0.7334
2023-10-13 14:00:35,406 saving best model
2023-10-13 14:00:37,985 ----------------------------------------------------------------------------------------------------
2023-10-13 14:01:35,507 epoch 3 - iter 198/1984 - loss 0.06992910 - time (sec): 57.52 - samples/sec: 279.98 - lr: 0.000140 - momentum: 0.000000
2023-10-13 14:02:30,138 epoch 3 - iter 396/1984 - loss 0.07739811 - time (sec): 112.15 - samples/sec: 289.02 - lr: 0.000139 - momentum: 0.000000
2023-10-13 14:03:23,678 epoch 3 - iter 594/1984 - loss 0.07861216 - time (sec): 165.69 - samples/sec: 292.77 - lr: 0.000137 - momentum: 0.000000
2023-10-13 14:04:18,698 epoch 3 - iter 792/1984 - loss 0.07894264 - time (sec): 220.71 - samples/sec: 294.08 - lr: 0.000135 - momentum: 0.000000
2023-10-13 14:05:11,588 epoch 3 - iter 990/1984 - loss 0.07810640 - time (sec): 273.60 - samples/sec: 295.90 - lr: 0.000133 - momentum: 0.000000
2023-10-13 14:06:07,163 epoch 3 - iter 1188/1984 - loss 0.07738219 - time (sec): 329.17 - samples/sec: 296.93 - lr: 0.000132 - momentum: 0.000000
2023-10-13 14:07:04,929 epoch 3 - iter 1386/1984 - loss 0.07780412 - time (sec): 386.94 - samples/sec: 295.25 - lr: 0.000130 - momentum: 0.000000
2023-10-13 14:08:00,504 epoch 3 - iter 1584/1984 - loss 0.07607150 - time (sec): 442.51 - samples/sec: 295.09 - lr: 0.000128 - momentum: 0.000000
2023-10-13 14:08:57,596 epoch 3 - iter 1782/1984 - loss 0.07488198 - time (sec): 499.61 - samples/sec: 294.57 - lr: 0.000126 - momentum: 0.000000
2023-10-13 14:09:56,502 epoch 3 - iter 1980/1984 - loss 0.07537555 - time (sec): 558.51 - samples/sec: 293.01 - lr: 0.000125 - momentum: 0.000000
2023-10-13 14:09:57,672 ----------------------------------------------------------------------------------------------------
2023-10-13 14:09:57,673 EPOCH 3 done: loss 0.0753 - lr: 0.000125
2023-10-13 14:10:24,639 DEV : loss 0.09566155821084976 - f1-score (micro avg) 0.7588
2023-10-13 14:10:24,680 saving best model
2023-10-13 14:10:27,792 ----------------------------------------------------------------------------------------------------
2023-10-13 14:11:22,947 epoch 4 - iter 198/1984 - loss 0.05573856 - time (sec): 55.15 - samples/sec: 298.81 - lr: 0.000123 - momentum: 0.000000
2023-10-13 14:12:18,581 epoch 4 - iter 396/1984 - loss 0.05399853 - time (sec): 110.78 - samples/sec: 295.09 - lr: 0.000121 - momentum: 0.000000
2023-10-13 14:13:12,197 epoch 4 - iter 594/1984 - loss 0.05122298 - time (sec): 164.40 - samples/sec: 297.54 - lr: 0.000119 - momentum: 0.000000
2023-10-13 14:14:05,824 epoch 4 - iter 792/1984 - loss 0.05330777 - time (sec): 218.03 - samples/sec: 299.43 - lr: 0.000117 - momentum: 0.000000
2023-10-13 14:14:59,557 epoch 4 - iter 990/1984 - loss 0.05347162 - time (sec): 271.76 - samples/sec: 303.22 - lr: 0.000116 - momentum: 0.000000
2023-10-13 14:15:53,694 epoch 4 - iter 1188/1984 - loss 0.05216234 - time (sec): 325.90 - samples/sec: 304.32 - lr: 0.000114 - momentum: 0.000000
2023-10-13 14:16:49,035 epoch 4 - iter 1386/1984 - loss 0.05143628 - time (sec): 381.24 - samples/sec: 301.49 - lr: 0.000112 - momentum: 0.000000
2023-10-13 14:17:43,745 epoch 4 - iter 1584/1984 - loss 0.05184746 - time (sec): 435.95 - samples/sec: 300.80 - lr: 0.000110 - momentum: 0.000000
2023-10-13 14:18:38,926 epoch 4 - iter 1782/1984 - loss 0.05298005 - time (sec): 491.13 - samples/sec: 300.19 - lr: 0.000109 - momentum: 0.000000
2023-10-13 14:19:32,075 epoch 4 - iter 1980/1984 - loss 0.05388844 - time (sec): 544.28 - samples/sec: 300.75 - lr: 0.000107 - momentum: 0.000000
2023-10-13 14:19:33,222 ----------------------------------------------------------------------------------------------------
2023-10-13 14:19:33,223 EPOCH 4 done: loss 0.0538 - lr: 0.000107
2023-10-13 14:20:00,317 DEV : loss 0.12870270013809204 - f1-score (micro avg) 0.7573
2023-10-13 14:20:00,358 ----------------------------------------------------------------------------------------------------
2023-10-13 14:20:53,615 epoch 5 - iter 198/1984 - loss 0.03435675 - time (sec): 53.25 - samples/sec: 315.28 - lr: 0.000105 - momentum: 0.000000
2023-10-13 14:21:44,798 epoch 5 - iter 396/1984 - loss 0.03418837 - time (sec): 104.44 - samples/sec: 319.18 - lr: 0.000103 - momentum: 0.000000
2023-10-13 14:22:37,689 epoch 5 - iter 594/1984 - loss 0.03735641 - time (sec): 157.33 - samples/sec: 318.10 - lr: 0.000101 - momentum: 0.000000
2023-10-13 14:23:31,434 epoch 5 - iter 792/1984 - loss 0.03848171 - time (sec): 211.07 - samples/sec: 314.78 - lr: 0.000100 - momentum: 0.000000
2023-10-13 14:24:27,121 epoch 5 - iter 990/1984 - loss 0.03772221 - time (sec): 266.76 - samples/sec: 309.25 - lr: 0.000098 - momentum: 0.000000
2023-10-13 14:25:20,412 epoch 5 - iter 1188/1984 - loss 0.03984799 - time (sec): 320.05 - samples/sec: 307.07 - lr: 0.000096 - momentum: 0.000000
2023-10-13 14:26:15,156 epoch 5 - iter 1386/1984 - loss 0.03966161 - time (sec): 374.80 - samples/sec: 304.75 - lr: 0.000094 - momentum: 0.000000
2023-10-13 14:27:08,199 epoch 5 - iter 1584/1984 - loss 0.03987275 - time (sec): 427.84 - samples/sec: 306.78 - lr: 0.000093 - momentum: 0.000000
2023-10-13 14:28:03,847 epoch 5 - iter 1782/1984 - loss 0.04040681 - time (sec): 483.49 - samples/sec: 305.38 - lr: 0.000091 - momentum: 0.000000
2023-10-13 14:28:57,477 epoch 5 - iter 1980/1984 - loss 0.03995813 - time (sec): 537.12 - samples/sec: 304.92 - lr: 0.000089 - momentum: 0.000000
2023-10-13 14:28:58,562 ----------------------------------------------------------------------------------------------------
2023-10-13 14:28:58,563 EPOCH 5 done: loss 0.0399 - lr: 0.000089
2023-10-13 14:29:25,637 DEV : loss 0.15645428001880646 - f1-score (micro avg) 0.7602
2023-10-13 14:29:25,679 saving best model
2023-10-13 14:29:28,321 ----------------------------------------------------------------------------------------------------
2023-10-13 14:30:21,475 epoch 6 - iter 198/1984 - loss 0.02467542 - time (sec): 53.15 - samples/sec: 321.45 - lr: 0.000087 - momentum: 0.000000
2023-10-13 14:31:13,093 epoch 6 - iter 396/1984 - loss 0.02376319 - time (sec): 104.77 - samples/sec: 320.03 - lr: 0.000085 - momentum: 0.000000
2023-10-13 14:32:04,586 epoch 6 - iter 594/1984 - loss 0.02474508 - time (sec): 156.26 - samples/sec: 315.70 - lr: 0.000084 - momentum: 0.000000
2023-10-13 14:32:57,169 epoch 6 - iter 792/1984 - loss 0.02868125 - time (sec): 208.84 - samples/sec: 315.69 - lr: 0.000082 - momentum: 0.000000
2023-10-13 14:33:51,982 epoch 6 - iter 990/1984 - loss 0.02813004 - time (sec): 263.66 - samples/sec: 311.99 - lr: 0.000080 - momentum: 0.000000
2023-10-13 14:34:49,635 epoch 6 - iter 1188/1984 - loss 0.02767063 - time (sec): 321.31 - samples/sec: 306.76 - lr: 0.000078 - momentum: 0.000000
2023-10-13 14:35:44,096 epoch 6 - iter 1386/1984 - loss 0.02786452 - time (sec): 375.77 - samples/sec: 305.99 - lr: 0.000077 - momentum: 0.000000
2023-10-13 14:36:36,019 epoch 6 - iter 1584/1984 - loss 0.02885341 - time (sec): 427.69 - samples/sec: 306.12 - lr: 0.000075 - momentum: 0.000000
2023-10-13 14:37:29,223 epoch 6 - iter 1782/1984 - loss 0.02817717 - time (sec): 480.90 - samples/sec: 306.03 - lr: 0.000073 - momentum: 0.000000
2023-10-13 14:38:27,626 epoch 6 - iter 1980/1984 - loss 0.02939511 - time (sec): 539.30 - samples/sec: 303.56 - lr: 0.000071 - momentum: 0.000000
2023-10-13 14:38:28,789 ----------------------------------------------------------------------------------------------------
2023-10-13 14:38:28,789 EPOCH 6 done: loss 0.0295 - lr: 0.000071
2023-10-13 14:38:56,206 DEV : loss 0.16446241736412048 - f1-score (micro avg) 0.7492
2023-10-13 14:38:56,257 ----------------------------------------------------------------------------------------------------
2023-10-13 14:39:51,637 epoch 7 - iter 198/1984 - loss 0.01809156 - time (sec): 55.38 - samples/sec: 297.72 - lr: 0.000069 - momentum: 0.000000
2023-10-13 14:40:46,224 epoch 7 - iter 396/1984 - loss 0.01613317 - time (sec): 109.96 - samples/sec: 293.23 - lr: 0.000068 - momentum: 0.000000
2023-10-13 14:41:41,377 epoch 7 - iter 594/1984 - loss 0.01702457 - time (sec): 165.12 - samples/sec: 297.91 - lr: 0.000066 - momentum: 0.000000
2023-10-13 14:42:36,232 epoch 7 - iter 792/1984 - loss 0.01784643 - time (sec): 219.97 - samples/sec: 296.04 - lr: 0.000064 - momentum: 0.000000
2023-10-13 14:43:31,504 epoch 7 - iter 990/1984 - loss 0.01766601 - time (sec): 275.24 - samples/sec: 296.10 - lr: 0.000062 - momentum: 0.000000
2023-10-13 14:44:29,053 epoch 7 - iter 1188/1984 - loss 0.01798869 - time (sec): 332.79 - samples/sec: 293.84 - lr: 0.000061 - momentum: 0.000000
2023-10-13 14:45:23,368 epoch 7 - iter 1386/1984 - loss 0.01843451 - time (sec): 387.11 - samples/sec: 294.70 - lr: 0.000059 - momentum: 0.000000
2023-10-13 14:46:15,280 epoch 7 - iter 1584/1984 - loss 0.01841142 - time (sec): 439.02 - samples/sec: 296.13 - lr: 0.000057 - momentum: 0.000000
2023-10-13 14:47:10,847 epoch 7 - iter 1782/1984 - loss 0.01957244 - time (sec): 494.59 - samples/sec: 297.35 - lr: 0.000055 - momentum: 0.000000
2023-10-13 14:48:05,500 epoch 7 - iter 1980/1984 - loss 0.02031150 - time (sec): 549.24 - samples/sec: 298.17 - lr: 0.000053 - momentum: 0.000000
2023-10-13 14:48:06,548 ----------------------------------------------------------------------------------------------------
2023-10-13 14:48:06,548 EPOCH 7 done: loss 0.0203 - lr: 0.000053
2023-10-13 14:48:34,612 DEV : loss 0.19750244915485382 - f1-score (micro avg) 0.7567
2023-10-13 14:48:34,664 ----------------------------------------------------------------------------------------------------
2023-10-13 14:49:28,793 epoch 8 - iter 198/1984 - loss 0.01418068 - time (sec): 54.13 - samples/sec: 304.44 - lr: 0.000052 - momentum: 0.000000
2023-10-13 14:50:22,561 epoch 8 - iter 396/1984 - loss 0.01608296 - time (sec): 107.90 - samples/sec: 307.64 - lr: 0.000050 - momentum: 0.000000
2023-10-13 14:51:15,361 epoch 8 - iter 594/1984 - loss 0.01364728 - time (sec): 160.69 - samples/sec: 306.85 - lr: 0.000048 - momentum: 0.000000
2023-10-13 14:52:08,131 epoch 8 - iter 792/1984 - loss 0.01376922 - time (sec): 213.46 - samples/sec: 309.15 - lr: 0.000046 - momentum: 0.000000
2023-10-13 14:53:02,352 epoch 8 - iter 990/1984 - loss 0.01311662 - time (sec): 267.69 - samples/sec: 307.25 - lr: 0.000045 - momentum: 0.000000
2023-10-13 14:53:57,556 epoch 8 - iter 1188/1984 - loss 0.01375357 - time (sec): 322.89 - samples/sec: 305.33 - lr: 0.000043 - momentum: 0.000000
2023-10-13 14:54:52,392 epoch 8 - iter 1386/1984 - loss 0.01337745 - time (sec): 377.73 - samples/sec: 304.21 - lr: 0.000041 - momentum: 0.000000
2023-10-13 14:55:49,698 epoch 8 - iter 1584/1984 - loss 0.01404364 - time (sec): 435.03 - samples/sec: 300.09 - lr: 0.000039 - momentum: 0.000000
2023-10-13 14:56:44,413 epoch 8 - iter 1782/1984 - loss 0.01463217 - time (sec): 489.75 - samples/sec: 300.14 - lr: 0.000037 - momentum: 0.000000
2023-10-13 14:57:39,656 epoch 8 - iter 1980/1984 - loss 0.01450393 - time (sec): 544.99 - samples/sec: 300.46 - lr: 0.000036 - momentum: 0.000000
2023-10-13 14:57:40,732 ----------------------------------------------------------------------------------------------------
2023-10-13 14:57:40,733 EPOCH 8 done: loss 0.0145 - lr: 0.000036
2023-10-13 14:58:09,479 DEV : loss 0.21577665209770203 - f1-score (micro avg) 0.7529
2023-10-13 14:58:09,519 ----------------------------------------------------------------------------------------------------
2023-10-13 14:59:03,230 epoch 9 - iter 198/1984 - loss 0.00877228 - time (sec): 53.71 - samples/sec: 293.64 - lr: 0.000034 - momentum: 0.000000
2023-10-13 15:00:00,025 epoch 9 - iter 396/1984 - loss 0.00937105 - time (sec): 110.50 - samples/sec: 287.85 - lr: 0.000032 - momentum: 0.000000
2023-10-13 15:00:54,419 epoch 9 - iter 594/1984 - loss 0.00856868 - time (sec): 164.90 - samples/sec: 293.52 - lr: 0.000030 - momentum: 0.000000
2023-10-13 15:01:49,560 epoch 9 - iter 792/1984 - loss 0.00989234 - time (sec): 220.04 - samples/sec: 295.67 - lr: 0.000029 - momentum: 0.000000
2023-10-13 15:02:47,168 epoch 9 - iter 990/1984 - loss 0.01106797 - time (sec): 277.65 - samples/sec: 292.62 - lr: 0.000027 - momentum: 0.000000
2023-10-13 15:03:41,878 epoch 9 - iter 1188/1984 - loss 0.01081355 - time (sec): 332.36 - samples/sec: 288.46 - lr: 0.000025 - momentum: 0.000000
2023-10-13 15:04:36,367 epoch 9 - iter 1386/1984 - loss 0.01050288 - time (sec): 386.85 - samples/sec: 292.92 - lr: 0.000023 - momentum: 0.000000
2023-10-13 15:05:30,076 epoch 9 - iter 1584/1984 - loss 0.01056348 - time (sec): 440.55 - samples/sec: 294.63 - lr: 0.000021 - momentum: 0.000000
2023-10-13 15:06:27,189 epoch 9 - iter 1782/1984 - loss 0.01102912 - time (sec): 497.67 - samples/sec: 295.61 - lr: 0.000020 - momentum: 0.000000
2023-10-13 15:07:18,598 epoch 9 - iter 1980/1984 - loss 0.01087141 - time (sec): 549.08 - samples/sec: 297.93 - lr: 0.000018 - momentum: 0.000000
2023-10-13 15:07:19,699 ----------------------------------------------------------------------------------------------------
2023-10-13 15:07:19,699 EPOCH 9 done: loss 0.0108 - lr: 0.000018
2023-10-13 15:07:45,626 DEV : loss 0.22953416407108307 - f1-score (micro avg) 0.7647
2023-10-13 15:07:45,671 saving best model
2023-10-13 15:07:48,309 ----------------------------------------------------------------------------------------------------
2023-10-13 15:08:44,790 epoch 10 - iter 198/1984 - loss 0.00464760 - time (sec): 56.48 - samples/sec: 299.72 - lr: 0.000016 - momentum: 0.000000
2023-10-13 15:09:43,385 epoch 10 - iter 396/1984 - loss 0.00822453 - time (sec): 115.07 - samples/sec: 283.39 - lr: 0.000014 - momentum: 0.000000
2023-10-13 15:10:37,532 epoch 10 - iter 594/1984 - loss 0.00828961 - time (sec): 169.22 - samples/sec: 286.73 - lr: 0.000013 - momentum: 0.000000
2023-10-13 15:11:31,359 epoch 10 - iter 792/1984 - loss 0.00709603 - time (sec): 223.04 - samples/sec: 288.73 - lr: 0.000011 - momentum: 0.000000
2023-10-13 15:12:28,943 epoch 10 - iter 990/1984 - loss 0.00651970 - time (sec): 280.63 - samples/sec: 289.30 - lr: 0.000009 - momentum: 0.000000
2023-10-13 15:13:24,258 epoch 10 - iter 1188/1984 - loss 0.00667521 - time (sec): 335.94 - samples/sec: 291.68 - lr: 0.000007 - momentum: 0.000000
2023-10-13 15:14:23,470 epoch 10 - iter 1386/1984 - loss 0.00650330 - time (sec): 395.16 - samples/sec: 290.20 - lr: 0.000005 - momentum: 0.000000
2023-10-13 15:15:17,651 epoch 10 - iter 1584/1984 - loss 0.00684118 - time (sec): 449.34 - samples/sec: 292.52 - lr: 0.000004 - momentum: 0.000000
2023-10-13 15:16:11,772 epoch 10 - iter 1782/1984 - loss 0.00684599 - time (sec): 503.46 - samples/sec: 293.68 - lr: 0.000002 - momentum: 0.000000
2023-10-13 15:17:08,238 epoch 10 - iter 1980/1984 - loss 0.00707841 - time (sec): 559.92 - samples/sec: 292.19 - lr: 0.000000 - momentum: 0.000000
2023-10-13 15:17:09,533 ----------------------------------------------------------------------------------------------------
2023-10-13 15:17:09,534 EPOCH 10 done: loss 0.0071 - lr: 0.000000
2023-10-13 15:17:37,000 DEV : loss 0.23275645077228546 - f1-score (micro avg) 0.76
2023-10-13 15:17:38,020 ----------------------------------------------------------------------------------------------------
2023-10-13 15:17:38,023 Loading model from best epoch ...
2023-10-13 15:17:42,458 SequenceTagger predicts: Dictionary with 13 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-13 15:18:07,926 Results:
- F-score (micro) 0.7594
- F-score (macro) 0.6623
- Accuracy 0.6372

By class:
              precision    recall  f1-score   support

         LOC     0.8053    0.8397    0.8221       655
         PER     0.7076    0.7489    0.7277       223
         ORG     0.5341    0.3701    0.4372       127

   micro avg     0.7587    0.7602    0.7594      1005
   macro avg     0.6823    0.6529    0.6623      1005
weighted avg     0.7493    0.7602    0.7525      1005

2023-10-13 15:18:07,927 ----------------------------------------------------------------------------------------------------
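For reference, the best checkpoint written to the base path above (best-model.pt, selected on dev micro-F1) can be reloaded for inference with Flair's standard API. This is a minimal sketch: the local path layout and the example sentence are illustrative.

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# Path assembled from the base path logged above; adjust to wherever the checkpoint lives locally.
tagger = SequenceTagger.load(
    "hmbench-icdar/fr-hmbyt5-preliminary/"
    "byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-5/"
    "best-model.pt"
)

# Illustrative French sentence; the tagger emits PER/LOC/ORG spans in BIOES encoding.
sentence = Sentence("Le général Dupont est arrivé à Paris .")
tagger.predict(sentence)
for label in sentence.get_labels("ner"):
    print(label)
```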