2023-10-16 18:06:43,262 ----------------------------------------------------------------------------------------------------
2023-10-16 18:06:43,263 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(32001, 768)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0-11): 12 x BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-16 18:06:43,264 ----------------------------------------------------------------------------------------------------
2023-10-16 18:06:43,264 MultiCorpus: 1166 train + 165 dev + 415 test sentences
 - NER_HIPE_2022 Corpus: 1166 train + 165 dev + 415 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/fi/with_doc_seperator
2023-10-16 18:06:43,264 ----------------------------------------------------------------------------------------------------
2023-10-16 18:06:43,264 Train: 1166 sentences
2023-10-16 18:06:43,264 (train_with_dev=False, train_with_test=False)
2023-10-16 18:06:43,264 ----------------------------------------------------------------------------------------------------
2023-10-16 18:06:43,264 Training Params:
2023-10-16 18:06:43,264  - learning_rate: "3e-05"
2023-10-16 18:06:43,264  - mini_batch_size: "4"
2023-10-16 18:06:43,264  - max_epochs: "10"
2023-10-16 18:06:43,264  - shuffle: "True"
2023-10-16 18:06:43,264 ----------------------------------------------------------------------------------------------------
2023-10-16 18:06:43,264 Plugins:
2023-10-16 18:06:43,264  - LinearScheduler | warmup_fraction: '0.1'
2023-10-16 18:06:43,264 ----------------------------------------------------------------------------------------------------
2023-10-16 18:06:43,264 Final evaluation on model from best epoch (best-model.pt)
2023-10-16 18:06:43,264  - metric: "('micro avg', 'f1-score')"
2023-10-16 18:06:43,264 ----------------------------------------------------------------------------------------------------
2023-10-16 18:06:43,264 Computation:
2023-10-16 18:06:43,264  - compute on device: cuda:0
2023-10-16 18:06:43,264  - embedding storage: none
2023-10-16 18:06:43,264 ----------------------------------------------------------------------------------------------------
2023-10-16 18:06:43,264 Model training base path: "hmbench-newseye/fi-dbmdz/bert-base-historic-multilingual-cased-bs4-wsFalse-e10-lr3e-05-poolingfirst-layers-1-crfFalse-2"
2023-10-16 18:06:43,264 ----------------------------------------------------------------------------------------------------
2023-10-16 18:06:43,264 ----------------------------------------------------------------------------------------------------
2023-10-16 18:06:45,038 epoch 1 - iter 29/292 - loss 2.88222167 - time (sec): 1.77 - samples/sec: 2934.55 - lr: 0.000003 - momentum: 0.000000
2023-10-16 18:06:46,609 epoch 1 - iter 58/292 - loss 2.60446904 - time (sec): 3.34 - samples/sec: 2697.91 - lr: 0.000006 - momentum: 0.000000
2023-10-16 18:06:48,242 epoch 1 - iter 87/292 - loss 2.04069918 - time (sec): 4.98 - samples/sec: 2655.47 - lr: 0.000009 - momentum: 0.000000
2023-10-16 18:06:49,732 epoch 1 - iter 116/292 - loss 1.70752429 - time (sec): 6.47 - samples/sec: 2663.44 - lr: 0.000012 - momentum: 0.000000
2023-10-16 18:06:51,291 epoch 1 - iter 145/292 - loss 1.48725658 - time (sec): 8.03 - samples/sec: 2632.59 - lr: 0.000015 - momentum: 0.000000
2023-10-16 18:06:52,955 epoch 1 - iter 174/292 - loss 1.33904630 - time (sec): 9.69 - samples/sec: 2602.10 - lr: 0.000018 - momentum: 0.000000
2023-10-16 18:06:54,672 epoch 1 - iter 203/292 - loss 1.16634244 - time (sec): 11.41 - samples/sec: 2674.25 - lr: 0.000021 - momentum: 0.000000
2023-10-16 18:06:56,311 epoch 1 - iter 232/292 - loss 1.06978910 - time (sec): 13.05 - samples/sec: 2671.11 - lr: 0.000024 - momentum: 0.000000
2023-10-16 18:06:58,151 epoch 1 - iter 261/292 - loss 0.99870433 - time (sec): 14.89 - samples/sec: 2689.92 - lr: 0.000027 - momentum: 0.000000
2023-10-16 18:06:59,742 epoch 1 - iter 290/292 - loss 0.93223498 - time (sec): 16.48 - samples/sec: 2676.79 - lr: 0.000030 - momentum: 0.000000
2023-10-16 18:06:59,856 ----------------------------------------------------------------------------------------------------
2023-10-16 18:06:59,857 EPOCH 1 done: loss 0.9270 - lr: 0.000030
2023-10-16 18:07:01,225 DEV : loss 0.20976495742797852 - f1-score (micro avg)  0.3889
2023-10-16 18:07:01,232 saving best model
2023-10-16 18:07:01,752 ----------------------------------------------------------------------------------------------------
2023-10-16 18:07:03,453 epoch 2 - iter 29/292 - loss 0.25107587 - time (sec): 1.70 - samples/sec: 2491.07 - lr: 0.000030 - momentum: 0.000000
2023-10-16 18:07:05,086 epoch 2 - iter 58/292 - loss 0.24400947 - time (sec): 3.33 - samples/sec: 2503.73 - lr: 0.000029 - momentum: 0.000000
2023-10-16 18:07:06,734 epoch 2 - iter 87/292 - loss 0.23895746 - time (sec): 4.98 - samples/sec: 2492.39 - lr: 0.000029 - momentum: 0.000000
2023-10-16 18:07:08,461 epoch 2 - iter 116/292 - loss 0.23745427 - time (sec): 6.71 - samples/sec: 2479.82 - lr: 0.000029 - momentum: 0.000000
2023-10-16 18:07:10,124 epoch 2 - iter 145/292 - loss 0.23323169 - time (sec): 8.37 - samples/sec: 2511.84 - lr: 0.000028 - momentum: 0.000000
2023-10-16 18:07:11,997 epoch 2 - iter 174/292 - loss 0.23315863 - time (sec): 10.24 - samples/sec: 2574.83 - lr: 0.000028 - momentum: 0.000000
2023-10-16 18:07:13,713 epoch 2 - iter 203/292 - loss 0.22673487 - time (sec): 11.96 - samples/sec: 2610.35 - lr: 0.000028 - momentum: 0.000000
2023-10-16 18:07:15,338 epoch 2 - iter 232/292 - loss 0.22123385 - time (sec): 13.58 - samples/sec: 2631.26 - lr: 0.000027 - momentum: 0.000000
2023-10-16 18:07:16,935 epoch 2 - iter 261/292 - loss 0.22344619 - time (sec): 15.18 - samples/sec: 2632.59 - lr: 0.000027 - momentum: 0.000000
2023-10-16 18:07:18,600 epoch 2 - iter 290/292 - loss 0.21652095 - time (sec): 16.85 - samples/sec: 2631.92 - lr: 0.000027 - momentum: 0.000000
2023-10-16 18:07:18,687 ----------------------------------------------------------------------------------------------------
2023-10-16 18:07:18,687 EPOCH 2 done: loss 0.2162 - lr: 0.000027
2023-10-16 18:07:19,989 DEV : loss 0.14250923693180084 - f1-score (micro avg)  0.6128
2023-10-16 18:07:19,995 saving best model
2023-10-16 18:07:20,524 ----------------------------------------------------------------------------------------------------
2023-10-16 18:07:22,235 epoch 3 - iter 29/292 - loss 0.14272483 - time (sec): 1.71 - samples/sec: 2548.95 - lr: 0.000026 - momentum: 0.000000
2023-10-16 18:07:23,769 epoch 3 - iter 58/292 - loss 0.12941947 - time (sec): 3.24 - samples/sec: 2716.78 - lr: 0.000026 - momentum: 0.000000
2023-10-16 18:07:25,434 epoch 3 - iter 87/292 - loss 0.13036619 - time (sec): 4.91 - samples/sec: 2744.37 - lr: 0.000026 - momentum: 0.000000
2023-10-16 18:07:26,970 epoch 3 - iter 116/292 - loss 0.12139105 - time (sec): 6.44 - samples/sec: 2681.79 - lr: 0.000025 - momentum: 0.000000
2023-10-16 18:07:28,568 epoch 3 - iter 145/292 - loss 0.11496517 - time (sec): 8.04 - samples/sec: 2685.19 - lr: 0.000025 - momentum: 0.000000
2023-10-16 18:07:30,170 epoch 3 - iter 174/292 - loss 0.11820563 - time (sec): 9.64 - samples/sec: 2672.93 - lr: 0.000025 - momentum: 0.000000
2023-10-16 18:07:32,002 epoch 3 - iter 203/292 - loss 0.12100791 - time (sec): 11.48 - samples/sec: 2706.76 - lr: 0.000024 - momentum: 0.000000
2023-10-16 18:07:33,666 epoch 3 - iter 232/292 - loss 0.11592913 - time (sec): 13.14 - samples/sec: 2704.67 - lr: 0.000024 - momentum: 0.000000
2023-10-16 18:07:35,220 epoch 3 - iter 261/292 - loss 0.11476202 - time (sec): 14.69 - samples/sec: 2706.54 - lr: 0.000024 - momentum: 0.000000
2023-10-16 18:07:37,049 epoch 3 - iter 290/292 - loss 0.11381579 - time (sec): 16.52 - samples/sec: 2679.81 - lr: 0.000023 - momentum: 0.000000
2023-10-16 18:07:37,143 ----------------------------------------------------------------------------------------------------
2023-10-16 18:07:37,143 EPOCH 3 done: loss 0.1135 - lr: 0.000023
2023-10-16 18:07:38,429 DEV : loss 0.12919388711452484 - f1-score (micro avg)  0.6814
2023-10-16 18:07:38,436 saving best model
2023-10-16 18:07:38,944 ----------------------------------------------------------------------------------------------------
2023-10-16 18:07:40,795 epoch 4 - iter 29/292 - loss 0.08033603 - time (sec): 1.85 - samples/sec: 2760.24 - lr: 0.000023 - momentum: 0.000000
2023-10-16 18:07:42,390 epoch 4 - iter 58/292 - loss 0.09107035 - time (sec): 3.44 - samples/sec: 2756.18 - lr: 0.000023 - momentum: 0.000000
2023-10-16 18:07:43,995 epoch 4 - iter 87/292 - loss 0.08192222 - time (sec): 5.05 - samples/sec: 2743.24 - lr: 0.000022 - momentum: 0.000000
2023-10-16 18:07:45,617 epoch 4 - iter 116/292 - loss 0.08082958 - time (sec): 6.67 - samples/sec: 2766.29 - lr: 0.000022 - momentum: 0.000000
2023-10-16 18:07:47,371 epoch 4 - iter 145/292 - loss 0.07535777 - time (sec): 8.43 - samples/sec: 2795.29 - lr: 0.000022 - momentum: 0.000000
2023-10-16 18:07:49,099 epoch 4 - iter 174/292 - loss 0.07310859 - time (sec): 10.15 - samples/sec: 2773.92 - lr: 0.000021 - momentum: 0.000000
2023-10-16 18:07:50,665 epoch 4 - iter 203/292 - loss 0.07590109 - time (sec): 11.72 - samples/sec: 2763.34 - lr: 0.000021 - momentum: 0.000000
2023-10-16 18:07:52,377 epoch 4 - iter 232/292 - loss 0.07416772 - time (sec): 13.43 - samples/sec: 2702.56 - lr: 0.000021 - momentum: 0.000000
2023-10-16 18:07:54,003 epoch 4 - iter 261/292 - loss 0.07169329 - time (sec): 15.06 - samples/sec: 2706.89 - lr: 0.000020 - momentum: 0.000000
2023-10-16 18:07:55,724 epoch 4 - iter 290/292 - loss 0.06896376 - time (sec): 16.78 - samples/sec: 2641.64 - lr: 0.000020 - momentum: 0.000000
2023-10-16 18:07:55,807 ----------------------------------------------------------------------------------------------------
2023-10-16 18:07:55,807 EPOCH 4 done: loss 0.0688 - lr: 0.000020
2023-10-16 18:07:57,088 DEV : loss 0.12307216227054596 - f1-score (micro avg)  0.7595
2023-10-16 18:07:57,094 saving best model
2023-10-16 18:07:57,646 ----------------------------------------------------------------------------------------------------
2023-10-16 18:07:59,313 epoch 5 - iter 29/292 - loss 0.03955736 - time (sec): 1.67 - samples/sec: 2868.56 - lr: 0.000020 - momentum: 0.000000
2023-10-16 18:08:01,015 epoch 5 - iter 58/292 - loss 0.04332368 - time (sec): 3.37 - samples/sec: 2807.90 - lr: 0.000019 - momentum: 0.000000
2023-10-16 18:08:02,822 epoch 5 - iter 87/292 - loss 0.04161002 - time (sec): 5.17 - samples/sec: 2811.24 - lr: 0.000019 - momentum: 0.000000
2023-10-16 18:08:04,459 epoch 5 - iter 116/292 - loss 0.03749425 - time (sec): 6.81 - samples/sec: 2772.83 - lr: 0.000019 - momentum: 0.000000
2023-10-16 18:08:05,992 epoch 5 - iter 145/292 - loss 0.03825239 - time (sec): 8.34 - samples/sec: 2738.86 - lr: 0.000018 - momentum: 0.000000
2023-10-16 18:08:07,530 epoch 5 - iter 174/292 - loss 0.03808185 - time (sec): 9.88 - samples/sec: 2697.02 - lr: 0.000018 - momentum: 0.000000
2023-10-16 18:08:09,124 epoch 5 - iter 203/292 - loss 0.03880310 - time (sec): 11.48 - samples/sec: 2671.49 - lr: 0.000018 - momentum: 0.000000
2023-10-16 18:08:10,845 epoch 5 - iter 232/292 - loss 0.04783043 - time (sec): 13.20 - samples/sec: 2665.68 - lr: 0.000017 - momentum: 0.000000
2023-10-16 18:08:12,530 epoch 5 - iter 261/292 - loss 0.04825293 - time (sec): 14.88 - samples/sec: 2628.51 - lr: 0.000017 - momentum: 0.000000
2023-10-16 18:08:14,265 epoch 5 - iter 290/292 - loss 0.05008356 - time (sec): 16.62 - samples/sec: 2662.54 - lr: 0.000017 - momentum: 0.000000
2023-10-16 18:08:14,357 ----------------------------------------------------------------------------------------------------
2023-10-16 18:08:14,358 EPOCH 5 done: loss 0.0499 - lr: 0.000017
2023-10-16 18:08:15,621 DEV : loss 0.1350400596857071 - f1-score (micro avg)  0.766
2023-10-16 18:08:15,626 saving best model
2023-10-16 18:08:16,115 ----------------------------------------------------------------------------------------------------
2023-10-16 18:08:17,779 epoch 6 - iter 29/292 - loss 0.04105101 - time (sec): 1.66 - samples/sec: 2305.85 - lr: 0.000016 - momentum: 0.000000
2023-10-16 18:08:19,486 epoch 6 - iter 58/292 - loss 0.04502925 - time (sec): 3.37 - samples/sec: 2453.67 - lr: 0.000016 - momentum: 0.000000
2023-10-16 18:08:21,059 epoch 6 - iter 87/292 - loss 0.03516229 - time (sec): 4.94 - samples/sec: 2487.33 - lr: 0.000016 - momentum: 0.000000
2023-10-16 18:08:22,563 epoch 6 - iter 116/292 - loss 0.03210820 - time (sec): 6.45 - samples/sec: 2555.05 - lr: 0.000015 - momentum: 0.000000
2023-10-16 18:08:24,329 epoch 6 - iter 145/292 - loss 0.02906604 - time (sec): 8.21 - samples/sec: 2574.10 - lr: 0.000015 - momentum: 0.000000
2023-10-16 18:08:25,816 epoch 6 - iter 174/292 - loss 0.02948663 - time (sec): 9.70 - samples/sec: 2553.14 - lr: 0.000015 - momentum: 0.000000
2023-10-16 18:08:27,546 epoch 6 - iter 203/292 - loss 0.03338297 - time (sec): 11.43 - samples/sec: 2570.13 - lr: 0.000014 - momentum: 0.000000
2023-10-16 18:08:29,334 epoch 6 - iter 232/292 - loss 0.03431716 - time (sec): 13.22 - samples/sec: 2607.40 - lr: 0.000014 - momentum: 0.000000
2023-10-16 18:08:31,046 epoch 6 - iter 261/292 - loss 0.03709105 - time (sec): 14.93 - samples/sec: 2646.56 - lr: 0.000014 - momentum: 0.000000
2023-10-16 18:08:32,738 epoch 6 - iter 290/292 - loss 0.03590560 - time (sec): 16.62 - samples/sec: 2660.19 - lr: 0.000013 - momentum: 0.000000
2023-10-16 18:08:32,826 ----------------------------------------------------------------------------------------------------
2023-10-16 18:08:32,826 EPOCH 6 done: loss 0.0359 - lr: 0.000013
2023-10-16 18:08:34,041 DEV : loss 0.1274399310350418 - f1-score (micro avg)  0.8009
2023-10-16 18:08:34,045 saving best model
2023-10-16 18:08:34,569 ----------------------------------------------------------------------------------------------------
2023-10-16 18:08:36,179 epoch 7 - iter 29/292 - loss 0.03485645 - time (sec): 1.61 - samples/sec: 2550.22 - lr: 0.000013 - momentum: 0.000000
2023-10-16 18:08:37,955 epoch 7 - iter 58/292 - loss 0.03354645 - time (sec): 3.38 - samples/sec: 2752.99 - lr: 0.000013 - momentum: 0.000000
2023-10-16 18:08:39,678 epoch 7 - iter 87/292 - loss 0.02845269 - time (sec): 5.11 - samples/sec: 2774.75 - lr: 0.000012 - momentum: 0.000000
2023-10-16 18:08:41,459 epoch 7 - iter 116/292 - loss 0.03214827 - time (sec): 6.89 - samples/sec: 2730.26 - lr: 0.000012 - momentum: 0.000000
2023-10-16 18:08:43,094 epoch 7 - iter 145/292 - loss 0.03051476 - time (sec): 8.52 - samples/sec: 2687.09 - lr: 0.000012 - momentum: 0.000000
2023-10-16 18:08:44,601 epoch 7 - iter 174/292 - loss 0.02794386 - time (sec): 10.03 - samples/sec: 2685.33 - lr: 0.000011 - momentum: 0.000000
2023-10-16 18:08:46,149 epoch 7 - iter 203/292 - loss 0.02570380 - time (sec): 11.58 - samples/sec: 2670.47 - lr: 0.000011 - momentum: 0.000000
2023-10-16 18:08:47,783 epoch 7 - iter 232/292 - loss 0.02741901 - time (sec): 13.21 - samples/sec: 2705.87 - lr: 0.000011 - momentum: 0.000000
2023-10-16 18:08:49,317 epoch 7 - iter 261/292 - loss 0.02628470 - time (sec): 14.75 - samples/sec: 2692.65 - lr: 0.000010 - momentum: 0.000000
2023-10-16 18:08:51,015 epoch 7 - iter 290/292 - loss 0.02776424 - time (sec): 16.44 - samples/sec: 2685.07 - lr: 0.000010 - momentum: 0.000000
2023-10-16 18:08:51,122 ----------------------------------------------------------------------------------------------------
2023-10-16 18:08:51,123 EPOCH 7 done: loss 0.0276 - lr: 0.000010
2023-10-16 18:08:52,409 DEV : loss 0.14393527805805206 - f1-score (micro avg)  0.7603
2023-10-16 18:08:52,413 ----------------------------------------------------------------------------------------------------
2023-10-16 18:08:54,121 epoch 8 - iter 29/292 - loss 0.02543747 - time (sec): 1.71 - samples/sec: 2790.30 - lr: 0.000010 - momentum: 0.000000
2023-10-16 18:08:55,512 epoch 8 - iter 58/292 - loss 0.02833714 - time (sec): 3.10 - samples/sec: 2578.54 - lr: 0.000009 - momentum: 0.000000
2023-10-16 18:08:57,277 epoch 8 - iter 87/292 - loss 0.02513038 - time (sec): 4.86 - samples/sec: 2585.81 - lr: 0.000009 - momentum: 0.000000
2023-10-16 18:08:59,029 epoch 8 - iter 116/292 - loss 0.02477383 - time (sec): 6.61 - samples/sec: 2622.43 - lr: 0.000009 - momentum: 0.000000
2023-10-16 18:09:00,644 epoch 8 - iter 145/292 - loss 0.02605667 - time (sec): 8.23 - samples/sec: 2653.35 - lr: 0.000008 - momentum: 0.000000
2023-10-16 18:09:02,344 epoch 8 - iter 174/292 - loss 0.02378070 - time (sec): 9.93 - samples/sec: 2690.71 - lr: 0.000008 - momentum: 0.000000
2023-10-16 18:09:04,137 epoch 8 - iter 203/292 - loss 0.02209407 - time (sec): 11.72 - samples/sec: 2726.41 - lr: 0.000008 - momentum: 0.000000
2023-10-16 18:09:05,761 epoch 8 - iter 232/292 - loss 0.02304147 - time (sec): 13.35 - samples/sec: 2733.41 - lr: 0.000007 - momentum: 0.000000
2023-10-16 18:09:07,268 epoch 8 - iter 261/292 - loss 0.02152923 - time (sec): 14.85 - samples/sec: 2696.49 - lr: 0.000007 - momentum: 0.000000
2023-10-16 18:09:08,856 epoch 8 - iter 290/292 - loss 0.02045988 - time (sec): 16.44 - samples/sec: 2689.85 - lr: 0.000007 - momentum: 0.000000
2023-10-16 18:09:08,958 ----------------------------------------------------------------------------------------------------
2023-10-16 18:09:08,958 EPOCH 8 done: loss 0.0204 - lr: 0.000007
2023-10-16 18:09:10,489 DEV : loss 0.153466135263443 - f1-score (micro avg)  0.7689
2023-10-16 18:09:10,493 ----------------------------------------------------------------------------------------------------
2023-10-16 18:09:12,364 epoch 9 - iter 29/292 - loss 0.00758189 - time (sec): 1.87 - samples/sec: 2919.25 - lr: 0.000006 - momentum: 0.000000
2023-10-16 18:09:14,012 epoch 9 - iter 58/292 - loss 0.01509998 - time (sec): 3.52 - samples/sec: 2654.32 - lr: 0.000006 - momentum: 0.000000
2023-10-16 18:09:15,852 epoch 9 - iter 87/292 - loss 0.02017623 - time (sec): 5.36 - samples/sec: 2611.26 - lr: 0.000006 - momentum: 0.000000
2023-10-16 18:09:17,652 epoch 9 - iter 116/292 - loss 0.02170409 - time (sec): 7.16 - samples/sec: 2590.86 - lr: 0.000005 - momentum: 0.000000
2023-10-16 18:09:19,465 epoch 9 - iter 145/292 - loss 0.01832989 - time (sec): 8.97 - samples/sec: 2530.51 - lr: 0.000005 - momentum: 0.000000
2023-10-16 18:09:21,042 epoch 9 - iter 174/292 - loss 0.01817301 - time (sec): 10.55 - samples/sec: 2525.19 - lr: 0.000005 - momentum: 0.000000
2023-10-16 18:09:22,702 epoch 9 - iter 203/292 - loss 0.01751352 - time (sec): 12.21 - samples/sec: 2524.94 - lr: 0.000004 - momentum: 0.000000
2023-10-16 18:09:24,524 epoch 9 - iter 232/292 - loss 0.01770380 - time (sec): 14.03 - samples/sec: 2524.00 - lr: 0.000004 - momentum: 0.000000
2023-10-16 18:09:26,227 epoch 9 - iter 261/292 - loss 0.01613702 - time (sec): 15.73 - samples/sec: 2538.20 - lr: 0.000004 - momentum: 0.000000
2023-10-16 18:09:27,928 epoch 9 - iter 290/292 - loss 0.01569710 - time (sec): 17.43 - samples/sec: 2542.91 - lr: 0.000003 - momentum: 0.000000
2023-10-16 18:09:28,012 ----------------------------------------------------------------------------------------------------
2023-10-16 18:09:28,012 EPOCH 9 done: loss 0.0157 - lr: 0.000003
2023-10-16 18:09:29,289 DEV : loss 0.1515672355890274 - f1-score (micro avg)  0.756
2023-10-16 18:09:29,294 ----------------------------------------------------------------------------------------------------
2023-10-16 18:09:30,951 epoch 10 - iter 29/292 - loss 0.00834902 - time (sec): 1.66 - samples/sec: 2534.54 - lr: 0.000003 - momentum: 0.000000
2023-10-16 18:09:32,573 epoch 10 - iter 58/292 - loss 0.00778017 - time (sec): 3.28 - samples/sec: 2506.92 - lr: 0.000003 - momentum: 0.000000
2023-10-16 18:09:34,325 epoch 10 - iter 87/292 - loss 0.00669187 - time (sec): 5.03 - samples/sec: 2503.73 - lr: 0.000002 - momentum: 0.000000
2023-10-16 18:09:36,116 epoch 10 - iter 116/292 - loss 0.00906403 - time (sec): 6.82 - samples/sec: 2544.52 - lr: 0.000002 - momentum: 0.000000
2023-10-16 18:09:37,714 epoch 10 - iter 145/292 - loss 0.00978993 - time (sec): 8.42 - samples/sec: 2531.12 - lr: 0.000002 - momentum: 0.000000
2023-10-16 18:09:39,369 epoch 10 - iter 174/292 - loss 0.01122421 - time (sec): 10.07 - samples/sec: 2571.88 - lr: 0.000001 - momentum: 0.000000
2023-10-16 18:09:40,888 epoch 10 - iter 203/292 - loss 0.01027289 - time (sec): 11.59 - samples/sec: 2600.77 - lr: 0.000001 - momentum: 0.000000
2023-10-16 18:09:42,713 epoch 10 - iter 232/292 - loss 0.00980163 - time (sec): 13.42 - samples/sec: 2584.77 - lr: 0.000001 - momentum: 0.000000
2023-10-16 18:09:44,267 epoch 10 - iter 261/292 - loss 0.00981947 - time (sec): 14.97 - samples/sec: 2603.16 - lr: 0.000000 - momentum: 0.000000
2023-10-16 18:09:46,020 epoch 10 - iter 290/292 - loss 0.01181952 - time (sec): 16.73 - samples/sec: 2643.22 - lr: 0.000000 - momentum: 0.000000
2023-10-16 18:09:46,120 ----------------------------------------------------------------------------------------------------
2023-10-16 18:09:46,120 EPOCH 10 done: loss 0.0124 - lr: 0.000000
2023-10-16 18:09:47,424 DEV : loss 0.15941958129405975 - f1-score (micro avg)  0.7452
2023-10-16 18:09:47,808 ----------------------------------------------------------------------------------------------------
2023-10-16 18:09:47,810 Loading model from best epoch ...
2023-10-16 18:09:49,528 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-16 18:09:52,319 Results:
- F-score (micro) 0.7547
- F-score (macro) 0.6975
- Accuracy 0.6325

By class:
              precision    recall  f1-score   support

         PER     0.8187    0.8305    0.8245       348
         LOC     0.6433    0.8084    0.7165       261
         ORG     0.5128    0.3846    0.4396        52
   HumanProd     0.8500    0.7727    0.8095        22

   micro avg     0.7257    0.7862    0.7547       683
   macro avg     0.7062    0.6991    0.6975       683
weighted avg     0.7294    0.7862    0.7534       683

2023-10-16 18:09:52,319 ----------------------------------------------------------------------------------------------------
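The `lr` column in the iteration lines above traces the `LinearScheduler` plugin with `warmup_fraction: '0.1'`: over the 10 x 292 = 2920 total steps, the rate ramps linearly to the 3e-05 peak during the first 10% of steps (exactly epoch 1), then decays linearly to zero. A minimal sketch of that schedule (the function `linear_schedule_lr` is illustrative, not Flair's actual implementation):

```python
def linear_schedule_lr(step, total_steps, peak_lr, warmup_fraction=0.1):
    """Linear warmup to peak_lr, then linear decay to zero -- the shape
    of the schedule behind Flair's LinearScheduler (sketch, not its code)."""
    warmup_steps = int(total_steps * warmup_fraction)
    if step < warmup_steps:
        # warmup phase: the first 292 of 2920 steps, i.e. epoch 1 in this run
        return peak_lr * (step + 1) / warmup_steps
    # decay phase: epochs 2-10
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

total_steps = 10 * 292  # 10 epochs x 292 mini-batches of size 4
for step in (28, 291, 320, 2919):  # iter 29 of epochs 1, 1-end, 2, and 10-end
    print(f"step {step:4d}: lr = {linear_schedule_lr(step, total_steps, 3e-5):.6f}")
```

Rounded to six decimals, these values reproduce the `lr` figures logged at the corresponding iterations (0.000003 early in epoch 1, 0.000030 at the peak, 0.000000 at the end).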