2023-10-11 13:09:29,492 ----------------------------------------------------------------------------------------------------
2023-10-11 13:09:29,495 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-11 13:09:29,495 ----------------------------------------------------------------------------------------------------
2023-10-11 13:09:29,495 MultiCorpus: 1085 train + 148 dev + 364 test sentences
 - NER_HIPE_2022 Corpus: 1085 train + 148 dev + 364 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/sv/with_doc_seperator
2023-10-11 13:09:29,495 ----------------------------------------------------------------------------------------------------
2023-10-11 13:09:29,495 Train: 1085 sentences
2023-10-11 13:09:29,495        (train_with_dev=False, train_with_test=False)
2023-10-11 13:09:29,495 ----------------------------------------------------------------------------------------------------
2023-10-11 13:09:29,495 Training Params:
2023-10-11 13:09:29,495  - learning_rate: "0.00015"
2023-10-11 13:09:29,496  - mini_batch_size: "4"
2023-10-11 13:09:29,496  - max_epochs: "10"
2023-10-11 13:09:29,496  - shuffle: "True"
2023-10-11 13:09:29,496 ----------------------------------------------------------------------------------------------------
2023-10-11 13:09:29,496 Plugins:
2023-10-11 13:09:29,496  - TensorboardLogger
2023-10-11 13:09:29,496  - LinearScheduler | warmup_fraction: '0.1'
2023-10-11 13:09:29,496 ----------------------------------------------------------------------------------------------------
2023-10-11 13:09:29,496 Final evaluation on model from best epoch (best-model.pt)
2023-10-11 13:09:29,496  - metric: "('micro avg', 'f1-score')"
2023-10-11 13:09:29,496
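The `LinearScheduler | warmup_fraction: '0.1'` plugin explains the learning-rate column in the iteration logs below: the lr ramps up toward the 0.00015 peak during roughly the first epoch, then decays linearly to zero by the end of epoch 10. A minimal sketch of that schedule, assuming warmup over the first 10% of the 2,720 total steps (10 epochs × 272 mini-batches); this is an illustration of the schedule shape, not Flair's actual scheduler code:

```python
def linear_warmup_lr(step, total_steps, peak_lr, warmup_fraction=0.1):
    """Linear warmup to peak_lr, then linear decay to 0.

    A sketch of a linear schedule with warmup_fraction; Flair's
    internal step accounting may differ slightly.
    """
    warmup_steps = int(total_steps * warmup_fraction)
    if step < warmup_steps:
        # ramp up proportionally during warmup
        return peak_lr * step / warmup_steps
    # decay from peak back to zero over the remaining steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

# 10 epochs x 272 mini-batches = 2720 optimizer steps
TOTAL, PEAK = 2720, 0.00015
print(linear_warmup_lr(272, TOTAL, PEAK))   # peak at end of warmup: 0.00015
print(linear_warmup_lr(2720, TOTAL, PEAK))  # decayed to 0.0 at the last step
```

This matches the logged behavior: lr is about 0.000148 at the end of epoch 1 (just before the warmup boundary) and reaches 0.000000 at the final iteration of epoch 10.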
----------------------------------------------------------------------------------------------------
2023-10-11 13:09:29,496 Computation:
2023-10-11 13:09:29,496  - compute on device: cuda:0
2023-10-11 13:09:29,496  - embedding storage: none
2023-10-11 13:09:29,496 ----------------------------------------------------------------------------------------------------
2023-10-11 13:09:29,496 Model training base path: "hmbench-newseye/sv-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-5"
2023-10-11 13:09:29,497 ----------------------------------------------------------------------------------------------------
2023-10-11 13:09:29,497 ----------------------------------------------------------------------------------------------------
2023-10-11 13:09:29,497 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-11 13:09:39,376 epoch 1 - iter 27/272 - loss 2.82017975 - time (sec): 9.88 - samples/sec: 571.41 - lr: 0.000014 - momentum: 0.000000
2023-10-11 13:09:48,429 epoch 1 - iter 54/272 - loss 2.81198645 - time (sec): 18.93 - samples/sec: 547.73 - lr: 0.000029 - momentum: 0.000000
2023-10-11 13:09:57,442 epoch 1 - iter 81/272 - loss 2.79406251 - time (sec): 27.94 - samples/sec: 534.30 - lr: 0.000044 - momentum: 0.000000
2023-10-11 13:10:08,054 epoch 1 - iter 108/272 - loss 2.72687206 - time (sec): 38.56 - samples/sec: 550.43 - lr: 0.000059 - momentum: 0.000000
2023-10-11 13:10:17,491 epoch 1 - iter 135/272 - loss 2.64533462 - time (sec): 47.99 - samples/sec: 550.78 - lr: 0.000074 - momentum: 0.000000
2023-10-11 13:10:27,175 epoch 1 - iter 162/272 - loss 2.54392629 - time (sec): 57.68 - samples/sec: 549.91 - lr: 0.000089 - momentum: 0.000000
2023-10-11 13:10:36,492 epoch 1 - iter 189/272 - loss 2.44259599 - time (sec): 66.99 - samples/sec: 547.29 - lr: 0.000104 - momentum: 0.000000
2023-10-11 13:10:45,661 epoch 1 - iter 216/272 - loss 2.34019983 - time (sec): 76.16 - samples/sec: 543.57 - lr: 0.000119 - momentum: 0.000000
2023-10-11 13:10:54,498 epoch 1 - iter 243/272 - loss 2.24446431 - time (sec): 85.00 - samples/sec: 539.80 - lr: 0.000133 - momentum: 0.000000
2023-10-11 13:11:04,895 epoch 1 - iter 270/272 - loss 2.10357772 - time (sec): 95.40 - samples/sec: 543.29 - lr: 0.000148 - momentum: 0.000000
2023-10-11 13:11:05,300 ----------------------------------------------------------------------------------------------------
2023-10-11 13:11:05,300 EPOCH 1 done: loss 2.0999 - lr: 0.000148
2023-10-11 13:11:10,495 DEV : loss 0.7871720194816589 - f1-score (micro avg) 0.0
2023-10-11 13:11:10,504 ----------------------------------------------------------------------------------------------------
2023-10-11 13:11:19,799 epoch 2 - iter 27/272 - loss 0.79101376 - time (sec): 9.29 - samples/sec: 513.17 - lr: 0.000148 - momentum: 0.000000
2023-10-11 13:11:29,461 epoch 2 - iter 54/272 - loss 0.71052680 - time (sec): 18.96 - samples/sec: 529.15 - lr: 0.000147 - momentum: 0.000000
2023-10-11 13:11:38,965 epoch 2 - iter 81/272 - loss 0.67490954 - time (sec): 28.46 - samples/sec: 531.82 - lr: 0.000145 - momentum: 0.000000
2023-10-11 13:11:49,057 epoch 2 - iter 108/272 - loss 0.62296579 - time (sec): 38.55 - samples/sec: 540.97 - lr: 0.000143 - momentum: 0.000000
2023-10-11 13:11:58,488 epoch 2 - iter 135/272 - loss 0.60641748 - time (sec): 47.98 - samples/sec: 540.39 - lr: 0.000142 - momentum: 0.000000
2023-10-11 13:12:08,344 epoch 2 - iter 162/272 - loss 0.58390681 - time (sec): 57.84 - samples/sec: 542.86 - lr: 0.000140 - momentum: 0.000000
2023-10-11 13:12:18,273 epoch 2 - iter 189/272 - loss 0.55506002 - time (sec): 67.77 - samples/sec: 541.84 - lr: 0.000138 - momentum: 0.000000
2023-10-11 13:12:27,349 epoch 2 - iter 216/272 - loss 0.53417108 - time (sec): 76.84 - samples/sec: 537.71 - lr: 0.000137 - momentum: 0.000000
2023-10-11 13:12:36,566 epoch 2 - iter 243/272 - loss 0.52025397 - time (sec): 86.06 - samples/sec: 537.01 - lr: 0.000135 - momentum: 0.000000
2023-10-11 13:12:46,295 epoch 2 - iter 270/272 - loss 0.49807868 - time (sec): 95.79 - samples/sec: 538.43 - lr: 0.000134 - momentum: 0.000000
2023-10-11 13:12:46,928 ----------------------------------------------------------------------------------------------------
2023-10-11 13:12:46,929 EPOCH 2 done: loss 0.4972 - lr: 0.000134
2023-10-11 13:12:52,875 DEV : loss 0.2955166697502136 - f1-score (micro avg) 0.4394
2023-10-11 13:12:52,884 saving best model
2023-10-11 13:12:53,727 ----------------------------------------------------------------------------------------------------
2023-10-11 13:13:02,616 epoch 3 - iter 27/272 - loss 0.34336911 - time (sec): 8.89 - samples/sec: 523.49 - lr: 0.000132 - momentum: 0.000000
2023-10-11 13:13:13,494 epoch 3 - iter 54/272 - loss 0.30648810 - time (sec): 19.76 - samples/sec: 564.39 - lr: 0.000130 - momentum: 0.000000
2023-10-11 13:13:23,468 epoch 3 - iter 81/272 - loss 0.29198034 - time (sec): 29.74 - samples/sec: 562.43 - lr: 0.000128 - momentum: 0.000000
2023-10-11 13:13:32,799 epoch 3 - iter 108/272 - loss 0.28951492 - time (sec): 39.07 - samples/sec: 546.13 - lr: 0.000127 - momentum: 0.000000
2023-10-11 13:13:42,226 epoch 3 - iter 135/272 - loss 0.28229197 - time (sec): 48.50 - samples/sec: 545.36 - lr: 0.000125 - momentum: 0.000000
2023-10-11 13:13:52,082 epoch 3 - iter 162/272 - loss 0.28226115 - time (sec): 58.35 - samples/sec: 547.41 - lr: 0.000123 - momentum: 0.000000
2023-10-11 13:14:01,329 epoch 3 - iter 189/272 - loss 0.28080431 - time (sec): 67.60 - samples/sec: 543.23 - lr: 0.000122 - momentum: 0.000000
2023-10-11 13:14:10,768 epoch 3 - iter 216/272 - loss 0.27438629 - time (sec): 77.04 - samples/sec: 542.60 - lr: 0.000120 - momentum: 0.000000
2023-10-11 13:14:20,209 epoch 3 - iter 243/272 - loss 0.26560305 - time (sec): 86.48 - samples/sec: 539.20 - lr: 0.000119 - momentum: 0.000000
2023-10-11 13:14:29,635 epoch 3 - iter 270/272 - loss 0.26630953 - time (sec): 95.91 - samples/sec: 540.48 - lr: 0.000117 - momentum: 0.000000
2023-10-11 13:14:30,020 ----------------------------------------------------------------------------------------------------
2023-10-11 13:14:30,020 EPOCH 3 done: loss 0.2661 - lr: 0.000117
2023-10-11 13:14:35,743 DEV : loss 0.1891184151172638 - f1-score (micro avg) 0.6248
2023-10-11 13:14:35,752 saving best model
2023-10-11 13:14:38,296 ----------------------------------------------------------------------------------------------------
2023-10-11 13:14:47,491 epoch 4 - iter 27/272 - loss 0.19267975 - time (sec): 9.19 - samples/sec: 513.04 - lr: 0.000115 - momentum: 0.000000
2023-10-11 13:14:57,471 epoch 4 - iter 54/272 - loss 0.19054158 - time (sec): 19.17 - samples/sec: 524.71 - lr: 0.000113 - momentum: 0.000000
2023-10-11 13:15:08,023 epoch 4 - iter 81/272 - loss 0.18246952 - time (sec): 29.72 - samples/sec: 536.80 - lr: 0.000112 - momentum: 0.000000
2023-10-11 13:15:17,799 epoch 4 - iter 108/272 - loss 0.17446918 - time (sec): 39.50 - samples/sec: 542.94 - lr: 0.000110 - momentum: 0.000000
2023-10-11 13:15:26,973 epoch 4 - iter 135/272 - loss 0.17469825 - time (sec): 48.67 - samples/sec: 541.13 - lr: 0.000108 - momentum: 0.000000
2023-10-11 13:15:36,607 epoch 4 - iter 162/272 - loss 0.16413413 - time (sec): 58.31 - samples/sec: 545.12 - lr: 0.000107 - momentum: 0.000000
2023-10-11 13:15:46,200 epoch 4 - iter 189/272 - loss 0.16415812 - time (sec): 67.90 - samples/sec: 539.69 - lr: 0.000105 - momentum: 0.000000
2023-10-11 13:15:55,944 epoch 4 - iter 216/272 - loss 0.16298361 - time (sec): 77.64 - samples/sec: 539.85 - lr: 0.000103 - momentum: 0.000000
2023-10-11 13:16:05,439 epoch 4 - iter 243/272 - loss 0.16548331 - time (sec): 87.14 - samples/sec: 537.75 - lr: 0.000102 - momentum: 0.000000
2023-10-11 13:16:14,765 epoch 4 - iter 270/272 - loss 0.16290655 - time (sec): 96.46 - samples/sec: 537.15 - lr: 0.000100 - momentum: 0.000000
2023-10-11 13:16:15,180
----------------------------------------------------------------------------------------------------
2023-10-11 13:16:15,180 EPOCH 4 done: loss 0.1633 - lr: 0.000100
2023-10-11 13:16:20,930 DEV : loss 0.14617015421390533 - f1-score (micro avg) 0.686
2023-10-11 13:16:20,939 saving best model
2023-10-11 13:16:23,475 ----------------------------------------------------------------------------------------------------
2023-10-11 13:16:33,728 epoch 5 - iter 27/272 - loss 0.15319500 - time (sec): 10.25 - samples/sec: 570.34 - lr: 0.000098 - momentum: 0.000000
2023-10-11 13:16:43,416 epoch 5 - iter 54/272 - loss 0.14820840 - time (sec): 19.94 - samples/sec: 562.38 - lr: 0.000097 - momentum: 0.000000
2023-10-11 13:16:52,122 epoch 5 - iter 81/272 - loss 0.13818758 - time (sec): 28.64 - samples/sec: 542.20 - lr: 0.000095 - momentum: 0.000000
2023-10-11 13:17:01,635 epoch 5 - iter 108/272 - loss 0.13056305 - time (sec): 38.16 - samples/sec: 543.01 - lr: 0.000093 - momentum: 0.000000
2023-10-11 13:17:10,347 epoch 5 - iter 135/272 - loss 0.12686995 - time (sec): 46.87 - samples/sec: 534.15 - lr: 0.000092 - momentum: 0.000000
2023-10-11 13:17:20,047 epoch 5 - iter 162/272 - loss 0.11740312 - time (sec): 56.57 - samples/sec: 538.35 - lr: 0.000090 - momentum: 0.000000
2023-10-11 13:17:29,507 epoch 5 - iter 189/272 - loss 0.11502572 - time (sec): 66.03 - samples/sec: 538.50 - lr: 0.000088 - momentum: 0.000000
2023-10-11 13:17:39,913 epoch 5 - iter 216/272 - loss 0.11581681 - time (sec): 76.43 - samples/sec: 544.73 - lr: 0.000087 - momentum: 0.000000
2023-10-11 13:17:49,112 epoch 5 - iter 243/272 - loss 0.11020025 - time (sec): 85.63 - samples/sec: 541.12 - lr: 0.000085 - momentum: 0.000000
2023-10-11 13:17:58,535 epoch 5 - iter 270/272 - loss 0.10938645 - time (sec): 95.06 - samples/sec: 539.87 - lr: 0.000084 - momentum: 0.000000
2023-10-11 13:17:59,373 ----------------------------------------------------------------------------------------------------
2023-10-11 13:17:59,373 EPOCH 5 done: loss 0.1087 - lr: 0.000084
2023-10-11 13:18:05,260 DEV : loss 0.12970557808876038 - f1-score (micro avg) 0.7782
2023-10-11 13:18:05,268 saving best model
2023-10-11 13:18:07,813 ----------------------------------------------------------------------------------------------------
2023-10-11 13:18:17,449 epoch 6 - iter 27/272 - loss 0.08697393 - time (sec): 9.63 - samples/sec: 566.07 - lr: 0.000082 - momentum: 0.000000
2023-10-11 13:18:26,302 epoch 6 - iter 54/272 - loss 0.08490523 - time (sec): 18.48 - samples/sec: 537.15 - lr: 0.000080 - momentum: 0.000000
2023-10-11 13:18:36,045 epoch 6 - iter 81/272 - loss 0.08898586 - time (sec): 28.23 - samples/sec: 546.35 - lr: 0.000078 - momentum: 0.000000
2023-10-11 13:18:45,268 epoch 6 - iter 108/272 - loss 0.08812781 - time (sec): 37.45 - samples/sec: 547.61 - lr: 0.000077 - momentum: 0.000000
2023-10-11 13:18:54,730 epoch 6 - iter 135/272 - loss 0.08125660 - time (sec): 46.91 - samples/sec: 547.04 - lr: 0.000075 - momentum: 0.000000
2023-10-11 13:19:03,883 epoch 6 - iter 162/272 - loss 0.08079255 - time (sec): 56.07 - samples/sec: 542.30 - lr: 0.000073 - momentum: 0.000000
2023-10-11 13:19:13,710 epoch 6 - iter 189/272 - loss 0.07658614 - time (sec): 65.89 - samples/sec: 539.76 - lr: 0.000072 - momentum: 0.000000
2023-10-11 13:19:24,302 epoch 6 - iter 216/272 - loss 0.07746535 - time (sec): 76.48 - samples/sec: 539.77 - lr: 0.000070 - momentum: 0.000000
2023-10-11 13:19:34,054 epoch 6 - iter 243/272 - loss 0.07630159 - time (sec): 86.24 - samples/sec: 536.17 - lr: 0.000069 - momentum: 0.000000
2023-10-11 13:19:44,094 epoch 6 - iter 270/272 - loss 0.07449806 - time (sec): 96.28 - samples/sec: 537.63 - lr: 0.000067 - momentum: 0.000000
2023-10-11 13:19:44,548 ----------------------------------------------------------------------------------------------------
2023-10-11 13:19:44,548 EPOCH 6 done: loss 0.0755 - lr: 0.000067
2023-10-11 13:19:50,727 DEV : loss 0.13596650958061218 - f1-score (micro avg) 0.7802
2023-10-11 13:19:50,741 saving best model
2023-10-11 13:19:53,401 ----------------------------------------------------------------------------------------------------
2023-10-11 13:20:04,340 epoch 7 - iter 27/272 - loss 0.07151511 - time (sec): 10.94 - samples/sec: 544.36 - lr: 0.000065 - momentum: 0.000000
2023-10-11 13:20:14,460 epoch 7 - iter 54/272 - loss 0.06298137 - time (sec): 21.06 - samples/sec: 535.86 - lr: 0.000063 - momentum: 0.000000
2023-10-11 13:20:24,037 epoch 7 - iter 81/272 - loss 0.06852612 - time (sec): 30.63 - samples/sec: 529.97 - lr: 0.000062 - momentum: 0.000000
2023-10-11 13:20:33,314 epoch 7 - iter 108/272 - loss 0.06310370 - time (sec): 39.91 - samples/sec: 525.06 - lr: 0.000060 - momentum: 0.000000
2023-10-11 13:20:43,363 epoch 7 - iter 135/272 - loss 0.06476083 - time (sec): 49.96 - samples/sec: 528.83 - lr: 0.000058 - momentum: 0.000000
2023-10-11 13:20:53,660 epoch 7 - iter 162/272 - loss 0.05980500 - time (sec): 60.26 - samples/sec: 535.29 - lr: 0.000057 - momentum: 0.000000
2023-10-11 13:21:03,338 epoch 7 - iter 189/272 - loss 0.05858969 - time (sec): 69.93 - samples/sec: 535.33 - lr: 0.000055 - momentum: 0.000000
2023-10-11 13:21:12,953 epoch 7 - iter 216/272 - loss 0.06293113 - time (sec): 79.55 - samples/sec: 532.76 - lr: 0.000053 - momentum: 0.000000
2023-10-11 13:21:21,594 epoch 7 - iter 243/272 - loss 0.06038448 - time (sec): 88.19 - samples/sec: 525.24 - lr: 0.000052 - momentum: 0.000000
2023-10-11 13:21:31,424 epoch 7 - iter 270/272 - loss 0.05738546 - time (sec): 98.02 - samples/sec: 527.79 - lr: 0.000050 - momentum: 0.000000
2023-10-11 13:21:31,922 ----------------------------------------------------------------------------------------------------
2023-10-11 13:21:31,922 EPOCH 7 done: loss 0.0574 - lr: 0.000050
2023-10-11 13:21:37,845 DEV : loss 0.12629856169223785 - f1-score (micro avg) 0.8
2023-10-11 13:21:37,854 saving best model
2023-10-11 13:21:40,425
----------------------------------------------------------------------------------------------------
2023-10-11 13:21:49,568 epoch 8 - iter 27/272 - loss 0.03692180 - time (sec): 9.14 - samples/sec: 522.07 - lr: 0.000048 - momentum: 0.000000
2023-10-11 13:21:58,535 epoch 8 - iter 54/272 - loss 0.04304127 - time (sec): 18.11 - samples/sec: 515.86 - lr: 0.000047 - momentum: 0.000000
2023-10-11 13:22:08,106 epoch 8 - iter 81/272 - loss 0.04492744 - time (sec): 27.68 - samples/sec: 524.05 - lr: 0.000045 - momentum: 0.000000
2023-10-11 13:22:19,082 epoch 8 - iter 108/272 - loss 0.04289346 - time (sec): 38.65 - samples/sec: 533.91 - lr: 0.000043 - momentum: 0.000000
2023-10-11 13:22:28,087 epoch 8 - iter 135/272 - loss 0.04671101 - time (sec): 47.66 - samples/sec: 531.12 - lr: 0.000042 - momentum: 0.000000
2023-10-11 13:22:37,505 epoch 8 - iter 162/272 - loss 0.04694481 - time (sec): 57.08 - samples/sec: 535.95 - lr: 0.000040 - momentum: 0.000000
2023-10-11 13:22:47,043 epoch 8 - iter 189/272 - loss 0.04703120 - time (sec): 66.61 - samples/sec: 539.57 - lr: 0.000038 - momentum: 0.000000
2023-10-11 13:22:56,431 epoch 8 - iter 216/272 - loss 0.04473500 - time (sec): 76.00 - samples/sec: 541.82 - lr: 0.000037 - momentum: 0.000000
2023-10-11 13:23:06,007 epoch 8 - iter 243/272 - loss 0.04459354 - time (sec): 85.58 - samples/sec: 544.67 - lr: 0.000035 - momentum: 0.000000
2023-10-11 13:23:15,526 epoch 8 - iter 270/272 - loss 0.04481176 - time (sec): 95.10 - samples/sec: 545.90 - lr: 0.000034 - momentum: 0.000000
2023-10-11 13:23:15,869 ----------------------------------------------------------------------------------------------------
2023-10-11 13:23:15,869 EPOCH 8 done: loss 0.0450 - lr: 0.000034
2023-10-11 13:23:21,558 DEV : loss 0.12960414588451385 - f1-score (micro avg) 0.7782
2023-10-11 13:23:21,566 ----------------------------------------------------------------------------------------------------
2023-10-11 13:23:30,857 epoch 9 - iter 27/272 - loss 0.03796750 - time (sec): 9.29 - samples/sec: 544.75 - lr: 0.000032 - momentum: 0.000000
2023-10-11 13:23:40,261 epoch 9 - iter 54/272 - loss 0.04594063 - time (sec): 18.69 - samples/sec: 556.90 - lr: 0.000030 - momentum: 0.000000
2023-10-11 13:23:49,865 epoch 9 - iter 81/272 - loss 0.04313347 - time (sec): 28.30 - samples/sec: 559.29 - lr: 0.000028 - momentum: 0.000000
2023-10-11 13:23:59,378 epoch 9 - iter 108/272 - loss 0.03968736 - time (sec): 37.81 - samples/sec: 552.16 - lr: 0.000027 - momentum: 0.000000
2023-10-11 13:24:09,034 epoch 9 - iter 135/272 - loss 0.03928521 - time (sec): 47.47 - samples/sec: 550.93 - lr: 0.000025 - momentum: 0.000000
2023-10-11 13:24:18,685 epoch 9 - iter 162/272 - loss 0.03901635 - time (sec): 57.12 - samples/sec: 551.63 - lr: 0.000023 - momentum: 0.000000
2023-10-11 13:24:28,210 epoch 9 - iter 189/272 - loss 0.03840262 - time (sec): 66.64 - samples/sec: 547.60 - lr: 0.000022 - momentum: 0.000000
2023-10-11 13:24:37,304 epoch 9 - iter 216/272 - loss 0.04002943 - time (sec): 75.74 - samples/sec: 546.89 - lr: 0.000020 - momentum: 0.000000
2023-10-11 13:24:46,482 epoch 9 - iter 243/272 - loss 0.03766633 - time (sec): 84.91 - samples/sec: 547.65 - lr: 0.000019 - momentum: 0.000000
2023-10-11 13:24:55,673 epoch 9 - iter 270/272 - loss 0.03760281 - time (sec): 94.10 - samples/sec: 547.45 - lr: 0.000017 - momentum: 0.000000
2023-10-11 13:24:56,337 ----------------------------------------------------------------------------------------------------
2023-10-11 13:24:56,338 EPOCH 9 done: loss 0.0378 - lr: 0.000017
2023-10-11 13:25:01,878 DEV : loss 0.12975578010082245 - f1-score (micro avg) 0.7883
2023-10-11 13:25:01,888 ----------------------------------------------------------------------------------------------------
2023-10-11 13:25:11,078 epoch 10 - iter 27/272 - loss 0.02470685 - time (sec): 9.19 - samples/sec: 560.90 - lr: 0.000015 - momentum: 0.000000
2023-10-11 13:25:19,920 epoch 10 - iter 54/272 - loss 0.02211500 - time (sec): 18.03 - samples/sec: 543.76 - lr: 0.000013 - momentum: 0.000000
2023-10-11 13:25:29,801 epoch 10 - iter 81/272 - loss 0.02680971 - time (sec): 27.91 - samples/sec: 559.69 - lr: 0.000012 - momentum: 0.000000
2023-10-11 13:25:39,117 epoch 10 - iter 108/272 - loss 0.02701104 - time (sec): 37.23 - samples/sec: 562.59 - lr: 0.000010 - momentum: 0.000000
2023-10-11 13:25:48,499 epoch 10 - iter 135/272 - loss 0.03203950 - time (sec): 46.61 - samples/sec: 567.91 - lr: 0.000008 - momentum: 0.000000
2023-10-11 13:25:58,719 epoch 10 - iter 162/272 - loss 0.03547743 - time (sec): 56.83 - samples/sec: 578.80 - lr: 0.000007 - momentum: 0.000000
2023-10-11 13:26:07,029 epoch 10 - iter 189/272 - loss 0.03546825 - time (sec): 65.14 - samples/sec: 565.07 - lr: 0.000005 - momentum: 0.000000
2023-10-11 13:26:16,231 epoch 10 - iter 216/272 - loss 0.03456324 - time (sec): 74.34 - samples/sec: 563.35 - lr: 0.000003 - momentum: 0.000000
2023-10-11 13:26:25,608 epoch 10 - iter 243/272 - loss 0.03387184 - time (sec): 83.72 - samples/sec: 558.24 - lr: 0.000002 - momentum: 0.000000
2023-10-11 13:26:34,835 epoch 10 - iter 270/272 - loss 0.03411250 - time (sec): 92.95 - samples/sec: 555.26 - lr: 0.000000 - momentum: 0.000000
2023-10-11 13:26:35,403 ----------------------------------------------------------------------------------------------------
2023-10-11 13:26:35,403 EPOCH 10 done: loss 0.0340 - lr: 0.000000
2023-10-11 13:26:40,958 DEV : loss 0.13090862333774567 - f1-score (micro avg) 0.7847
2023-10-11 13:26:41,799 ----------------------------------------------------------------------------------------------------
2023-10-11 13:26:41,800 Loading model from best epoch ...
2023-10-11 13:26:45,973 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-11 13:26:58,353
Results:
- F-score (micro) 0.7799
- F-score (macro) 0.6981
- Accuracy 0.657

By class:
              precision    recall  f1-score   support

         LOC     0.7977    0.8718    0.8331       312
         PER     0.7061    0.8894    0.7872       208
         ORG     0.4419    0.3455    0.3878        55
   HumanProd     0.6897    0.9091    0.7843        22

   micro avg     0.7348    0.8308    0.7799       597
   macro avg     0.6588    0.7539    0.6981       597
weighted avg     0.7290    0.8308    0.7743       597

2023-10-11 13:26:58,353 ----------------------------------------------------------------------------------------------------
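The headline micro and macro F-scores follow directly from the per-class numbers in the final report: macro F1 is the unweighted mean of the four class F1 scores, while micro F1 is computed from the pooled precision and recall over all 597 gold spans. A quick sanity check in plain Python, with the values copied from the table above:

```python
# per-class (precision, recall, f1, support), copied from the report above
by_class = {
    "LOC":       (0.7977, 0.8718, 0.8331, 312),
    "PER":       (0.7061, 0.8894, 0.7872, 208),
    "ORG":       (0.4419, 0.3455, 0.3878,  55),
    "HumanProd": (0.6897, 0.9091, 0.7843,  22),
}

def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# macro F1: unweighted mean of the per-class F1 scores
macro_f1 = sum(f for _, _, f, _ in by_class.values()) / len(by_class)

# micro F1: F1 of the pooled (micro avg) precision and recall
micro_f1 = f1(0.7348, 0.8308)

print(round(macro_f1, 4))  # 0.6981, matching "F-score (macro)"
print(round(micro_f1, 4))  # 0.7799, matching "F-score (micro)"
```

The large gap between micro (0.7799) and macro (0.6981) is driven by the weak ORG class (F1 0.3878, only 55 spans), which drags the unweighted average down while barely affecting the pooled score.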