2023-10-13 15:47:36,877 ----------------------------------------------------------------------------------------------------
2023-10-13 15:47:36,879 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-13 15:47:36,879 ----------------------------------------------------------------------------------------------------
2023-10-13 15:47:36,879 MultiCorpus: 14465 train + 1392 dev + 2432 test sentences
 - NER_HIPE_2022 Corpus: 14465 train + 1392 dev + 2432 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/letemps/fr/with_doc_seperator
2023-10-13 15:47:36,879 ----------------------------------------------------------------------------------------------------
2023-10-13 15:47:36,879 Train: 14465 sentences
2023-10-13 15:47:36,879 (train_with_dev=False, train_with_test=False)
2023-10-13 15:47:36,879 ----------------------------------------------------------------------------------------------------
2023-10-13 15:47:36,879 Training Params:
2023-10-13 15:47:36,880 - learning_rate: "0.00015"
2023-10-13 15:47:36,880 - mini_batch_size: "4"
2023-10-13 15:47:36,880 - max_epochs: "10"
2023-10-13 15:47:36,880 - shuffle: "True"
2023-10-13 15:47:36,880 ----------------------------------------------------------------------------------------------------
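Editor's note (not part of the original log): the parameters above can be approximated with Flair's public API. The sketch below is a minimal reconstruction, assuming the checkpoint id and the NER_HIPE_2022 loader arguments implied by the base path further down, and substituting Flair's standard TransformerWordEmbeddings for the custom ByT5Embeddings wrapper printed in the model dump.

# Sketch only -- an approximation of the logged setup, not the authors' script.
from flair.datasets import NER_HIPE_2022
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# Assumed loader arguments, inferred from ".../ner_hipe_2022/v2.1/letemps/fr".
corpus = NER_HIPE_2022(dataset_name="letemps", language="fr")
label_dict = corpus.make_label_dictionary(label_type="ner")

embeddings = TransformerWordEmbeddings(
    # Assumed checkpoint id, taken from the base path in this log.
    model="hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax",
    layers="-1",               # "layers-1" in the base path
    subtoken_pooling="first",  # "poolingfirst" in the base path
    fine_tune=True,
)

tagger = SequenceTagger(
    hidden_size=256,           # unused: no RNN is stacked on top of the transformer
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type="ner",
    use_crf=False,             # "crfFalse" in the base path
    use_rnn=False,
    reproject_embeddings=False,
)

trainer = ModelTrainer(tagger, corpus)
# fine_tune() applies a linear schedule with warmup, matching the LinearScheduler
# plugin (warmup_fraction 0.1) reported below.
trainer.fine_tune(
    "hmbench-letemps/fr-hmbyt5-preliminary",  # output path, shortened here
    learning_rate=0.00015,
    mini_batch_size=4,
    max_epochs=10,
)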
2023-10-13 15:47:36,880 Plugins:
2023-10-13 15:47:36,880 - TensorboardLogger
2023-10-13 15:47:36,880 - LinearScheduler | warmup_fraction: '0.1'
2023-10-13 15:47:36,880 ----------------------------------------------------------------------------------------------------
2023-10-13 15:47:36,880 Final evaluation on model from best epoch (best-model.pt)
2023-10-13 15:47:36,880 - metric: "('micro avg', 'f1-score')"
2023-10-13 15:47:36,880 ----------------------------------------------------------------------------------------------------
2023-10-13 15:47:36,880 Computation:
2023-10-13 15:47:36,880 - compute on device: cuda:0
2023-10-13 15:47:36,880 - embedding storage: none
2023-10-13 15:47:36,881 ----------------------------------------------------------------------------------------------------
2023-10-13 15:47:36,881 Model training base path: "hmbench-letemps/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-2"
2023-10-13 15:47:36,881 ----------------------------------------------------------------------------------------------------
2023-10-13 15:47:36,881 ----------------------------------------------------------------------------------------------------
2023-10-13 15:47:36,881 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-13 15:49:16,380 epoch 1 - iter 361/3617 - loss 2.52765117 - time (sec): 99.50 - samples/sec: 375.46 - lr: 0.000015 - momentum: 0.000000
2023-10-13 15:50:55,095 epoch 1 - iter 722/3617 - loss 2.14450318 - time (sec): 198.21 - samples/sec: 377.11 - lr: 0.000030 - momentum: 0.000000
2023-10-13 15:52:34,986 epoch 1 - iter 1083/3617 - loss 1.68063315 - time (sec): 298.10 - samples/sec: 379.63 - lr: 0.000045 - momentum: 0.000000
2023-10-13 15:54:13,230 epoch 1 - iter 1444/3617 - loss 1.33922640 - time (sec): 396.35 - samples/sec: 381.28 - lr: 0.000060 - momentum: 0.000000
2023-10-13 15:55:49,352 epoch 1 - iter 1805/3617 - loss 1.11175819 - time (sec): 492.47 - samples/sec: 384.30 - lr: 0.000075 - momentum: 0.000000
2023-10-13 15:57:25,329 epoch 1 - iter 2166/3617 - loss 0.95966876 - time (sec): 588.45 - samples/sec: 385.53 - lr: 0.000090 - momentum: 0.000000
2023-10-13 15:59:05,198 epoch 1 - iter 2527/3617 - loss 0.84659547 - time (sec): 688.31 - samples/sec: 385.09 - lr: 0.000105 - momentum: 0.000000
2023-10-13 16:00:44,426 epoch 1 - iter 2888/3617 - loss 0.75756054 - time (sec): 787.54 - samples/sec: 384.55 - lr: 0.000120 - momentum: 0.000000
2023-10-13 16:02:21,853 epoch 1 - iter 3249/3617 - loss 0.68764398 - time (sec): 884.97 - samples/sec: 386.12 - lr: 0.000135 - momentum: 0.000000
2023-10-13 16:03:57,756 epoch 1 - iter 3610/3617 - loss 0.63260278 - time (sec): 980.87 - samples/sec: 386.66 - lr: 0.000150 - momentum: 0.000000
2023-10-13 16:03:59,433 ----------------------------------------------------------------------------------------------------
2023-10-13 16:03:59,434 EPOCH 1 done: loss 0.6317 - lr: 0.000150
2023-10-13 16:04:35,632 DEV : loss 0.1328110247850418 - f1-score (micro avg) 0.5468
2023-10-13 16:04:35,688 saving best model
2023-10-13 16:04:36,553 ----------------------------------------------------------------------------------------------------
2023-10-13 16:06:14,210 epoch 2 - iter 361/3617 - loss 0.11626753 - time (sec): 97.65 - samples/sec: 375.76 - lr: 0.000148 - momentum: 0.000000
2023-10-13 16:07:54,314 epoch 2 - iter 722/3617 - loss 0.11000947 - time (sec): 197.76 - samples/sec: 377.76 - lr: 0.000147 - momentum: 0.000000
2023-10-13 16:09:32,730 epoch 2 - iter 1083/3617 - loss 0.10626444 - time (sec): 296.17 - samples/sec: 381.60 - lr: 0.000145 - momentum: 0.000000
2023-10-13 16:11:10,144 epoch 2 - iter 1444/3617 - loss 0.10456416 - time (sec): 393.59 - samples/sec: 383.58 - lr: 0.000143 - momentum: 0.000000
2023-10-13 16:12:49,896 epoch 2 - iter 1805/3617 - loss 0.10240111 - time (sec): 493.34 - samples/sec: 385.22 - lr: 0.000142 - momentum: 0.000000
2023-10-13 16:14:24,736 epoch 2 - iter 2166/3617 - loss 0.10226438 - time (sec): 588.18 - samples/sec: 385.19 - lr: 0.000140 - momentum: 0.000000
2023-10-13 16:16:04,084 epoch 2 - iter 2527/3617 - loss 0.10035753 - time (sec): 687.53 - samples/sec: 384.17 - lr: 0.000138 - momentum: 0.000000
2023-10-13 16:17:46,721 epoch 2 - iter 2888/3617 - loss 0.09824782 - time (sec): 790.17 - samples/sec: 384.12 - lr: 0.000137 - momentum: 0.000000
2023-10-13 16:19:27,035 epoch 2 - iter 3249/3617 - loss 0.09643723 - time (sec): 890.48 - samples/sec: 383.53 - lr: 0.000135 - momentum: 0.000000
2023-10-13 16:21:07,446 epoch 2 - iter 3610/3617 - loss 0.09599864 - time (sec): 990.89 - samples/sec: 382.64 - lr: 0.000133 - momentum: 0.000000
2023-10-13 16:21:09,242 ----------------------------------------------------------------------------------------------------
2023-10-13 16:21:09,242 EPOCH 2 done: loss 0.0959 - lr: 0.000133
2023-10-13 16:21:49,050 DEV : loss 0.12169007211923599 - f1-score (micro avg) 0.5729
2023-10-13 16:21:49,110 saving best model
2023-10-13 16:21:51,717 ----------------------------------------------------------------------------------------------------
2023-10-13 16:23:32,625 epoch 3 - iter 361/3617 - loss 0.06045412 - time (sec): 100.90 - samples/sec: 389.18 - lr: 0.000132 - momentum: 0.000000
2023-10-13 16:25:10,123 epoch 3 - iter 722/3617 - loss 0.06228126 - time (sec): 198.40 - samples/sec: 382.78 - lr: 0.000130 - momentum: 0.000000
2023-10-13 16:26:50,214 epoch 3 - iter 1083/3617 - loss 0.06334489 - time (sec): 298.49 - samples/sec: 380.37 - lr: 0.000128 - momentum: 0.000000
2023-10-13 16:28:29,213 epoch 3 - iter 1444/3617 - loss 0.06457721 - time (sec): 397.49 - samples/sec: 380.35 - lr: 0.000127 - momentum: 0.000000
2023-10-13 16:30:10,356 epoch 3 - iter 1805/3617 - loss 0.06506915 - time (sec): 498.63 - samples/sec: 380.18 - lr: 0.000125 - momentum: 0.000000
2023-10-13 16:31:47,566 epoch 3 - iter 2166/3617 - loss 0.06594029 - time (sec): 595.84 - samples/sec: 379.51 - lr: 0.000123 - momentum: 0.000000
2023-10-13 16:33:27,197 epoch 3 - iter 2527/3617 - loss 0.06615646 - time (sec): 695.47 - samples/sec: 382.22 - lr: 0.000122 - momentum: 0.000000
2023-10-13 16:35:04,152 epoch 3 - iter 2888/3617 - loss 0.06725648 - time (sec): 792.43 - samples/sec: 381.27 - lr: 0.000120 - momentum: 0.000000
2023-10-13 16:36:41,646 epoch 3 - iter 3249/3617 - loss 0.06667975 - time (sec): 889.92 - samples/sec: 382.81 - lr: 0.000118 - momentum: 0.000000
2023-10-13 16:38:19,427 epoch 3 - iter 3610/3617 - loss 0.06625041 - time (sec): 987.70 - samples/sec: 384.07 - lr: 0.000117 - momentum: 0.000000
2023-10-13 16:38:21,051 ----------------------------------------------------------------------------------------------------
2023-10-13 16:38:21,051 EPOCH 3 done: loss 0.0662 - lr: 0.000117
2023-10-13 16:38:59,279 DEV : loss 0.1475251019001007 - f1-score (micro avg) 0.6326
2023-10-13 16:38:59,336 saving best model
2023-10-13 16:39:01,911 ----------------------------------------------------------------------------------------------------
2023-10-13 16:40:38,948 epoch 4 - iter 361/3617 - loss 0.04260643 - time (sec): 97.03 - samples/sec: 382.61 - lr: 0.000115 - momentum: 0.000000
2023-10-13 16:42:16,995 epoch 4 - iter 722/3617 - loss 0.04125262 - time (sec): 195.08 - samples/sec: 390.28 - lr: 0.000113 - momentum: 0.000000
2023-10-13 16:43:52,860 epoch 4 - iter 1083/3617 - loss 0.04631582 - time (sec): 290.94 - samples/sec: 390.67 - lr: 0.000112 - momentum: 0.000000
2023-10-13 16:45:27,629 epoch 4 - iter 1444/3617 - loss 0.04612849 - time (sec): 385.71 - samples/sec: 391.37 - lr: 0.000110 - momentum: 0.000000
2023-10-13 16:47:09,646 epoch 4 - iter 1805/3617 - loss 0.04673774 - time (sec): 487.73 - samples/sec: 387.18 - lr: 0.000108 - momentum: 0.000000
2023-10-13 16:48:49,185 epoch 4 - iter 2166/3617 - loss 0.04602262 - time (sec): 587.27 - samples/sec: 385.50 - lr: 0.000107 - momentum: 0.000000
2023-10-13 16:50:27,835 epoch 4 - iter 2527/3617 - loss 0.04628779 - time (sec): 685.92 - samples/sec: 384.91 - lr: 0.000105 - momentum: 0.000000
2023-10-13 16:52:06,093 epoch 4 - iter 2888/3617 - loss 0.04553124 - time (sec): 784.18 - samples/sec: 385.34 - lr: 0.000103 - momentum: 0.000000
2023-10-13 16:53:44,924 epoch 4 - iter 3249/3617 - loss 0.04561411 - time (sec): 883.01 - samples/sec: 386.56 - lr: 0.000102 - momentum: 0.000000
2023-10-13 16:55:22,268 epoch 4 - iter 3610/3617 - loss 0.04637074 - time (sec): 980.35 - samples/sec: 386.93 - lr: 0.000100 - momentum: 0.000000
2023-10-13 16:55:23,902 ----------------------------------------------------------------------------------------------------
2023-10-13 16:55:23,902 EPOCH 4 done: loss 0.0464 - lr: 0.000100
2023-10-13 16:56:03,441 DEV : loss 0.21360917389392853 - f1-score (micro avg) 0.6419
2023-10-13 16:56:03,498 saving best model
2023-10-13 16:56:06,081 ----------------------------------------------------------------------------------------------------
2023-10-13 16:57:41,184 epoch 5 - iter 361/3617 - loss 0.02867783 - time (sec): 95.10 - samples/sec: 407.12 - lr: 0.000098 - momentum: 0.000000
2023-10-13 16:59:18,313 epoch 5 - iter 722/3617 - loss 0.03093844 - time (sec): 192.23 - samples/sec: 403.57 - lr: 0.000097 - momentum: 0.000000
2023-10-13 17:00:54,026 epoch 5 - iter 1083/3617 - loss 0.03106379 - time (sec): 287.94 - samples/sec: 397.08 - lr: 0.000095 - momentum: 0.000000
2023-10-13 17:02:29,472 epoch 5 - iter 1444/3617 - loss 0.03435095 - time (sec): 383.39 - samples/sec: 400.62 - lr: 0.000093 - momentum: 0.000000
2023-10-13 17:04:05,072 epoch 5 - iter 1805/3617 - loss 0.03291689 - time (sec): 478.99 - samples/sec: 401.05 - lr: 0.000092 - momentum: 0.000000
2023-10-13 17:05:41,393 epoch 5 - iter 2166/3617 - loss 0.03481865 - time (sec): 575.31 - samples/sec: 396.71 - lr: 0.000090 - momentum: 0.000000
2023-10-13 17:07:19,475 epoch 5 - iter 2527/3617 - loss 0.03380849 - time (sec): 673.39 - samples/sec: 396.08 - lr: 0.000088 - momentum: 0.000000
2023-10-13 17:08:57,422 epoch 5 - iter 2888/3617 - loss 0.03376094 - time (sec): 771.34 - samples/sec: 395.27 - lr: 0.000087 - momentum: 0.000000
2023-10-13 17:10:33,716 epoch 5 - iter 3249/3617 - loss 0.03464489 - time (sec): 867.63 - samples/sec: 393.19 - lr: 0.000085 - momentum: 0.000000
2023-10-13 17:12:12,849 epoch 5 - iter 3610/3617 - loss 0.03480793 - time (sec): 966.76 - samples/sec: 392.27 - lr: 0.000083 - momentum: 0.000000
2023-10-13 17:12:14,580 ----------------------------------------------------------------------------------------------------
2023-10-13 17:12:14,580 EPOCH 5 done: loss 0.0349 - lr: 0.000083
2023-10-13 17:12:53,693 DEV : loss 0.2400546371936798 - f1-score (micro avg) 0.651
2023-10-13 17:12:53,753 saving best model
2023-10-13 17:12:56,347 ----------------------------------------------------------------------------------------------------
2023-10-13 17:14:34,985 epoch 6 - iter 361/3617 - loss 0.01720928 - time (sec): 98.63 - samples/sec: 386.11 - lr: 0.000082 - momentum: 0.000000
2023-10-13 17:16:13,842 epoch 6 - iter 722/3617 - loss 0.01844733 - time (sec): 197.49 - samples/sec: 380.67 - lr: 0.000080 - momentum: 0.000000
2023-10-13 17:17:52,649 epoch 6 - iter 1083/3617 - loss 0.01854603 - time (sec): 296.30 - samples/sec: 378.69 - lr: 0.000078 - momentum: 0.000000
2023-10-13 17:19:32,124 epoch 6 - iter 1444/3617 - loss 0.02106696 - time (sec): 395.77 - samples/sec: 380.99 - lr: 0.000077 - momentum: 0.000000
2023-10-13 17:21:08,674 epoch 6 - iter 1805/3617 - loss 0.02062784 - time (sec): 492.32 - samples/sec: 382.09 - lr: 0.000075 - momentum: 0.000000
2023-10-13 17:22:47,208 epoch 6 - iter 2166/3617 - loss 0.02025036 - time (sec): 590.86 - samples/sec: 382.21 - lr: 0.000073 - momentum: 0.000000
2023-10-13 17:24:27,370 epoch 6 - iter 2527/3617 - loss 0.02039739 - time (sec): 691.02 - samples/sec: 381.52 - lr: 0.000072 - momentum: 0.000000
2023-10-13 17:26:05,854 epoch 6 - iter 2888/3617 - loss 0.02102720 - time (sec): 789.50 - samples/sec: 383.40 - lr: 0.000070 - momentum: 0.000000
2023-10-13 17:27:44,896 epoch 6 - iter 3249/3617 - loss 0.02116438 - time (sec): 888.54 - samples/sec: 383.69 - lr: 0.000068 - momentum: 0.000000
2023-10-13 17:29:25,435 epoch 6 - iter 3610/3617 - loss 0.02233088 - time (sec): 989.08 - samples/sec: 383.39 - lr: 0.000067 - momentum: 0.000000
2023-10-13 17:29:27,292 ----------------------------------------------------------------------------------------------------
2023-10-13 17:29:27,292 EPOCH 6 done: loss 0.0223 - lr: 0.000067
2023-10-13 17:30:06,560 DEV : loss 0.27845296263694763 - f1-score (micro avg) 0.6351
2023-10-13 17:30:06,618 ----------------------------------------------------------------------------------------------------
2023-10-13 17:31:45,647 epoch 7 - iter 361/3617 - loss 0.01354530 - time (sec): 99.03 - samples/sec: 390.06 - lr: 0.000065 - momentum: 0.000000
2023-10-13 17:33:23,229 epoch 7 - iter 722/3617 - loss 0.01275613 - time (sec): 196.61 - samples/sec: 388.03 - lr: 0.000063 - momentum: 0.000000
2023-10-13 17:35:02,365 epoch 7 - iter 1083/3617 - loss 0.01402730 - time (sec): 295.74 - samples/sec: 389.84 - lr: 0.000062 - momentum: 0.000000
2023-10-13 17:36:40,537 epoch 7 - iter 1444/3617 - loss 0.01348114 - time (sec): 393.92 - samples/sec: 385.67 - lr: 0.000060 - momentum: 0.000000
2023-10-13 17:38:17,672 epoch 7 - iter 1805/3617 - loss 0.01492394 - time (sec): 491.05 - samples/sec: 386.77 - lr: 0.000058 - momentum: 0.000000
2023-10-13 17:39:55,040 epoch 7 - iter 2166/3617 - loss 0.01481483 - time (sec): 588.42 - samples/sec: 390.00 - lr: 0.000057 - momentum: 0.000000
2023-10-13 17:41:31,995 epoch 7 - iter 2527/3617 - loss 0.01489788 - time (sec): 685.38 - samples/sec: 389.85 - lr: 0.000055 - momentum: 0.000000
2023-10-13 17:43:07,557 epoch 7 - iter 2888/3617 - loss 0.01477959 - time (sec): 780.94 - samples/sec: 389.20 - lr: 0.000053 - momentum: 0.000000
2023-10-13 17:44:43,562 epoch 7 - iter 3249/3617 - loss 0.01557973 - time (sec): 876.94 - samples/sec: 388.67 - lr: 0.000052 - momentum: 0.000000
2023-10-13 17:46:22,754 epoch 7 - iter 3610/3617 - loss 0.01541438 - time (sec): 976.13 - samples/sec: 388.37 - lr: 0.000050 - momentum: 0.000000
2023-10-13 17:46:24,563 ----------------------------------------------------------------------------------------------------
2023-10-13 17:46:24,564 EPOCH 7 done: loss 0.0155 - lr: 0.000050
2023-10-13 17:47:02,672 DEV : loss 0.32551926374435425 - f1-score (micro avg) 0.6588
2023-10-13 17:47:02,728 saving best model
2023-10-13 17:47:05,315 ----------------------------------------------------------------------------------------------------
2023-10-13 17:48:43,299 epoch 8 - iter 361/3617 - loss 0.01360752 - time (sec): 97.98 - samples/sec: 388.59 - lr: 0.000048 - momentum: 0.000000
2023-10-13 17:50:23,431 epoch 8 - iter 722/3617 - loss 0.01121896 - time (sec): 198.11 - samples/sec: 391.82 - lr: 0.000047 - momentum: 0.000000
2023-10-13 17:52:02,062 epoch 8 - iter 1083/3617 - loss 0.01074591 - time (sec): 296.74 - samples/sec: 391.38 - lr: 0.000045 - momentum: 0.000000
2023-10-13 17:53:40,669 epoch 8 - iter 1444/3617 - loss 0.00996253 - time (sec): 395.35 - samples/sec: 391.11 - lr: 0.000043 - momentum: 0.000000
2023-10-13 17:55:19,861 epoch 8 - iter 1805/3617 - loss 0.01086557 - time (sec): 494.54 - samples/sec: 386.03 - lr: 0.000042 - momentum: 0.000000
2023-10-13 17:57:00,564 epoch 8 - iter 2166/3617 - loss 0.01081858 - time (sec): 595.24 - samples/sec: 386.15 - lr: 0.000040 - momentum: 0.000000
2023-10-13 17:58:37,728 epoch 8 - iter 2527/3617 - loss 0.01085878 - time (sec): 692.41 - samples/sec: 385.09 - lr: 0.000038 - momentum: 0.000000
2023-10-13 18:00:14,557 epoch 8 - iter 2888/3617 - loss 0.01065470 - time (sec): 789.24 - samples/sec: 385.34 - lr: 0.000037 - momentum: 0.000000
2023-10-13 18:01:51,532 epoch 8 - iter 3249/3617 - loss 0.01090509 - time (sec): 886.21 - samples/sec: 386.25 - lr: 0.000035 - momentum: 0.000000
2023-10-13 18:03:29,711 epoch 8 - iter 3610/3617 - loss 0.01057309 - time (sec): 984.39 - samples/sec: 385.52 - lr: 0.000033 - momentum: 0.000000
2023-10-13 18:03:31,317 ----------------------------------------------------------------------------------------------------
2023-10-13 18:03:31,317 EPOCH 8 done: loss 0.0106 - lr: 0.000033
2023-10-13 18:04:13,133 DEV : loss 0.3422335982322693 - f1-score (micro avg) 0.6499
2023-10-13 18:04:13,198 ----------------------------------------------------------------------------------------------------
2023-10-13 18:05:51,892 epoch 9 - iter 361/3617 - loss 0.00746127 - time (sec): 98.69 - samples/sec: 369.13 - lr: 0.000032 - momentum: 0.000000
2023-10-13 18:07:30,684 epoch 9 - iter 722/3617 - loss 0.00693418 - time (sec): 197.48 - samples/sec: 378.82 - lr: 0.000030 - momentum: 0.000000
2023-10-13 18:09:11,794 epoch 9 - iter 1083/3617 - loss 0.00835998 - time (sec): 298.59 - samples/sec: 379.21 - lr: 0.000028 - momentum: 0.000000
2023-10-13 18:10:51,226 epoch 9 - iter 1444/3617 - loss 0.00843426 - time (sec): 398.03 - samples/sec: 379.83 - lr: 0.000027 - momentum: 0.000000
2023-10-13 18:12:31,468 epoch 9 - iter 1805/3617 - loss 0.00811795 - time (sec): 498.27 - samples/sec: 380.63 - lr: 0.000025 - momentum: 0.000000
2023-10-13 18:14:10,084 epoch 9 - iter 2166/3617 - loss 0.00779077 - time (sec): 596.88 - samples/sec: 382.22 - lr: 0.000023 - momentum: 0.000000
2023-10-13 18:15:43,914 epoch 9 - iter 2527/3617 - loss 0.00773124 - time (sec): 690.71 - samples/sec: 384.09 - lr: 0.000022 - momentum: 0.000000
2023-10-13 18:17:18,514 epoch 9 - iter 2888/3617 - loss 0.00758069 - time (sec): 785.31 - samples/sec: 383.80 - lr: 0.000020 - momentum: 0.000000
2023-10-13 18:18:53,935 epoch 9 - iter 3249/3617 - loss 0.00735282 - time (sec): 880.73 - samples/sec: 385.99 - lr: 0.000018 - momentum: 0.000000
2023-10-13 18:20:35,878 epoch 9 - iter 3610/3617 - loss 0.00724054 - time (sec): 982.68 - samples/sec: 385.88 - lr: 0.000017 - momentum: 0.000000
2023-10-13 18:20:37,660 ----------------------------------------------------------------------------------------------------
2023-10-13 18:20:37,660 EPOCH 9 done: loss 0.0072 - lr: 0.000017
2023-10-13 18:21:18,008 DEV : loss 0.3737005591392517 - f1-score (micro avg) 0.6637
2023-10-13 18:21:18,067 saving best model
2023-10-13 18:21:20,653 ----------------------------------------------------------------------------------------------------
2023-10-13 18:23:01,430 epoch 10 - iter 361/3617 - loss 0.00219566 - time (sec): 100.77 - samples/sec: 379.80 - lr: 0.000015 - momentum: 0.000000
2023-10-13 18:24:41,655 epoch 10 - iter 722/3617 - loss 0.00254285 - time (sec): 201.00 - samples/sec: 378.43 - lr: 0.000013 - momentum: 0.000000
2023-10-13 18:26:18,860 epoch 10 - iter 1083/3617 - loss 0.00509300 - time (sec): 298.20 - samples/sec: 380.51 - lr: 0.000012 - momentum: 0.000000
2023-10-13 18:27:59,093 epoch 10 - iter 1444/3617 - loss 0.00524195 - time (sec): 398.44 - samples/sec: 381.66 - lr: 0.000010 - momentum: 0.000000
2023-10-13 18:29:39,266 epoch 10 - iter 1805/3617 - loss 0.00561785 - time (sec): 498.61 - samples/sec: 379.46 - lr: 0.000008 - momentum: 0.000000
2023-10-13 18:31:16,248 epoch 10 - iter 2166/3617 - loss 0.00605238 - time (sec): 595.59 - samples/sec: 380.96 - lr: 0.000007 - momentum: 0.000000
2023-10-13 18:32:54,488 epoch 10 - iter 2527/3617 - loss 0.00590583 - time (sec): 693.83 - samples/sec: 382.92 - lr: 0.000005 - momentum: 0.000000
2023-10-13 18:34:33,504 epoch 10 - iter 2888/3617 - loss 0.00607679 - time (sec): 792.85 - samples/sec: 384.33 - lr: 0.000003 - momentum: 0.000000
2023-10-13 18:36:09,760 epoch 10 - iter 3249/3617 - loss 0.00604311 - time (sec): 889.10 - samples/sec: 383.09 - lr: 0.000002 - momentum: 0.000000
2023-10-13 18:37:48,846 epoch 10 - iter 3610/3617 - loss 0.00584381 - time (sec): 988.19 - samples/sec: 383.85 - lr: 0.000000 - momentum: 0.000000
2023-10-13 18:37:50,520 ----------------------------------------------------------------------------------------------------
2023-10-13 18:37:50,521 EPOCH 10 done: loss 0.0058 - lr: 0.000000
2023-10-13 18:38:31,617 DEV : loss 0.3819710314273834 - f1-score (micro avg) 0.6651
2023-10-13 18:38:31,677 saving best model
2023-10-13 18:38:35,138 ----------------------------------------------------------------------------------------------------
2023-10-13 18:38:35,140 Loading model from best epoch ...
2023-10-13 18:38:39,133 SequenceTagger predicts: Dictionary with 13 tags: O, S-loc, B-loc, E-loc, I-loc, S-pers, B-pers, E-pers, I-pers, S-org, B-org, E-org, I-org
2023-10-13 18:39:36,439 Results:
- F-score (micro) 0.6279
- F-score (macro) 0.4773
- Accuracy 0.4701

By class:
              precision    recall  f1-score   support

         loc     0.6557    0.7411    0.6958       591
        pers     0.5661    0.6835    0.6193       357
         org     0.1200    0.1139    0.1169        79

   micro avg     0.5886    0.6728    0.6279      1027
   macro avg     0.4473    0.5128    0.4773      1027
weighted avg     0.5833    0.6728    0.6247      1027

2023-10-13 18:39:36,439 ----------------------------------------------------------------------------------------------------
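Editor's note (not part of the original log): the best checkpoint saved under the base path above can be used for inference with Flair's standard loading API. A minimal sketch follows; the model path and the example sentence are placeholders, and the label type is assumed to be "ner".

# Sketch only -- loading best-model.pt and tagging a sentence with the
# 13-tag loc/pers/org scheme listed in the log above.
from flair.data import Sentence
from flair.models import SequenceTagger

tagger = SequenceTagger.load(
    "hmbench-letemps/fr-hmbyt5-preliminary/"
    "byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-2/"
    "best-model.pt"
)

sentence = Sentence("Le Temps est publié à Genève.")  # placeholder example text
tagger.predict(sentence)

for span in sentence.get_spans("ner"):   # label type "ner" is an assumption
    label = span.get_label("ner")
    print(span.text, label.value, label.score)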