2023-10-14 14:40:59,478 ----------------------------------------------------------------------------------------------------
2023-10-14 14:40:59,480 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-14 14:40:59,481 ----------------------------------------------------------------------------------------------------
2023-10-14 14:40:59,481 MultiCorpus: 14465 train + 1392 dev + 2432 test sentences
 - NER_HIPE_2022 Corpus: 14465 train + 1392 dev + 2432 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/letemps/fr/with_doc_seperator
2023-10-14 14:40:59,481 ----------------------------------------------------------------------------------------------------
2023-10-14 14:40:59,481 Train: 14465 sentences
2023-10-14 14:40:59,481 (train_with_dev=False, train_with_test=False)
2023-10-14 14:40:59,481 ----------------------------------------------------------------------------------------------------
2023-10-14 14:40:59,481 Training Params:
2023-10-14 14:40:59,481 - learning_rate: "0.00015"
2023-10-14 14:40:59,481 - mini_batch_size: "4"
2023-10-14 14:40:59,481 - max_epochs: "10"
2023-10-14 14:40:59,481 - shuffle: "True"
2023-10-14 14:40:59,481 ----------------------------------------------------------------------------------------------------
2023-10-14 14:40:59,481 Plugins:
2023-10-14 14:40:59,481 - TensorboardLogger
2023-10-14 14:40:59,482 - LinearScheduler | warmup_fraction: '0.1'
2023-10-14 14:40:59,482 ----------------------------------------------------------------------------------------------------
2023-10-14 14:40:59,482 Final evaluation on model from best epoch (best-model.pt)
2023-10-14 14:40:59,482 - metric: "('micro avg', 'f1-score')"
2023-10-14 14:40:59,482
---------------------------------------------------------------------------------------------------- 2023-10-14 14:40:59,482 Computation: 2023-10-14 14:40:59,482 - compute on device: cuda:0 2023-10-14 14:40:59,482 - embedding storage: none 2023-10-14 14:40:59,482 ---------------------------------------------------------------------------------------------------- 2023-10-14 14:40:59,482 Model training base path: "hmbench-letemps/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-4" 2023-10-14 14:40:59,482 ---------------------------------------------------------------------------------------------------- 2023-10-14 14:40:59,482 ---------------------------------------------------------------------------------------------------- 2023-10-14 14:40:59,482 Logging anything other than scalars to TensorBoard is currently not supported. 2023-10-14 14:42:41,066 epoch 1 - iter 361/3617 - loss 2.50686900 - time (sec): 101.58 - samples/sec: 360.10 - lr: 0.000015 - momentum: 0.000000 2023-10-14 14:44:22,406 epoch 1 - iter 722/3617 - loss 2.10752883 - time (sec): 202.92 - samples/sec: 369.39 - lr: 0.000030 - momentum: 0.000000 2023-10-14 14:46:12,979 epoch 1 - iter 1083/3617 - loss 1.65683609 - time (sec): 313.49 - samples/sec: 363.76 - lr: 0.000045 - momentum: 0.000000 2023-10-14 14:47:57,110 epoch 1 - iter 1444/3617 - loss 1.31501097 - time (sec): 417.63 - samples/sec: 364.72 - lr: 0.000060 - momentum: 0.000000 2023-10-14 14:49:44,051 epoch 1 - iter 1805/3617 - loss 1.09620008 - time (sec): 524.57 - samples/sec: 361.95 - lr: 0.000075 - momentum: 0.000000 2023-10-14 14:51:28,614 epoch 1 - iter 2166/3617 - loss 0.93978591 - time (sec): 629.13 - samples/sec: 364.30 - lr: 0.000090 - momentum: 0.000000 2023-10-14 14:53:09,495 epoch 1 - iter 2527/3617 - loss 0.82889421 - time (sec): 730.01 - samples/sec: 366.34 - lr: 0.000105 - momentum: 0.000000 2023-10-14 14:54:50,772 epoch 1 - iter 2888/3617 - loss 0.74534435 
- time (sec): 831.29 - samples/sec: 367.53 - lr: 0.000120 - momentum: 0.000000 2023-10-14 14:56:30,263 epoch 1 - iter 3249/3617 - loss 0.67952585 - time (sec): 930.78 - samples/sec: 367.24 - lr: 0.000135 - momentum: 0.000000 2023-10-14 14:58:13,235 epoch 1 - iter 3610/3617 - loss 0.62441070 - time (sec): 1033.75 - samples/sec: 366.94 - lr: 0.000150 - momentum: 0.000000 2023-10-14 14:58:14,910 ---------------------------------------------------------------------------------------------------- 2023-10-14 14:58:14,910 EPOCH 1 done: loss 0.6239 - lr: 0.000150 2023-10-14 14:58:53,380 DEV : loss 0.11717528849840164 - f1-score (micro avg) 0.5796 2023-10-14 14:58:53,438 saving best model 2023-10-14 14:58:54,356 ---------------------------------------------------------------------------------------------------- 2023-10-14 15:00:36,235 epoch 2 - iter 361/3617 - loss 0.10878939 - time (sec): 101.88 - samples/sec: 370.35 - lr: 0.000148 - momentum: 0.000000 2023-10-14 15:02:20,856 epoch 2 - iter 722/3617 - loss 0.10496323 - time (sec): 206.50 - samples/sec: 369.37 - lr: 0.000147 - momentum: 0.000000 2023-10-14 15:04:10,294 epoch 2 - iter 1083/3617 - loss 0.10512196 - time (sec): 315.94 - samples/sec: 358.08 - lr: 0.000145 - momentum: 0.000000 2023-10-14 15:05:52,218 epoch 2 - iter 1444/3617 - loss 0.10247324 - time (sec): 417.86 - samples/sec: 360.21 - lr: 0.000143 - momentum: 0.000000 2023-10-14 15:07:32,739 epoch 2 - iter 1805/3617 - loss 0.10015258 - time (sec): 518.38 - samples/sec: 362.60 - lr: 0.000142 - momentum: 0.000000 2023-10-14 15:09:12,203 epoch 2 - iter 2166/3617 - loss 0.09832530 - time (sec): 617.84 - samples/sec: 366.60 - lr: 0.000140 - momentum: 0.000000 2023-10-14 15:10:55,993 epoch 2 - iter 2527/3617 - loss 0.09600180 - time (sec): 721.63 - samples/sec: 368.59 - lr: 0.000138 - momentum: 0.000000 2023-10-14 15:12:42,577 epoch 2 - iter 2888/3617 - loss 0.09489153 - time (sec): 828.22 - samples/sec: 367.36 - lr: 0.000137 - momentum: 0.000000 2023-10-14 
15:14:25,871 epoch 2 - iter 3249/3617 - loss 0.09188276 - time (sec): 931.51 - samples/sec: 367.52 - lr: 0.000135 - momentum: 0.000000 2023-10-14 15:16:09,934 epoch 2 - iter 3610/3617 - loss 0.09054350 - time (sec): 1035.58 - samples/sec: 366.37 - lr: 0.000133 - momentum: 0.000000 2023-10-14 15:16:11,806 ---------------------------------------------------------------------------------------------------- 2023-10-14 15:16:11,807 EPOCH 2 done: loss 0.0909 - lr: 0.000133 2023-10-14 15:16:51,726 DEV : loss 0.11103517562150955 - f1-score (micro avg) 0.6208 2023-10-14 15:16:51,794 saving best model 2023-10-14 15:16:57,345 ---------------------------------------------------------------------------------------------------- 2023-10-14 15:18:47,930 epoch 3 - iter 361/3617 - loss 0.06378214 - time (sec): 110.58 - samples/sec: 346.95 - lr: 0.000132 - momentum: 0.000000 2023-10-14 15:20:27,753 epoch 3 - iter 722/3617 - loss 0.06540151 - time (sec): 210.40 - samples/sec: 356.29 - lr: 0.000130 - momentum: 0.000000 2023-10-14 15:22:11,280 epoch 3 - iter 1083/3617 - loss 0.06698159 - time (sec): 313.93 - samples/sec: 359.05 - lr: 0.000128 - momentum: 0.000000 2023-10-14 15:23:52,034 epoch 3 - iter 1444/3617 - loss 0.06526504 - time (sec): 414.68 - samples/sec: 366.94 - lr: 0.000127 - momentum: 0.000000 2023-10-14 15:25:35,189 epoch 3 - iter 1805/3617 - loss 0.06525753 - time (sec): 517.84 - samples/sec: 368.07 - lr: 0.000125 - momentum: 0.000000 2023-10-14 15:27:21,741 epoch 3 - iter 2166/3617 - loss 0.06493734 - time (sec): 624.39 - samples/sec: 367.06 - lr: 0.000123 - momentum: 0.000000 2023-10-14 15:29:02,837 epoch 3 - iter 2527/3617 - loss 0.06477906 - time (sec): 725.49 - samples/sec: 368.41 - lr: 0.000122 - momentum: 0.000000 2023-10-14 15:30:43,338 epoch 3 - iter 2888/3617 - loss 0.06486032 - time (sec): 825.99 - samples/sec: 367.73 - lr: 0.000120 - momentum: 0.000000 2023-10-14 15:32:21,513 epoch 3 - iter 3249/3617 - loss 0.06429360 - time (sec): 924.16 - samples/sec: 369.65 
- lr: 0.000118 - momentum: 0.000000 2023-10-14 15:34:00,097 epoch 3 - iter 3610/3617 - loss 0.06473516 - time (sec): 1022.75 - samples/sec: 370.86 - lr: 0.000117 - momentum: 0.000000 2023-10-14 15:34:01,771 ---------------------------------------------------------------------------------------------------- 2023-10-14 15:34:01,772 EPOCH 3 done: loss 0.0648 - lr: 0.000117 2023-10-14 15:34:42,878 DEV : loss 0.1630961298942566 - f1-score (micro avg) 0.6158 2023-10-14 15:34:42,948 ---------------------------------------------------------------------------------------------------- 2023-10-14 15:36:23,601 epoch 4 - iter 361/3617 - loss 0.04594123 - time (sec): 100.65 - samples/sec: 363.48 - lr: 0.000115 - momentum: 0.000000 2023-10-14 15:38:03,903 epoch 4 - iter 722/3617 - loss 0.04874699 - time (sec): 200.95 - samples/sec: 370.90 - lr: 0.000113 - momentum: 0.000000 2023-10-14 15:39:47,633 epoch 4 - iter 1083/3617 - loss 0.04891474 - time (sec): 304.68 - samples/sec: 372.99 - lr: 0.000112 - momentum: 0.000000 2023-10-14 15:41:28,223 epoch 4 - iter 1444/3617 - loss 0.04690533 - time (sec): 405.27 - samples/sec: 370.54 - lr: 0.000110 - momentum: 0.000000 2023-10-14 15:43:08,244 epoch 4 - iter 1805/3617 - loss 0.04706884 - time (sec): 505.29 - samples/sec: 371.67 - lr: 0.000108 - momentum: 0.000000 2023-10-14 15:44:53,122 epoch 4 - iter 2166/3617 - loss 0.04732382 - time (sec): 610.17 - samples/sec: 371.22 - lr: 0.000107 - momentum: 0.000000 2023-10-14 15:46:34,153 epoch 4 - iter 2527/3617 - loss 0.04755205 - time (sec): 711.20 - samples/sec: 372.43 - lr: 0.000105 - momentum: 0.000000 2023-10-14 15:48:16,775 epoch 4 - iter 2888/3617 - loss 0.04692890 - time (sec): 813.82 - samples/sec: 373.54 - lr: 0.000103 - momentum: 0.000000 2023-10-14 15:49:58,502 epoch 4 - iter 3249/3617 - loss 0.04641577 - time (sec): 915.55 - samples/sec: 372.42 - lr: 0.000102 - momentum: 0.000000 2023-10-14 15:51:37,654 epoch 4 - iter 3610/3617 - loss 0.04580765 - time (sec): 1014.70 - samples/sec: 
373.81 - lr: 0.000100 - momentum: 0.000000 2023-10-14 15:51:39,377 ---------------------------------------------------------------------------------------------------- 2023-10-14 15:51:39,378 EPOCH 4 done: loss 0.0458 - lr: 0.000100 2023-10-14 15:52:18,475 DEV : loss 0.21207064390182495 - f1-score (micro avg) 0.6575 2023-10-14 15:52:18,532 saving best model 2023-10-14 15:52:21,265 ---------------------------------------------------------------------------------------------------- 2023-10-14 15:53:59,089 epoch 5 - iter 361/3617 - loss 0.02339736 - time (sec): 97.82 - samples/sec: 384.30 - lr: 0.000098 - momentum: 0.000000 2023-10-14 15:55:43,726 epoch 5 - iter 722/3617 - loss 0.02553085 - time (sec): 202.46 - samples/sec: 387.78 - lr: 0.000097 - momentum: 0.000000 2023-10-14 15:57:28,602 epoch 5 - iter 1083/3617 - loss 0.02625521 - time (sec): 307.33 - samples/sec: 381.45 - lr: 0.000095 - momentum: 0.000000 2023-10-14 15:59:07,613 epoch 5 - iter 1444/3617 - loss 0.02682998 - time (sec): 406.34 - samples/sec: 378.67 - lr: 0.000093 - momentum: 0.000000 2023-10-14 16:00:54,191 epoch 5 - iter 1805/3617 - loss 0.02777093 - time (sec): 512.92 - samples/sec: 371.65 - lr: 0.000092 - momentum: 0.000000 2023-10-14 16:02:34,896 epoch 5 - iter 2166/3617 - loss 0.02927279 - time (sec): 613.63 - samples/sec: 373.11 - lr: 0.000090 - momentum: 0.000000 2023-10-14 16:04:12,788 epoch 5 - iter 2527/3617 - loss 0.02908364 - time (sec): 711.52 - samples/sec: 374.65 - lr: 0.000088 - momentum: 0.000000 2023-10-14 16:05:50,506 epoch 5 - iter 2888/3617 - loss 0.02954096 - time (sec): 809.24 - samples/sec: 375.84 - lr: 0.000087 - momentum: 0.000000 2023-10-14 16:07:30,269 epoch 5 - iter 3249/3617 - loss 0.03004065 - time (sec): 909.00 - samples/sec: 375.39 - lr: 0.000085 - momentum: 0.000000 2023-10-14 16:09:09,000 epoch 5 - iter 3610/3617 - loss 0.03067994 - time (sec): 1007.73 - samples/sec: 376.30 - lr: 0.000083 - momentum: 0.000000 2023-10-14 16:09:10,704 
---------------------------------------------------------------------------------------------------- 2023-10-14 16:09:10,705 EPOCH 5 done: loss 0.0306 - lr: 0.000083 2023-10-14 16:09:49,418 DEV : loss 0.2788721024990082 - f1-score (micro avg) 0.6128 2023-10-14 16:09:49,475 ---------------------------------------------------------------------------------------------------- 2023-10-14 16:11:32,473 epoch 6 - iter 361/3617 - loss 0.02077887 - time (sec): 103.00 - samples/sec: 381.48 - lr: 0.000082 - momentum: 0.000000 2023-10-14 16:13:14,265 epoch 6 - iter 722/3617 - loss 0.02070222 - time (sec): 204.79 - samples/sec: 373.48 - lr: 0.000080 - momentum: 0.000000 2023-10-14 16:14:54,315 epoch 6 - iter 1083/3617 - loss 0.01942465 - time (sec): 304.84 - samples/sec: 374.09 - lr: 0.000078 - momentum: 0.000000 2023-10-14 16:16:32,568 epoch 6 - iter 1444/3617 - loss 0.02052837 - time (sec): 403.09 - samples/sec: 375.08 - lr: 0.000077 - momentum: 0.000000 2023-10-14 16:18:15,025 epoch 6 - iter 1805/3617 - loss 0.02128468 - time (sec): 505.55 - samples/sec: 372.79 - lr: 0.000075 - momentum: 0.000000 2023-10-14 16:19:57,182 epoch 6 - iter 2166/3617 - loss 0.02145412 - time (sec): 607.70 - samples/sec: 373.59 - lr: 0.000073 - momentum: 0.000000 2023-10-14 16:21:37,405 epoch 6 - iter 2527/3617 - loss 0.02127559 - time (sec): 707.93 - samples/sec: 373.85 - lr: 0.000072 - momentum: 0.000000 2023-10-14 16:23:17,748 epoch 6 - iter 2888/3617 - loss 0.02174407 - time (sec): 808.27 - samples/sec: 375.50 - lr: 0.000070 - momentum: 0.000000 2023-10-14 16:24:58,379 epoch 6 - iter 3249/3617 - loss 0.02165837 - time (sec): 908.90 - samples/sec: 375.13 - lr: 0.000068 - momentum: 0.000000 2023-10-14 16:26:39,103 epoch 6 - iter 3610/3617 - loss 0.02119253 - time (sec): 1009.63 - samples/sec: 375.80 - lr: 0.000067 - momentum: 0.000000 2023-10-14 16:26:41,074 ---------------------------------------------------------------------------------------------------- 2023-10-14 16:26:41,074 EPOCH 6 done: 
loss 0.0212 - lr: 0.000067 2023-10-14 16:27:20,798 DEV : loss 0.313930869102478 - f1-score (micro avg) 0.6284 2023-10-14 16:27:20,855 ---------------------------------------------------------------------------------------------------- 2023-10-14 16:29:06,678 epoch 7 - iter 361/3617 - loss 0.01422485 - time (sec): 105.82 - samples/sec: 383.75 - lr: 0.000065 - momentum: 0.000000 2023-10-14 16:30:54,879 epoch 7 - iter 722/3617 - loss 0.01422789 - time (sec): 214.02 - samples/sec: 363.13 - lr: 0.000063 - momentum: 0.000000 2023-10-14 16:32:43,802 epoch 7 - iter 1083/3617 - loss 0.01393561 - time (sec): 322.94 - samples/sec: 357.57 - lr: 0.000062 - momentum: 0.000000 2023-10-14 16:34:23,506 epoch 7 - iter 1444/3617 - loss 0.01434256 - time (sec): 422.65 - samples/sec: 361.08 - lr: 0.000060 - momentum: 0.000000 2023-10-14 16:36:06,389 epoch 7 - iter 1805/3617 - loss 0.01398391 - time (sec): 525.53 - samples/sec: 361.44 - lr: 0.000058 - momentum: 0.000000 2023-10-14 16:37:52,259 epoch 7 - iter 2166/3617 - loss 0.01422903 - time (sec): 631.40 - samples/sec: 360.88 - lr: 0.000057 - momentum: 0.000000 2023-10-14 16:39:35,485 epoch 7 - iter 2527/3617 - loss 0.01414726 - time (sec): 734.63 - samples/sec: 362.33 - lr: 0.000055 - momentum: 0.000000 2023-10-14 16:41:19,107 epoch 7 - iter 2888/3617 - loss 0.01450497 - time (sec): 838.25 - samples/sec: 364.58 - lr: 0.000053 - momentum: 0.000000 2023-10-14 16:43:02,248 epoch 7 - iter 3249/3617 - loss 0.01550406 - time (sec): 941.39 - samples/sec: 364.18 - lr: 0.000052 - momentum: 0.000000 2023-10-14 16:44:48,987 epoch 7 - iter 3610/3617 - loss 0.01571456 - time (sec): 1048.13 - samples/sec: 361.60 - lr: 0.000050 - momentum: 0.000000 2023-10-14 16:44:51,074 ---------------------------------------------------------------------------------------------------- 2023-10-14 16:44:51,074 EPOCH 7 done: loss 0.0158 - lr: 0.000050 2023-10-14 16:45:30,018 DEV : loss 0.3257623016834259 - f1-score (micro avg) 0.6263 2023-10-14 16:45:30,075 
---------------------------------------------------------------------------------------------------- 2023-10-14 16:47:09,244 epoch 8 - iter 361/3617 - loss 0.01142318 - time (sec): 99.17 - samples/sec: 387.29 - lr: 0.000048 - momentum: 0.000000 2023-10-14 16:48:57,539 epoch 8 - iter 722/3617 - loss 0.01106006 - time (sec): 207.46 - samples/sec: 374.10 - lr: 0.000047 - momentum: 0.000000 2023-10-14 16:50:47,572 epoch 8 - iter 1083/3617 - loss 0.01148474 - time (sec): 317.49 - samples/sec: 363.91 - lr: 0.000045 - momentum: 0.000000 2023-10-14 16:52:29,277 epoch 8 - iter 1444/3617 - loss 0.01109820 - time (sec): 419.20 - samples/sec: 363.08 - lr: 0.000043 - momentum: 0.000000 2023-10-14 16:54:07,215 epoch 8 - iter 1805/3617 - loss 0.01060742 - time (sec): 517.14 - samples/sec: 368.74 - lr: 0.000042 - momentum: 0.000000 2023-10-14 16:55:45,728 epoch 8 - iter 2166/3617 - loss 0.01013602 - time (sec): 615.65 - samples/sec: 368.66 - lr: 0.000040 - momentum: 0.000000 2023-10-14 16:57:28,600 epoch 8 - iter 2527/3617 - loss 0.01003184 - time (sec): 718.52 - samples/sec: 370.30 - lr: 0.000038 - momentum: 0.000000 2023-10-14 16:59:08,971 epoch 8 - iter 2888/3617 - loss 0.01030870 - time (sec): 818.89 - samples/sec: 370.18 - lr: 0.000037 - momentum: 0.000000 2023-10-14 17:00:48,363 epoch 8 - iter 3249/3617 - loss 0.01013006 - time (sec): 918.29 - samples/sec: 371.85 - lr: 0.000035 - momentum: 0.000000 2023-10-14 17:02:27,074 epoch 8 - iter 3610/3617 - loss 0.00973135 - time (sec): 1017.00 - samples/sec: 372.91 - lr: 0.000033 - momentum: 0.000000 2023-10-14 17:02:28,759 ---------------------------------------------------------------------------------------------------- 2023-10-14 17:02:28,759 EPOCH 8 done: loss 0.0097 - lr: 0.000033 2023-10-14 17:03:08,171 DEV : loss 0.3519401252269745 - f1-score (micro avg) 0.6383 2023-10-14 17:03:08,238 ---------------------------------------------------------------------------------------------------- 2023-10-14 17:04:56,121 epoch 9 - iter 
361/3617 - loss 0.00400351 - time (sec): 107.88 - samples/sec: 337.43 - lr: 0.000032 - momentum: 0.000000 2023-10-14 17:06:44,549 epoch 9 - iter 722/3617 - loss 0.00553294 - time (sec): 216.31 - samples/sec: 337.08 - lr: 0.000030 - momentum: 0.000000 2023-10-14 17:08:26,029 epoch 9 - iter 1083/3617 - loss 0.00718701 - time (sec): 317.79 - samples/sec: 350.01 - lr: 0.000028 - momentum: 0.000000 2023-10-14 17:10:14,915 epoch 9 - iter 1444/3617 - loss 0.00747014 - time (sec): 426.67 - samples/sec: 351.11 - lr: 0.000027 - momentum: 0.000000 2023-10-14 17:12:13,127 epoch 9 - iter 1805/3617 - loss 0.00700743 - time (sec): 544.89 - samples/sec: 346.36 - lr: 0.000025 - momentum: 0.000000 2023-10-14 17:13:56,710 epoch 9 - iter 2166/3617 - loss 0.00753126 - time (sec): 648.47 - samples/sec: 348.52 - lr: 0.000023 - momentum: 0.000000 2023-10-14 17:15:35,725 epoch 9 - iter 2527/3617 - loss 0.00713809 - time (sec): 747.48 - samples/sec: 353.09 - lr: 0.000022 - momentum: 0.000000 2023-10-14 17:17:15,835 epoch 9 - iter 2888/3617 - loss 0.00710439 - time (sec): 847.60 - samples/sec: 358.03 - lr: 0.000020 - momentum: 0.000000 2023-10-14 17:18:53,848 epoch 9 - iter 3249/3617 - loss 0.00691053 - time (sec): 945.61 - samples/sec: 361.20 - lr: 0.000018 - momentum: 0.000000 2023-10-14 17:20:32,410 epoch 9 - iter 3610/3617 - loss 0.00723383 - time (sec): 1044.17 - samples/sec: 363.12 - lr: 0.000017 - momentum: 0.000000 2023-10-14 17:20:34,181 ---------------------------------------------------------------------------------------------------- 2023-10-14 17:20:34,181 EPOCH 9 done: loss 0.0072 - lr: 0.000017 2023-10-14 17:21:13,627 DEV : loss 0.37418004870414734 - f1-score (micro avg) 0.6425 2023-10-14 17:21:13,685 ---------------------------------------------------------------------------------------------------- 2023-10-14 17:22:51,217 epoch 10 - iter 361/3617 - loss 0.00288085 - time (sec): 97.53 - samples/sec: 383.84 - lr: 0.000015 - momentum: 0.000000 2023-10-14 17:24:32,451 epoch 10 - 
iter 722/3617 - loss 0.00430136 - time (sec): 198.76 - samples/sec: 381.28 - lr: 0.000013 - momentum: 0.000000 2023-10-14 17:26:18,156 epoch 10 - iter 1083/3617 - loss 0.00516123 - time (sec): 304.47 - samples/sec: 376.22 - lr: 0.000012 - momentum: 0.000000 2023-10-14 17:27:59,531 epoch 10 - iter 1444/3617 - loss 0.00455380 - time (sec): 405.84 - samples/sec: 374.69 - lr: 0.000010 - momentum: 0.000000 2023-10-14 17:29:41,578 epoch 10 - iter 1805/3617 - loss 0.00417121 - time (sec): 507.89 - samples/sec: 373.65 - lr: 0.000008 - momentum: 0.000000 2023-10-14 17:31:24,980 epoch 10 - iter 2166/3617 - loss 0.00427900 - time (sec): 611.29 - samples/sec: 373.25 - lr: 0.000007 - momentum: 0.000000 2023-10-14 17:33:04,407 epoch 10 - iter 2527/3617 - loss 0.00423939 - time (sec): 710.72 - samples/sec: 374.13 - lr: 0.000005 - momentum: 0.000000 2023-10-14 17:34:44,494 epoch 10 - iter 2888/3617 - loss 0.00423096 - time (sec): 810.81 - samples/sec: 376.32 - lr: 0.000003 - momentum: 0.000000 2023-10-14 17:36:23,538 epoch 10 - iter 3249/3617 - loss 0.00456365 - time (sec): 909.85 - samples/sec: 376.25 - lr: 0.000002 - momentum: 0.000000 2023-10-14 17:38:04,514 epoch 10 - iter 3610/3617 - loss 0.00445018 - time (sec): 1010.83 - samples/sec: 375.28 - lr: 0.000000 - momentum: 0.000000 2023-10-14 17:38:06,359 ---------------------------------------------------------------------------------------------------- 2023-10-14 17:38:06,359 EPOCH 10 done: loss 0.0045 - lr: 0.000000 2023-10-14 17:38:48,489 DEV : loss 0.383007675409317 - f1-score (micro avg) 0.6403 2023-10-14 17:38:49,482 ---------------------------------------------------------------------------------------------------- 2023-10-14 17:38:49,484 Loading model from best epoch ... 
2023-10-14 17:38:53,352 SequenceTagger predicts: Dictionary with 13 tags: O, S-loc, B-loc, E-loc, I-loc, S-pers, B-pers, E-pers, I-pers, S-org, B-org, E-org, I-org
2023-10-14 17:39:53,980
Results:
- F-score (micro) 0.6356
- F-score (macro) 0.4981
- Accuracy 0.4788

By class:
              precision    recall  f1-score   support

         loc     0.6276    0.7699    0.6915       591
        pers     0.5664    0.7171    0.6329       357
         org     0.1757    0.1646    0.1699        79

   micro avg     0.5787    0.7050    0.6356      1027
   macro avg     0.4565    0.5505    0.4981      1027
weighted avg     0.5715    0.7050    0.6310      1027

2023-10-14 17:39:53,980 ----------------------------------------------------------------------------------------------------
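
The summary rows of the final report can be re-derived from the per-class rows. The sketch below is not part of the training run: it copies the rounded values printed in the log and assumes the usual definitions (F1 as the harmonic mean of precision and recall, macro as the unweighted class mean, weighted as the support-weighted class mean). The names `REPORT`, `f1`, `macro_f1` and `weighted_f1` are hypothetical helpers introduced here for illustration. Because the inputs are already rounded to four decimals, the results should agree with the log only up to rounding.

```python
# Per-class scores copied from the "By class" table of the log (rounded values).
REPORT = {
    "loc": {"precision": 0.6276, "recall": 0.7699, "f1": 0.6915, "support": 591},
    "pers": {"precision": 0.5664, "recall": 0.7171, "f1": 0.6329, "support": 357},
    "org": {"precision": 0.1757, "recall": 0.1646, "f1": 0.1699, "support": 79},
}

# Micro-averaged precision and recall as printed in the "micro avg" row.
MICRO_PRECISION = 0.5787
MICRO_RECALL = 0.7050


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(report: dict) -> float:
    """Unweighted mean of the per-class F1 scores."""
    return sum(row["f1"] for row in report.values()) / len(report)


def weighted_f1(report: dict) -> float:
    """Mean of the per-class F1 scores, weighted by class support."""
    total = sum(row["support"] for row in report.values())
    return sum(row["f1"] * row["support"] for row in report.values()) / total


if __name__ == "__main__":
    print(f"micro F1:    {f1(MICRO_PRECISION, MICRO_RECALL):.4f}")
    print(f"macro F1:    {macro_f1(REPORT):.4f}")
    print(f"weighted F1: {weighted_f1(REPORT):.4f}")
```

The low macro score relative to the micro score follows directly from the `org` class: it contributes one third of the macro mean but only 79 of the 1027 supporting spans.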