2023-10-13 15:47:36,877 ----------------------------------------------------------------------------------------------------
2023-10-13 15:47:36,879 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): FusedRMSNorm(torch.Size([1472]), eps=1e-06, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-13 15:47:36,879 ----------------------------------------------------------------------------------------------------
2023-10-13 15:47:36,879 MultiCorpus: 14465 train + 1392 dev + 2432 test sentences
 - NER_HIPE_2022 Corpus: 14465 train + 1392 dev + 2432 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/letemps/fr/with_doc_seperator
2023-10-13 15:47:36,879 ----------------------------------------------------------------------------------------------------
2023-10-13 15:47:36,879 Train: 14465 sentences
2023-10-13 15:47:36,879 (train_with_dev=False, train_with_test=False)
2023-10-13 15:47:36,879 ----------------------------------------------------------------------------------------------------
2023-10-13 15:47:36,879 Training Params:
2023-10-13 15:47:36,880 - learning_rate: "0.00015"
2023-10-13 15:47:36,880 - mini_batch_size: "4"
2023-10-13 15:47:36,880 - max_epochs: "10"
2023-10-13 15:47:36,880 - shuffle: "True"
2023-10-13 15:47:36,880 ----------------------------------------------------------------------------------------------------
2023-10-13 15:47:36,880 Plugins:
2023-10-13 15:47:36,880 - TensorboardLogger
2023-10-13 15:47:36,880 - LinearScheduler | warmup_fraction: '0.1'
2023-10-13 15:47:36,880 ----------------------------------------------------------------------------------------------------
2023-10-13 15:47:36,880 Final evaluation on model from best epoch (best-model.pt)
2023-10-13 15:47:36,880 - metric: "('micro avg', 'f1-score')"
2023-10-13 15:47:36,880 ----------------------------------------------------------------------------------------------------
2023-10-13 15:47:36,880 Computation:
2023-10-13 15:47:36,880 - compute on device: cuda:0
2023-10-13 15:47:36,880 - embedding storage: none
2023-10-13 15:47:36,881 ----------------------------------------------------------------------------------------------------
2023-10-13 15:47:36,881 Model training base path: "hmbench-letemps/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-2"
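Continuing the sketch above, the logged parameters map onto Flair's fine-tuning entry point roughly as follows. fine_tune defaults to AdamW with a linear warmup/decay schedule, which lines up with the LinearScheduler (warmup_fraction '0.1') listed under Plugins; the TensorBoard logging seen here is wired up by the benchmark script and is not reproduced in this sketch.

# Sketch only; uses the tagger and corpus from the sketch above.
from flair.trainers import ModelTrainer

trainer = ModelTrainer(tagger, corpus)

trainer.fine_tune(
    "hmbench-letemps/fr-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs4-wsFalse-e10-lr0.00015-poolingfirst-layers-1-crfFalse-2",
    learning_rate=0.00015,   # "Training Params" above
    mini_batch_size=4,
    max_epochs=10,
    shuffle=True,
)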
2023-10-13 15:47:36,881 ----------------------------------------------------------------------------------------------------
2023-10-13 15:47:36,881 ----------------------------------------------------------------------------------------------------
2023-10-13 15:47:36,881 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-13 15:49:16,380 epoch 1 - iter 361/3617 - loss 2.52765117 - time (sec): 99.50 - samples/sec: 375.46 - lr: 0.000015 - momentum: 0.000000
2023-10-13 15:50:55,095 epoch 1 - iter 722/3617 - loss 2.14450318 - time (sec): 198.21 - samples/sec: 377.11 - lr: 0.000030 - momentum: 0.000000
2023-10-13 15:52:34,986 epoch 1 - iter 1083/3617 - loss 1.68063315 - time (sec): 298.10 - samples/sec: 379.63 - lr: 0.000045 - momentum: 0.000000
2023-10-13 15:54:13,230 epoch 1 - iter 1444/3617 - loss 1.33922640 - time (sec): 396.35 - samples/sec: 381.28 - lr: 0.000060 - momentum: 0.000000
2023-10-13 15:55:49,352 epoch 1 - iter 1805/3617 - loss 1.11175819 - time (sec): 492.47 - samples/sec: 384.30 - lr: 0.000075 - momentum: 0.000000
2023-10-13 15:57:25,329 epoch 1 - iter 2166/3617 - loss 0.95966876 - time (sec): 588.45 - samples/sec: 385.53 - lr: 0.000090 - momentum: 0.000000
2023-10-13 15:59:05,198 epoch 1 - iter 2527/3617 - loss 0.84659547 - time (sec): 688.31 - samples/sec: 385.09 - lr: 0.000105 - momentum: 0.000000
2023-10-13 16:00:44,426 epoch 1 - iter 2888/3617 - loss 0.75756054 - time (sec): 787.54 - samples/sec: 384.55 - lr: 0.000120 - momentum: 0.000000
2023-10-13 16:02:21,853 epoch 1 - iter 3249/3617 - loss 0.68764398 - time (sec): 884.97 - samples/sec: 386.12 - lr: 0.000135 - momentum: 0.000000
2023-10-13 16:03:57,756 epoch 1 - iter 3610/3617 - loss 0.63260278 - time (sec): 980.87 - samples/sec: 386.66 - lr: 0.000150 - momentum: 0.000000
2023-10-13 16:03:59,433 ----------------------------------------------------------------------------------------------------
2023-10-13 16:03:59,434 EPOCH 1 done: loss 0.6317 - lr: 0.000150
2023-10-13 16:04:35,632 DEV : loss 0.1328110247850418 - f1-score (micro avg) 0.5468
2023-10-13 16:04:35,688 saving best model
2023-10-13 16:04:36,553 ----------------------------------------------------------------------------------------------------
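The lr column above reflects the LinearScheduler plugin from the header: with 10 x 3617 mini-batches in total, the learning rate ramps linearly from 0 to 0.00015 over the first 10% of steps (roughly epoch 1) and then decays linearly to 0 by the end of epoch 10. A hypothetical helper (not Flair code) that reproduces the logged values:

def linear_schedule_lr(step: int, total_steps: int,
                       peak_lr: float = 0.00015,
                       warmup_fraction: float = 0.1) -> float:
    """Linear warmup to peak_lr, then linear decay to 0, as in the lr column."""
    warmup_steps = int(total_steps * warmup_fraction)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * (total_steps - step) / max(1, total_steps - warmup_steps)

total = 10 * 3617                                   # 10 epochs x 3617 mini-batches
print(round(linear_schedule_lr(361, total), 6))     # ~0.000015, cf. iter 361/3617 above
print(round(linear_schedule_lr(3610, total), 6))    # ~0.000150, cf. end of epoch 1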
2023-10-13 16:06:14,210 epoch 2 - iter 361/3617 - loss 0.11626753 - time (sec): 97.65 - samples/sec: 375.76 - lr: 0.000148 - momentum: 0.000000
2023-10-13 16:07:54,314 epoch 2 - iter 722/3617 - loss 0.11000947 - time (sec): 197.76 - samples/sec: 377.76 - lr: 0.000147 - momentum: 0.000000
2023-10-13 16:09:32,730 epoch 2 - iter 1083/3617 - loss 0.10626444 - time (sec): 296.17 - samples/sec: 381.60 - lr: 0.000145 - momentum: 0.000000
2023-10-13 16:11:10,144 epoch 2 - iter 1444/3617 - loss 0.10456416 - time (sec): 393.59 - samples/sec: 383.58 - lr: 0.000143 - momentum: 0.000000
2023-10-13 16:12:49,896 epoch 2 - iter 1805/3617 - loss 0.10240111 - time (sec): 493.34 - samples/sec: 385.22 - lr: 0.000142 - momentum: 0.000000
2023-10-13 16:14:24,736 epoch 2 - iter 2166/3617 - loss 0.10226438 - time (sec): 588.18 - samples/sec: 385.19 - lr: 0.000140 - momentum: 0.000000
2023-10-13 16:16:04,084 epoch 2 - iter 2527/3617 - loss 0.10035753 - time (sec): 687.53 - samples/sec: 384.17 - lr: 0.000138 - momentum: 0.000000
2023-10-13 16:17:46,721 epoch 2 - iter 2888/3617 - loss 0.09824782 - time (sec): 790.17 - samples/sec: 384.12 - lr: 0.000137 - momentum: 0.000000
2023-10-13 16:19:27,035 epoch 2 - iter 3249/3617 - loss 0.09643723 - time (sec): 890.48 - samples/sec: 383.53 - lr: 0.000135 - momentum: 0.000000
2023-10-13 16:21:07,446 epoch 2 - iter 3610/3617 - loss 0.09599864 - time (sec): 990.89 - samples/sec: 382.64 - lr: 0.000133 - momentum: 0.000000
2023-10-13 16:21:09,242 ----------------------------------------------------------------------------------------------------
2023-10-13 16:21:09,242 EPOCH 2 done: loss 0.0959 - lr: 0.000133
2023-10-13 16:21:49,050 DEV : loss 0.12169007211923599 - f1-score (micro avg) 0.5729
2023-10-13 16:21:49,110 saving best model
2023-10-13 16:21:51,717 ----------------------------------------------------------------------------------------------------
2023-10-13 16:23:32,625 epoch 3 - iter 361/3617 - loss 0.06045412 - time (sec): 100.90 - samples/sec: 389.18 - lr: 0.000132 - momentum: 0.000000
2023-10-13 16:25:10,123 epoch 3 - iter 722/3617 - loss 0.06228126 - time (sec): 198.40 - samples/sec: 382.78 - lr: 0.000130 - momentum: 0.000000
2023-10-13 16:26:50,214 epoch 3 - iter 1083/3617 - loss 0.06334489 - time (sec): 298.49 - samples/sec: 380.37 - lr: 0.000128 - momentum: 0.000000
2023-10-13 16:28:29,213 epoch 3 - iter 1444/3617 - loss 0.06457721 - time (sec): 397.49 - samples/sec: 380.35 - lr: 0.000127 - momentum: 0.000000
2023-10-13 16:30:10,356 epoch 3 - iter 1805/3617 - loss 0.06506915 - time (sec): 498.63 - samples/sec: 380.18 - lr: 0.000125 - momentum: 0.000000
2023-10-13 16:31:47,566 epoch 3 - iter 2166/3617 - loss 0.06594029 - time (sec): 595.84 - samples/sec: 379.51 - lr: 0.000123 - momentum: 0.000000
2023-10-13 16:33:27,197 epoch 3 - iter 2527/3617 - loss 0.06615646 - time (sec): 695.47 - samples/sec: 382.22 - lr: 0.000122 - momentum: 0.000000
2023-10-13 16:35:04,152 epoch 3 - iter 2888/3617 - loss 0.06725648 - time (sec): 792.43 - samples/sec: 381.27 - lr: 0.000120 - momentum: 0.000000
2023-10-13 16:36:41,646 epoch 3 - iter 3249/3617 - loss 0.06667975 - time (sec): 889.92 - samples/sec: 382.81 - lr: 0.000118 - momentum: 0.000000
2023-10-13 16:38:19,427 epoch 3 - iter 3610/3617 - loss 0.06625041 - time (sec): 987.70 - samples/sec: 384.07 - lr: 0.000117 - momentum: 0.000000
2023-10-13 16:38:21,051 ----------------------------------------------------------------------------------------------------
2023-10-13 16:38:21,051 EPOCH 3 done: loss 0.0662 - lr: 0.000117
2023-10-13 16:38:59,279 DEV : loss 0.1475251019001007 - f1-score (micro avg) 0.6326
2023-10-13 16:38:59,336 saving best model
2023-10-13 16:39:01,911 ----------------------------------------------------------------------------------------------------
2023-10-13 16:40:38,948 epoch 4 - iter 361/3617 - loss 0.04260643 - time (sec): 97.03 - samples/sec: 382.61 - lr: 0.000115 - momentum: 0.000000
2023-10-13 16:42:16,995 epoch 4 - iter 722/3617 - loss 0.04125262 - time (sec): 195.08 - samples/sec: 390.28 - lr: 0.000113 - momentum: 0.000000
2023-10-13 16:43:52,860 epoch 4 - iter 1083/3617 - loss 0.04631582 - time (sec): 290.94 - samples/sec: 390.67 - lr: 0.000112 - momentum: 0.000000
2023-10-13 16:45:27,629 epoch 4 - iter 1444/3617 - loss 0.04612849 - time (sec): 385.71 - samples/sec: 391.37 - lr: 0.000110 - momentum: 0.000000
2023-10-13 16:47:09,646 epoch 4 - iter 1805/3617 - loss 0.04673774 - time (sec): 487.73 - samples/sec: 387.18 - lr: 0.000108 - momentum: 0.000000
2023-10-13 16:48:49,185 epoch 4 - iter 2166/3617 - loss 0.04602262 - time (sec): 587.27 - samples/sec: 385.50 - lr: 0.000107 - momentum: 0.000000
2023-10-13 16:50:27,835 epoch 4 - iter 2527/3617 - loss 0.04628779 - time (sec): 685.92 - samples/sec: 384.91 - lr: 0.000105 - momentum: 0.000000
2023-10-13 16:52:06,093 epoch 4 - iter 2888/3617 - loss 0.04553124 - time (sec): 784.18 - samples/sec: 385.34 - lr: 0.000103 - momentum: 0.000000
2023-10-13 16:53:44,924 epoch 4 - iter 3249/3617 - loss 0.04561411 - time (sec): 883.01 - samples/sec: 386.56 - lr: 0.000102 - momentum: 0.000000
2023-10-13 16:55:22,268 epoch 4 - iter 3610/3617 - loss 0.04637074 - time (sec): 980.35 - samples/sec: 386.93 - lr: 0.000100 - momentum: 0.000000
2023-10-13 16:55:23,902 ----------------------------------------------------------------------------------------------------
2023-10-13 16:55:23,902 EPOCH 4 done: loss 0.0464 - lr: 0.000100
2023-10-13 16:56:03,441 DEV : loss 0.21360917389392853 - f1-score (micro avg) 0.6419
2023-10-13 16:56:03,498 saving best model
2023-10-13 16:56:06,081 ----------------------------------------------------------------------------------------------------
2023-10-13 16:57:41,184 epoch 5 - iter 361/3617 - loss 0.02867783 - time (sec): 95.10 - samples/sec: 407.12 - lr: 0.000098 - momentum: 0.000000
2023-10-13 16:59:18,313 epoch 5 - iter 722/3617 - loss 0.03093844 - time (sec): 192.23 - samples/sec: 403.57 - lr: 0.000097 - momentum: 0.000000
2023-10-13 17:00:54,026 epoch 5 - iter 1083/3617 - loss 0.03106379 - time (sec): 287.94 - samples/sec: 397.08 - lr: 0.000095 - momentum: 0.000000
2023-10-13 17:02:29,472 epoch 5 - iter 1444/3617 - loss 0.03435095 - time (sec): 383.39 - samples/sec: 400.62 - lr: 0.000093 - momentum: 0.000000
2023-10-13 17:04:05,072 epoch 5 - iter 1805/3617 - loss 0.03291689 - time (sec): 478.99 - samples/sec: 401.05 - lr: 0.000092 - momentum: 0.000000
2023-10-13 17:05:41,393 epoch 5 - iter 2166/3617 - loss 0.03481865 - time (sec): 575.31 - samples/sec: 396.71 - lr: 0.000090 - momentum: 0.000000
2023-10-13 17:07:19,475 epoch 5 - iter 2527/3617 - loss 0.03380849 - time (sec): 673.39 - samples/sec: 396.08 - lr: 0.000088 - momentum: 0.000000
2023-10-13 17:08:57,422 epoch 5 - iter 2888/3617 - loss 0.03376094 - time (sec): 771.34 - samples/sec: 395.27 - lr: 0.000087 - momentum: 0.000000
2023-10-13 17:10:33,716 epoch 5 - iter 3249/3617 - loss 0.03464489 - time (sec): 867.63 - samples/sec: 393.19 - lr: 0.000085 - momentum: 0.000000
2023-10-13 17:12:12,849 epoch 5 - iter 3610/3617 - loss 0.03480793 - time (sec): 966.76 - samples/sec: 392.27 - lr: 0.000083 - momentum: 0.000000
2023-10-13 17:12:14,580 ----------------------------------------------------------------------------------------------------
2023-10-13 17:12:14,580 EPOCH 5 done: loss 0.0349 - lr: 0.000083
2023-10-13 17:12:53,693 DEV : loss 0.2400546371936798 - f1-score (micro avg) 0.651
2023-10-13 17:12:53,753 saving best model
2023-10-13 17:12:56,347 ----------------------------------------------------------------------------------------------------
2023-10-13 17:14:34,985 epoch 6 - iter 361/3617 - loss 0.01720928 - time (sec): 98.63 - samples/sec: 386.11 - lr: 0.000082 - momentum: 0.000000
2023-10-13 17:16:13,842 epoch 6 - iter 722/3617 - loss 0.01844733 - time (sec): 197.49 - samples/sec: 380.67 - lr: 0.000080 - momentum: 0.000000
2023-10-13 17:17:52,649 epoch 6 - iter 1083/3617 - loss 0.01854603 - time (sec): 296.30 - samples/sec: 378.69 - lr: 0.000078 - momentum: 0.000000
2023-10-13 17:19:32,124 epoch 6 - iter 1444/3617 - loss 0.02106696 - time (sec): 395.77 - samples/sec: 380.99 - lr: 0.000077 - momentum: 0.000000
2023-10-13 17:21:08,674 epoch 6 - iter 1805/3617 - loss 0.02062784 - time (sec): 492.32 - samples/sec: 382.09 - lr: 0.000075 - momentum: 0.000000
2023-10-13 17:22:47,208 epoch 6 - iter 2166/3617 - loss 0.02025036 - time (sec): 590.86 - samples/sec: 382.21 - lr: 0.000073 - momentum: 0.000000
2023-10-13 17:24:27,370 epoch 6 - iter 2527/3617 - loss 0.02039739 - time (sec): 691.02 - samples/sec: 381.52 - lr: 0.000072 - momentum: 0.000000
2023-10-13 17:26:05,854 epoch 6 - iter 2888/3617 - loss 0.02102720 - time (sec): 789.50 - samples/sec: 383.40 - lr: 0.000070 - momentum: 0.000000
2023-10-13 17:27:44,896 epoch 6 - iter 3249/3617 - loss 0.02116438 - time (sec): 888.54 - samples/sec: 383.69 - lr: 0.000068 - momentum: 0.000000
2023-10-13 17:29:25,435 epoch 6 - iter 3610/3617 - loss 0.02233088 - time (sec): 989.08 - samples/sec: 383.39 - lr: 0.000067 - momentum: 0.000000
2023-10-13 17:29:27,292 ----------------------------------------------------------------------------------------------------
2023-10-13 17:29:27,292 EPOCH 6 done: loss 0.0223 - lr: 0.000067
2023-10-13 17:30:06,560 DEV : loss 0.27845296263694763 - f1-score (micro avg) 0.6351
2023-10-13 17:30:06,618 ----------------------------------------------------------------------------------------------------
2023-10-13 17:31:45,647 epoch 7 - iter 361/3617 - loss 0.01354530 - time (sec): 99.03 - samples/sec: 390.06 - lr: 0.000065 - momentum: 0.000000
2023-10-13 17:33:23,229 epoch 7 - iter 722/3617 - loss 0.01275613 - time (sec): 196.61 - samples/sec: 388.03 - lr: 0.000063 - momentum: 0.000000
2023-10-13 17:35:02,365 epoch 7 - iter 1083/3617 - loss 0.01402730 - time (sec): 295.74 - samples/sec: 389.84 - lr: 0.000062 - momentum: 0.000000
2023-10-13 17:36:40,537 epoch 7 - iter 1444/3617 - loss 0.01348114 - time (sec): 393.92 - samples/sec: 385.67 - lr: 0.000060 - momentum: 0.000000
2023-10-13 17:38:17,672 epoch 7 - iter 1805/3617 - loss 0.01492394 - time (sec): 491.05 - samples/sec: 386.77 - lr: 0.000058 - momentum: 0.000000
2023-10-13 17:39:55,040 epoch 7 - iter 2166/3617 - loss 0.01481483 - time (sec): 588.42 - samples/sec: 390.00 - lr: 0.000057 - momentum: 0.000000
2023-10-13 17:41:31,995 epoch 7 - iter 2527/3617 - loss 0.01489788 - time (sec): 685.38 - samples/sec: 389.85 - lr: 0.000055 - momentum: 0.000000
2023-10-13 17:43:07,557 epoch 7 - iter 2888/3617 - loss 0.01477959 - time (sec): 780.94 - samples/sec: 389.20 - lr: 0.000053 - momentum: 0.000000
2023-10-13 17:44:43,562 epoch 7 - iter 3249/3617 - loss 0.01557973 - time (sec): 876.94 - samples/sec: 388.67 - lr: 0.000052 - momentum: 0.000000
2023-10-13 17:46:22,754 epoch 7 - iter 3610/3617 - loss 0.01541438 - time (sec): 976.13 - samples/sec: 388.37 - lr: 0.000050 - momentum: 0.000000
2023-10-13 17:46:24,563 ----------------------------------------------------------------------------------------------------
2023-10-13 17:46:24,564 EPOCH 7 done: loss 0.0155 - lr: 0.000050
2023-10-13 17:47:02,672 DEV : loss 0.32551926374435425 - f1-score (micro avg) 0.6588
2023-10-13 17:47:02,728 saving best model
2023-10-13 17:47:05,315 ----------------------------------------------------------------------------------------------------
2023-10-13 17:48:43,299 epoch 8 - iter 361/3617 - loss 0.01360752 - time (sec): 97.98 - samples/sec: 388.59 - lr: 0.000048 - momentum: 0.000000
2023-10-13 17:50:23,431 epoch 8 - iter 722/3617 - loss 0.01121896 - time (sec): 198.11 - samples/sec: 391.82 - lr: 0.000047 - momentum: 0.000000
2023-10-13 17:52:02,062 epoch 8 - iter 1083/3617 - loss 0.01074591 - time (sec): 296.74 - samples/sec: 391.38 - lr: 0.000045 - momentum: 0.000000
2023-10-13 17:53:40,669 epoch 8 - iter 1444/3617 - loss 0.00996253 - time (sec): 395.35 - samples/sec: 391.11 - lr: 0.000043 - momentum: 0.000000
2023-10-13 17:55:19,861 epoch 8 - iter 1805/3617 - loss 0.01086557 - time (sec): 494.54 - samples/sec: 386.03 - lr: 0.000042 - momentum: 0.000000
2023-10-13 17:57:00,564 epoch 8 - iter 2166/3617 - loss 0.01081858 - time (sec): 595.24 - samples/sec: 386.15 - lr: 0.000040 - momentum: 0.000000
2023-10-13 17:58:37,728 epoch 8 - iter 2527/3617 - loss 0.01085878 - time (sec): 692.41 - samples/sec: 385.09 - lr: 0.000038 - momentum: 0.000000
2023-10-13 18:00:14,557 epoch 8 - iter 2888/3617 - loss 0.01065470 - time (sec): 789.24 - samples/sec: 385.34 - lr: 0.000037 - momentum: 0.000000
2023-10-13 18:01:51,532 epoch 8 - iter 3249/3617 - loss 0.01090509 - time (sec): 886.21 - samples/sec: 386.25 - lr: 0.000035 - momentum: 0.000000
2023-10-13 18:03:29,711 epoch 8 - iter 3610/3617 - loss 0.01057309 - time (sec): 984.39 - samples/sec: 385.52 - lr: 0.000033 - momentum: 0.000000
2023-10-13 18:03:31,317 ----------------------------------------------------------------------------------------------------
2023-10-13 18:03:31,317 EPOCH 8 done: loss 0.0106 - lr: 0.000033
2023-10-13 18:04:13,133 DEV : loss 0.3422335982322693 - f1-score (micro avg) 0.6499
2023-10-13 18:04:13,198 ----------------------------------------------------------------------------------------------------
2023-10-13 18:05:51,892 epoch 9 - iter 361/3617 - loss 0.00746127 - time (sec): 98.69 - samples/sec: 369.13 - lr: 0.000032 - momentum: 0.000000
2023-10-13 18:07:30,684 epoch 9 - iter 722/3617 - loss 0.00693418 - time (sec): 197.48 - samples/sec: 378.82 - lr: 0.000030 - momentum: 0.000000
2023-10-13 18:09:11,794 epoch 9 - iter 1083/3617 - loss 0.00835998 - time (sec): 298.59 - samples/sec: 379.21 - lr: 0.000028 - momentum: 0.000000
2023-10-13 18:10:51,226 epoch 9 - iter 1444/3617 - loss 0.00843426 - time (sec): 398.03 - samples/sec: 379.83 - lr: 0.000027 - momentum: 0.000000
2023-10-13 18:12:31,468 epoch 9 - iter 1805/3617 - loss 0.00811795 - time (sec): 498.27 - samples/sec: 380.63 - lr: 0.000025 - momentum: 0.000000
2023-10-13 18:14:10,084 epoch 9 - iter 2166/3617 - loss 0.00779077 - time (sec): 596.88 - samples/sec: 382.22 - lr: 0.000023 - momentum: 0.000000
2023-10-13 18:15:43,914 epoch 9 - iter 2527/3617 - loss 0.00773124 - time (sec): 690.71 - samples/sec: 384.09 - lr: 0.000022 - momentum: 0.000000
2023-10-13 18:17:18,514 epoch 9 - iter 2888/3617 - loss 0.00758069 - time (sec): 785.31 - samples/sec: 383.80 - lr: 0.000020 - momentum: 0.000000
2023-10-13 18:18:53,935 epoch 9 - iter 3249/3617 - loss 0.00735282 - time (sec): 880.73 - samples/sec: 385.99 - lr: 0.000018 - momentum: 0.000000
2023-10-13 18:20:35,878 epoch 9 - iter 3610/3617 - loss 0.00724054 - time (sec): 982.68 - samples/sec: 385.88 - lr: 0.000017 - momentum: 0.000000
2023-10-13 18:20:37,660 ----------------------------------------------------------------------------------------------------
2023-10-13 18:20:37,660 EPOCH 9 done: loss 0.0072 - lr: 0.000017
2023-10-13 18:21:18,008 DEV : loss 0.3737005591392517 - f1-score (micro avg) 0.6637
2023-10-13 18:21:18,067 saving best model
2023-10-13 18:21:20,653 ----------------------------------------------------------------------------------------------------
2023-10-13 18:23:01,430 epoch 10 - iter 361/3617 - loss 0.00219566 - time (sec): 100.77 - samples/sec: 379.80 - lr: 0.000015 - momentum: 0.000000
2023-10-13 18:24:41,655 epoch 10 - iter 722/3617 - loss 0.00254285 - time (sec): 201.00 - samples/sec: 378.43 - lr: 0.000013 - momentum: 0.000000
2023-10-13 18:26:18,860 epoch 10 - iter 1083/3617 - loss 0.00509300 - time (sec): 298.20 - samples/sec: 380.51 - lr: 0.000012 - momentum: 0.000000
2023-10-13 18:27:59,093 epoch 10 - iter 1444/3617 - loss 0.00524195 - time (sec): 398.44 - samples/sec: 381.66 - lr: 0.000010 - momentum: 0.000000
2023-10-13 18:29:39,266 epoch 10 - iter 1805/3617 - loss 0.00561785 - time (sec): 498.61 - samples/sec: 379.46 - lr: 0.000008 - momentum: 0.000000
2023-10-13 18:31:16,248 epoch 10 - iter 2166/3617 - loss 0.00605238 - time (sec): 595.59 - samples/sec: 380.96 - lr: 0.000007 - momentum: 0.000000
2023-10-13 18:32:54,488 epoch 10 - iter 2527/3617 - loss 0.00590583 - time (sec): 693.83 - samples/sec: 382.92 - lr: 0.000005 - momentum: 0.000000
2023-10-13 18:34:33,504 epoch 10 - iter 2888/3617 - loss 0.00607679 - time (sec): 792.85 - samples/sec: 384.33 - lr: 0.000003 - momentum: 0.000000
2023-10-13 18:36:09,760 epoch 10 - iter 3249/3617 - loss 0.00604311 - time (sec): 889.10 - samples/sec: 383.09 - lr: 0.000002 - momentum: 0.000000
2023-10-13 18:37:48,846 epoch 10 - iter 3610/3617 - loss 0.00584381 - time (sec): 988.19 - samples/sec: 383.85 - lr: 0.000000 - momentum: 0.000000
2023-10-13 18:37:50,520 ----------------------------------------------------------------------------------------------------
2023-10-13 18:37:50,521 EPOCH 10 done: loss 0.0058 - lr: 0.000000
2023-10-13 18:38:31,617 DEV : loss 0.3819710314273834 - f1-score (micro avg) 0.6651
2023-10-13 18:38:31,677 saving best model
2023-10-13 18:38:35,138 ----------------------------------------------------------------------------------------------------
2023-10-13 18:38:35,140 Loading model from best epoch ...
2023-10-13 18:38:39,133 SequenceTagger predicts: Dictionary with 13 tags: O, S-loc, B-loc, E-loc, I-loc, S-pers, B-pers, E-pers, I-pers, S-org, B-org, E-org, I-org
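The best checkpoint can be reloaded and applied as usual; a small usage sketch follows (the path is abbreviated here, the example sentence is invented, and the label type name "ner" is an assumption):

from flair.data import Sentence
from flair.models import SequenceTagger

# Load the checkpoint written by the "saving best model" steps above
# (full directory as in the training base path near the top of the log).
tagger = SequenceTagger.load("hmbench-letemps/fr-hmbyt5-preliminary/.../best-model.pt")

sentence = Sentence("Le Conseil fédéral s'est réuni hier à Berne .")
tagger.predict(sentence)

for span in sentence.get_spans("ner"):  # label type assumed to be "ner"
    print(span.text, span.tag, f"{span.score:.2f}")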
2023-10-13 18:39:36,439 
Results:
- F-score (micro) 0.6279
- F-score (macro) 0.4773
- Accuracy 0.4701

By class:
              precision    recall  f1-score   support

         loc     0.6557    0.7411    0.6958       591
        pers     0.5661    0.6835    0.6193       357
         org     0.1200    0.1139    0.1169        79

   micro avg     0.5886    0.6728    0.6279      1027
   macro avg     0.4473    0.5128    0.4773      1027
weighted avg     0.5833    0.6728    0.6247      1027

2023-10-13 18:39:36,439 ----------------------------------------------------------------------------------------------------
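As a sanity check, the micro-averaged row follows from the per-class rows: true positives per class are recall times support, predicted spans are true positives divided by precision, and pooling the three classes reproduces 0.5886 / 0.6728 / 0.6279. A short sketch with the numbers taken from the table above:

# Recompute the micro average from the per-class rows.
rows = {
    # class: (precision, recall, support)
    "loc":  (0.6557, 0.7411, 591),
    "pers": (0.5661, 0.6835, 357),
    "org":  (0.1200, 0.1139, 79),
}

tp = sum(r * s for _, r, s in rows.values())          # true positives, ~691
pred = sum((r * s) / p for p, r, s in rows.values())  # predicted spans, ~1174
gold = sum(s for _, _, s in rows.values())            # gold spans, 1027

precision = tp / pred
recall = tp / gold
f1 = 2 * precision * recall / (precision + recall)
print(f"micro avg: {precision:.4f} {recall:.4f} {f1:.4f}")  # ~0.5886 0.6728 0.6279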
|