2023-10-13 16:07:31,904 ----------------------------------------------------------------------------------------------------
2023-10-13 16:07:31,905 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(32001, 768)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0-11): 12 x BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=21, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-13 16:07:31,905 ----------------------------------------------------------------------------------------------------
2023-10-13 16:07:31,905 MultiCorpus: 5901 train + 1287 dev + 1505 test sentences
 - NER_HIPE_2022 Corpus: 5901 train + 1287 dev + 1505 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/hipe2020/fr/with_doc_seperator
2023-10-13 16:07:31,905 ----------------------------------------------------------------------------------------------------
2023-10-13 16:07:31,905 Train: 5901 sentences
2023-10-13 16:07:31,905 (train_with_dev=False, train_with_test=False)
2023-10-13 16:07:31,905 ----------------------------------------------------------------------------------------------------
2023-10-13 16:07:31,905 Training Params:
2023-10-13 16:07:31,905 - learning_rate: "5e-05"
2023-10-13 16:07:31,905 - mini_batch_size: "8"
2023-10-13 16:07:31,905 - max_epochs: "10"
2023-10-13 16:07:31,905 - shuffle: "True"
2023-10-13 16:07:31,906 ----------------------------------------------------------------------------------------------------
2023-10-13 16:07:31,906 Plugins:
2023-10-13 16:07:31,906 - LinearScheduler | warmup_fraction: '0.1'
2023-10-13 16:07:31,906 ----------------------------------------------------------------------------------------------------
2023-10-13 16:07:31,906 Final evaluation on model from best epoch (best-model.pt)
2023-10-13 16:07:31,906 - metric: "('micro avg', 'f1-score')"
2023-10-13 16:07:31,906 ----------------------------------------------------------------------------------------------------
2023-10-13 16:07:31,906 Computation:
2023-10-13 16:07:31,906 - compute on device: cuda:0
2023-10-13 16:07:31,906 - embedding storage: none
2023-10-13 16:07:31,906 ----------------------------------------------------------------------------------------------------
2023-10-13 16:07:31,906 Model training base path: "hmbench-hipe2020/fr-dbmdz/bert-base-historic-multilingual-cased-bs8-wsFalse-e10-lr5e-05-poolingfirst-layers-1-crfFalse-2"
2023-10-13 16:07:31,906 ----------------------------------------------------------------------------------------------------
2023-10-13 16:07:31,906 ----------------------------------------------------------------------------------------------------
2023-10-13 16:07:37,143 epoch 1 - iter 73/738 - loss 2.76598445 - time (sec): 5.24 - samples/sec: 3360.36 - lr: 0.000005 - momentum: 0.000000
2023-10-13 16:07:42,169 epoch 1 - iter 146/738 - loss 1.66719703 - time (sec): 10.26 - samples/sec: 3487.80 - lr: 0.000010 - momentum: 0.000000
2023-10-13 16:07:47,178 epoch 1 - iter 219/738 - loss 1.27255728 - time (sec): 15.27 - samples/sec: 3386.07 - lr: 0.000015 - momentum: 0.000000
2023-10-13 16:07:52,113 epoch 1 - iter 292/738 - loss 1.03524511 - time (sec): 20.21 - samples/sec: 3381.44 - lr: 0.000020 - momentum: 0.000000
2023-10-13 16:07:57,063 epoch 1 - iter 365/738 - loss 0.89357834 - time (sec): 25.16 - samples/sec: 3379.11 - lr: 0.000025 - momentum: 0.000000
2023-10-13 16:08:01,878 epoch 1 - iter 438/738 - loss 0.78652245 - time (sec): 29.97 - samples/sec: 3393.22 - lr: 0.000030 - momentum: 0.000000
2023-10-13 16:08:06,712 epoch 1 - iter 511/738 - loss 0.71094911 - time (sec): 34.80 - samples/sec: 3385.50 - lr: 0.000035 - momentum: 0.000000
2023-10-13 16:08:11,369 epoch 1 - iter 584/738 - loss 0.65439322 - time (sec): 39.46 - samples/sec: 3354.63 - lr: 0.000039 - momentum: 0.000000
2023-10-13 16:08:16,348 epoch 1 - iter 657/738 - loss 0.60420483 - time (sec): 44.44 - samples/sec: 3343.22 - lr: 0.000044 - momentum: 0.000000
2023-10-13 16:08:21,114 epoch 1 - iter 730/738 - loss 0.56171741 - time (sec): 49.21 - samples/sec: 3351.84 - lr: 0.000049 - momentum: 0.000000
2023-10-13 16:08:21,588 ----------------------------------------------------------------------------------------------------
2023-10-13 16:08:21,588 EPOCH 1 done: loss 0.5581 - lr: 0.000049
2023-10-13 16:08:27,352 DEV : loss 0.14290109276771545 - f1-score (micro avg) 0.7015
2023-10-13 16:08:27,384 saving best model
2023-10-13 16:08:27,872 ----------------------------------------------------------------------------------------------------
2023-10-13 16:08:32,167 epoch 2 - iter 73/738 - loss 0.12716274 - time (sec): 4.29 - samples/sec: 3521.57 - lr: 0.000049 - momentum: 0.000000
2023-10-13 16:08:36,924 epoch 2 - iter 146/738 - loss 0.13405283 - time (sec): 9.05 - samples/sec: 3459.97 - lr: 0.000049 - momentum: 0.000000
2023-10-13 16:08:41,641 epoch 2 - iter 219/738 - loss 0.13759252 - time (sec): 13.77 - samples/sec: 3447.39 - lr: 0.000048 - momentum: 0.000000
2023-10-13 16:08:47,028 epoch 2 - iter 292/738 - loss 0.13374106 - time (sec): 19.15 - samples/sec: 3295.50 - lr: 0.000048 - momentum: 0.000000
2023-10-13 16:08:51,691 epoch 2 - iter 365/738 - loss 0.13472976 - time (sec): 23.82 - samples/sec: 3300.76 - lr: 0.000047 - momentum: 0.000000
2023-10-13 16:08:56,701 epoch 2 - iter 438/738 - loss 0.13143665 - time (sec): 28.83 - samples/sec: 3311.21 - lr: 0.000047 - momentum: 0.000000
2023-10-13 16:09:02,238 epoch 2 - iter 511/738 - loss 0.12819648 - time (sec): 34.37 - samples/sec: 3312.44 - lr: 0.000046 - momentum: 0.000000
2023-10-13 16:09:07,051 epoch 2 - iter 584/738 - loss 0.12305117 - time (sec): 39.18 - samples/sec: 3321.56 - lr: 0.000046 - momentum: 0.000000
2023-10-13 16:09:12,066 epoch 2 - iter 657/738 - loss 0.12315101 - time (sec): 44.19 - samples/sec: 3333.00 - lr: 0.000045 - momentum: 0.000000
2023-10-13 16:09:17,367 epoch 2 - iter 730/738 - loss 0.12343880 - time (sec): 49.49 - samples/sec: 3328.11 - lr: 0.000045 - momentum: 0.000000
2023-10-13 16:09:17,866 ----------------------------------------------------------------------------------------------------
2023-10-13 16:09:17,866 EPOCH 2 done: loss 0.1233 - lr: 0.000045
2023-10-13 16:09:28,983 DEV : loss 0.10492947697639465 - f1-score (micro avg) 0.7561
2023-10-13 16:09:29,013 saving best model
2023-10-13 16:09:29,530 ----------------------------------------------------------------------------------------------------
2023-10-13 16:09:34,249 epoch 3 - iter 73/738 - loss 0.05692245 - time (sec): 4.72 - samples/sec: 3271.85 - lr: 0.000044 - momentum: 0.000000
2023-10-13 16:09:39,076 epoch 3 - iter 146/738 - loss 0.06379593 - time (sec): 9.54 - samples/sec: 3360.48 - lr: 0.000043 - momentum: 0.000000
2023-10-13 16:09:43,953 epoch 3 - iter 219/738 - loss 0.07184299 - time (sec): 14.42 - samples/sec: 3352.39 - lr: 0.000043 - momentum: 0.000000
2023-10-13 16:09:48,396 epoch 3 - iter 292/738 - loss 0.07056935 - time (sec): 18.86 - samples/sec: 3355.15 - lr: 0.000042 - momentum: 0.000000
2023-10-13 16:09:53,905 epoch 3 - iter 365/738 - loss 0.06945011 - time (sec): 24.37 - samples/sec: 3322.09 - lr: 0.000042 - momentum: 0.000000
2023-10-13 16:09:59,137 epoch 3 - iter 438/738 - loss 0.06940866 - time (sec): 29.60 - samples/sec: 3357.23 - lr: 0.000041 - momentum: 0.000000
2023-10-13 16:10:04,013 epoch 3 - iter 511/738 - loss 0.06696551 - time (sec): 34.48 - samples/sec: 3346.45 - lr: 0.000041 - momentum: 0.000000
2023-10-13 16:10:08,915 epoch 3 - iter 584/738 - loss 0.06853341 - time (sec): 39.38 - samples/sec: 3359.17 - lr: 0.000040 - momentum: 0.000000
2023-10-13 16:10:14,120 epoch 3 - iter 657/738 - loss 0.06811819 - time (sec): 44.59 - samples/sec: 3350.17 - lr: 0.000040 - momentum: 0.000000
2023-10-13 16:10:18,824 epoch 3 - iter 730/738 - loss 0.06952438 - time (sec): 49.29 - samples/sec: 3343.34 - lr: 0.000039 - momentum: 0.000000
2023-10-13 16:10:19,287 ----------------------------------------------------------------------------------------------------
2023-10-13 16:10:19,287 EPOCH 3 done: loss 0.0696 - lr: 0.000039
2023-10-13 16:10:30,446 DEV : loss 0.12435939162969589 - f1-score (micro avg) 0.7908
2023-10-13 16:10:30,475 saving best model
2023-10-13 16:10:31,010 ----------------------------------------------------------------------------------------------------
2023-10-13 16:10:35,777 epoch 4 - iter 73/738 - loss 0.04217679 - time (sec): 4.76 - samples/sec: 3176.63 - lr: 0.000038 - momentum: 0.000000
2023-10-13 16:10:40,427 epoch 4 - iter 146/738 - loss 0.04322531 - time (sec): 9.41 - samples/sec: 3274.46 - lr: 0.000038 - momentum: 0.000000
2023-10-13 16:10:45,415 epoch 4 - iter 219/738 - loss 0.04486914 - time (sec): 14.40 - samples/sec: 3287.51 - lr: 0.000037 - momentum: 0.000000
2023-10-13 16:10:50,035 epoch 4 - iter 292/738 - loss 0.04509778 - time (sec): 19.02 - samples/sec: 3300.10 - lr: 0.000037 - momentum: 0.000000
2023-10-13 16:10:55,060 epoch 4 - iter 365/738 - loss 0.04617630 - time (sec): 24.05 - samples/sec: 3306.06 - lr: 0.000036 - momentum: 0.000000
2023-10-13 16:11:00,379 epoch 4 - iter 438/738 - loss 0.04564721 - time (sec): 29.37 - samples/sec: 3297.33 - lr: 0.000036 - momentum: 0.000000
2023-10-13 16:11:06,000 epoch 4 - iter 511/738 - loss 0.04566072 - time (sec): 34.99 - samples/sec: 3301.63 - lr: 0.000035 - momentum: 0.000000
2023-10-13 16:11:10,755 epoch 4 - iter 584/738 - loss 0.04718094 - time (sec): 39.74 - samples/sec: 3320.12 - lr: 0.000035 - momentum: 0.000000
2023-10-13 16:11:15,857 epoch 4 - iter 657/738 - loss 0.04928165 - time (sec): 44.84 - samples/sec: 3317.33 - lr: 0.000034 - momentum: 0.000000
2023-10-13 16:11:20,524 epoch 4 - iter 730/738 - loss 0.05030015 - time (sec): 49.51 - samples/sec: 3328.40 - lr: 0.000033 - momentum: 0.000000
2023-10-13 16:11:21,012 ----------------------------------------------------------------------------------------------------
2023-10-13 16:11:21,012 EPOCH 4 done: loss 0.0504 - lr: 0.000033
2023-10-13 16:11:32,198 DEV : loss 0.16232198476791382 - f1-score (micro avg) 0.791
2023-10-13 16:11:32,230 saving best model
2023-10-13 16:11:32,738 ----------------------------------------------------------------------------------------------------
2023-10-13 16:11:37,904 epoch 5 - iter 73/738 - loss 0.04547528 - time (sec): 5.16 - samples/sec: 3187.03 - lr: 0.000033 - momentum: 0.000000
2023-10-13 16:11:42,931 epoch 5 - iter 146/738 - loss 0.03745766 - time (sec): 10.19 - samples/sec: 3241.04 - lr: 0.000032 - momentum: 0.000000
2023-10-13 16:11:47,719 epoch 5 - iter 219/738 - loss 0.04018018 - time (sec): 14.98 - samples/sec: 3334.49 - lr: 0.000032 - momentum: 0.000000
2023-10-13 16:11:52,545 epoch 5 - iter 292/738 - loss 0.03642262 - time (sec): 19.80 - samples/sec: 3334.28 - lr: 0.000031 - momentum: 0.000000
2023-10-13 16:11:57,388 epoch 5 - iter 365/738 - loss 0.03525492 - time (sec): 24.65 - samples/sec: 3330.64 - lr: 0.000031 - momentum: 0.000000
2023-10-13 16:12:02,106 epoch 5 - iter 438/738 - loss 0.03546021 - time (sec): 29.36 - samples/sec: 3321.27 - lr: 0.000030 - momentum: 0.000000
2023-10-13 16:12:07,137 epoch 5 - iter 511/738 - loss 0.03493695 - time (sec): 34.39 - samples/sec: 3306.29 - lr: 0.000030 - momentum: 0.000000
2023-10-13 16:12:12,773 epoch 5 - iter 584/738 - loss 0.03563392 - time (sec): 40.03 - samples/sec: 3286.64 - lr: 0.000029 - momentum: 0.000000
2023-10-13 16:12:18,509 epoch 5 - iter 657/738 - loss 0.03533050 - time (sec): 45.77 - samples/sec: 3273.26 - lr: 0.000028 - momentum: 0.000000
2023-10-13 16:12:23,151 epoch 5 - iter 730/738 - loss 0.03550563 - time (sec): 50.41 - samples/sec: 3266.32 - lr: 0.000028 - momentum: 0.000000
2023-10-13 16:12:23,713 ----------------------------------------------------------------------------------------------------
2023-10-13 16:12:23,713 EPOCH 5 done: loss 0.0356 - lr: 0.000028
2023-10-13 16:12:34,872 DEV : loss 0.16207054257392883 - f1-score (micro avg) 0.8011
2023-10-13 16:12:34,902 saving best model
2023-10-13 16:12:35,395 ----------------------------------------------------------------------------------------------------
2023-10-13 16:12:39,937 epoch 6 - iter 73/738 - loss 0.03186713 - time (sec): 4.54 - samples/sec: 3270.44 - lr: 0.000027 - momentum: 0.000000
2023-10-13 16:12:45,514 epoch 6 - iter 146/738 - loss 0.02345574 - time (sec): 10.11 - samples/sec: 3355.66 - lr: 0.000027 - momentum: 0.000000
2023-10-13 16:12:50,577 epoch 6 - iter 219/738 - loss 0.02552735 - time (sec): 15.18 - samples/sec: 3347.83 - lr: 0.000026 - momentum: 0.000000
2023-10-13 16:12:55,260 epoch 6 - iter 292/738 - loss 0.02690248 - time (sec): 19.86 - samples/sec: 3336.26 - lr: 0.000026 - momentum: 0.000000
2023-10-13 16:13:01,206 epoch 6 - iter 365/738 - loss 0.02701658 - time (sec): 25.81 - samples/sec: 3272.23 - lr: 0.000025 - momentum: 0.000000
2023-10-13 16:13:06,100 epoch 6 - iter 438/738 - loss 0.02755494 - time (sec): 30.70 - samples/sec: 3300.44 - lr: 0.000025 - momentum: 0.000000
2023-10-13 16:13:10,632 epoch 6 - iter 511/738 - loss 0.02727897 - time (sec): 35.23 - samples/sec: 3311.23 - lr: 0.000024 - momentum: 0.000000
2023-10-13 16:13:15,741 epoch 6 - iter 584/738 - loss 0.02651565 - time (sec): 40.34 - samples/sec: 3301.92 - lr: 0.000023 - momentum: 0.000000
2023-10-13 16:13:20,645 epoch 6 - iter 657/738 - loss 0.02669398 - time (sec): 45.25 - samples/sec: 3290.08 - lr: 0.000023 - momentum: 0.000000
2023-10-13 16:13:25,644 epoch 6 - iter 730/738 - loss 0.02762465 - time (sec): 50.24 - samples/sec: 3280.08 - lr: 0.000022 - momentum: 0.000000
2023-10-13 16:13:26,124 ----------------------------------------------------------------------------------------------------
2023-10-13 16:13:26,124 EPOCH 6 done: loss 0.0276 - lr: 0.000022
2023-10-13 16:13:37,376 DEV : loss 0.17623859643936157 - f1-score (micro avg) 0.813
2023-10-13 16:13:37,409 saving best model
2023-10-13 16:13:37,915 ----------------------------------------------------------------------------------------------------
2023-10-13 16:13:43,425 epoch 7 - iter 73/738 - loss 0.02042456 - time (sec): 5.51 - samples/sec: 3090.32 - lr: 0.000022 - momentum: 0.000000
2023-10-13 16:13:47,723 epoch 7 - iter 146/738 - loss 0.01755039 - time (sec): 9.81 - samples/sec: 3246.41 - lr: 0.000021 - momentum: 0.000000
2023-10-13 16:13:53,127 epoch 7 - iter 219/738 - loss 0.01705598 - time (sec): 15.21 - samples/sec: 3310.24 - lr: 0.000021 - momentum: 0.000000
2023-10-13 16:13:58,575 epoch 7 - iter 292/738 - loss 0.01623354 - time (sec): 20.66 - samples/sec: 3333.59 - lr: 0.000020 - momentum: 0.000000
2023-10-13 16:14:03,036 epoch 7 - iter 365/738 - loss 0.01564171 - time (sec): 25.12 - samples/sec: 3331.51 - lr: 0.000020 - momentum: 0.000000
2023-10-13 16:14:07,628 epoch 7 - iter 438/738 - loss 0.01656402 - time (sec): 29.71 - samples/sec: 3334.05 - lr: 0.000019 - momentum: 0.000000
2023-10-13 16:14:12,209 epoch 7 - iter 511/738 - loss 0.01804093 - time (sec): 34.29 - samples/sec: 3354.90 - lr: 0.000018 - momentum: 0.000000
2023-10-13 16:14:16,920 epoch 7 - iter 584/738 - loss 0.01843500 - time (sec): 39.00 - samples/sec: 3345.53 - lr: 0.000018 - momentum: 0.000000
2023-10-13 16:14:21,848 epoch 7 - iter 657/738 - loss 0.01777761 - time (sec): 43.93 - samples/sec: 3333.64 - lr: 0.000017 - momentum: 0.000000
2023-10-13 16:14:27,303 epoch 7 - iter 730/738 - loss 0.01822353 - time (sec): 49.39 - samples/sec: 3337.85 - lr: 0.000017 - momentum: 0.000000
2023-10-13 16:14:27,768 ----------------------------------------------------------------------------------------------------
2023-10-13 16:14:27,768 EPOCH 7 done: loss 0.0182 - lr: 0.000017
2023-10-13 16:14:38,887 DEV : loss 0.18489772081375122 - f1-score (micro avg) 0.8296
2023-10-13 16:14:38,917 saving best model
2023-10-13 16:14:39,522 ----------------------------------------------------------------------------------------------------
2023-10-13 16:14:44,322 epoch 8 - iter 73/738 - loss 0.02158221 - time (sec): 4.80 - samples/sec: 3479.64 - lr: 0.000016 - momentum: 0.000000
2023-10-13 16:14:49,508 epoch 8 - iter 146/738 - loss 0.01696772 - time (sec): 9.98 - samples/sec: 3370.64 - lr: 0.000016 - momentum: 0.000000
2023-10-13 16:14:55,059 epoch 8 - iter 219/738 - loss 0.01617594 - time (sec): 15.53 - samples/sec: 3401.07 - lr: 0.000015 - momentum: 0.000000
2023-10-13 16:14:59,771 epoch 8 - iter 292/738 - loss 0.01777930 - time (sec): 20.24 - samples/sec: 3343.21 - lr: 0.000015 - momentum: 0.000000
2023-10-13 16:15:04,316 epoch 8 - iter 365/738 - loss 0.01630697 - time (sec): 24.79 - samples/sec: 3323.60 - lr: 0.000014 - momentum: 0.000000
2023-10-13 16:15:09,176 epoch 8 - iter 438/738 - loss 0.01597156 - time (sec): 29.65 - samples/sec: 3324.09 - lr: 0.000013 - momentum: 0.000000
2023-10-13 16:15:14,012 epoch 8 - iter 511/738 - loss 0.01492589 - time (sec): 34.49 - samples/sec: 3327.04 - lr: 0.000013 - momentum: 0.000000
2023-10-13 16:15:18,408 epoch 8 - iter 584/738 - loss 0.01525583 - time (sec): 38.88 - samples/sec: 3326.93 - lr: 0.000012 - momentum: 0.000000
2023-10-13 16:15:23,227 epoch 8 - iter 657/738 - loss 0.01427343 - time (sec): 43.70 - samples/sec: 3323.76 - lr: 0.000012 - momentum: 0.000000
2023-10-13 16:15:28,712 epoch 8 - iter 730/738 - loss 0.01360292 - time (sec): 49.18 - samples/sec: 3348.93 - lr: 0.000011 - momentum: 0.000000
2023-10-13 16:15:29,203 ----------------------------------------------------------------------------------------------------
2023-10-13 16:15:29,203 EPOCH 8 done: loss 0.0135 - lr: 0.000011
2023-10-13 16:15:40,324 DEV : loss 0.20899920165538788 - f1-score (micro avg) 0.825
2023-10-13 16:15:40,355 ----------------------------------------------------------------------------------------------------
2023-10-13 16:15:45,276 epoch 9 - iter 73/738 - loss 0.00737419 - time (sec): 4.92 - samples/sec: 3151.24 - lr: 0.000011 - momentum: 0.000000
2023-10-13 16:15:50,322 epoch 9 - iter 146/738 - loss 0.00639227 - time (sec): 9.97 - samples/sec: 3242.90 - lr: 0.000010 - momentum: 0.000000
2023-10-13 16:15:54,891 epoch 9 - iter 219/738 - loss 0.00609163 - time (sec): 14.54 - samples/sec: 3323.28 - lr: 0.000010 - momentum: 0.000000
2023-10-13 16:15:59,713 epoch 9 - iter 292/738 - loss 0.00710336 - time (sec): 19.36 - samples/sec: 3348.37 - lr: 0.000009 - momentum: 0.000000
2023-10-13 16:16:05,038 epoch 9 - iter 365/738 - loss 0.00870007 - time (sec): 24.68 - samples/sec: 3353.92 - lr: 0.000008 - momentum: 0.000000
2023-10-13 16:16:09,730 epoch 9 - iter 438/738 - loss 0.00815323 - time (sec): 29.37 - samples/sec: 3345.31 - lr: 0.000008 - momentum: 0.000000
2023-10-13 16:16:14,960 epoch 9 - iter 511/738 - loss 0.00770538 - time (sec): 34.60 - samples/sec: 3328.70 - lr: 0.000007 - momentum: 0.000000
2023-10-13 16:16:19,571 epoch 9 - iter 584/738 - loss 0.00789209 - time (sec): 39.21 - samples/sec: 3320.50 - lr: 0.000007 - momentum: 0.000000
2023-10-13 16:16:24,295 epoch 9 - iter 657/738 - loss 0.00748613 - time (sec): 43.94 - samples/sec: 3337.76 - lr: 0.000006 - momentum: 0.000000
2023-10-13 16:16:29,662 epoch 9 - iter 730/738 - loss 0.00819185 - time (sec): 49.31 - samples/sec: 3338.75 - lr: 0.000006 - momentum: 0.000000
2023-10-13 16:16:30,264 ----------------------------------------------------------------------------------------------------
2023-10-13 16:16:30,264 EPOCH 9 done: loss 0.0083 - lr: 0.000006
2023-10-13 16:16:41,419 DEV : loss 0.20789538323879242 - f1-score (micro avg) 0.8301
2023-10-13 16:16:41,457 saving best model
2023-10-13 16:16:41,968 ----------------------------------------------------------------------------------------------------
2023-10-13 16:16:47,776 epoch 10 - iter 73/738 - loss 0.00953757 - time (sec): 5.80 - samples/sec: 3035.36 - lr: 0.000005 - momentum: 0.000000
2023-10-13 16:16:52,458 epoch 10 - iter 146/738 - loss 0.00607430 - time (sec): 10.48 - samples/sec: 3196.87 - lr: 0.000004 - momentum: 0.000000
2023-10-13 16:16:56,727 epoch 10 - iter 219/738 - loss 0.00711998 - time (sec): 14.75 - samples/sec: 3318.90 - lr: 0.000004 - momentum: 0.000000
2023-10-13 16:17:01,617 epoch 10 - iter 292/738 - loss 0.00638173 - time (sec): 19.64 - samples/sec: 3324.20 - lr: 0.000003 - momentum: 0.000000
2023-10-13 16:17:06,558 epoch 10 - iter 365/738 - loss 0.00593203 - time (sec): 24.59 - samples/sec: 3308.48 - lr: 0.000003 - momentum: 0.000000
2023-10-13 16:17:12,087 epoch 10 - iter 438/738 - loss 0.00538934 - time (sec): 30.11 - samples/sec: 3322.99 - lr: 0.000002 - momentum: 0.000000
2023-10-13 16:17:16,645 epoch 10 - iter 511/738 - loss 0.00527968 - time (sec): 34.67 - samples/sec: 3309.72 - lr: 0.000002 - momentum: 0.000000
2023-10-13 16:17:22,018 epoch 10 - iter 584/738 - loss 0.00510774 - time (sec): 40.04 - samples/sec: 3290.48 - lr: 0.000001 - momentum: 0.000000
2023-10-13 16:17:27,000 epoch 10 - iter 657/738 - loss 0.00530120 - time (sec): 45.03 - samples/sec: 3292.97 - lr: 0.000001 - momentum: 0.000000
2023-10-13 16:17:32,035 epoch 10 - iter 730/738 - loss 0.00491710 - time (sec): 50.06 - samples/sec: 3294.31 - lr: 0.000000 - momentum: 0.000000
2023-10-13 16:17:32,457 ----------------------------------------------------------------------------------------------------
2023-10-13 16:17:32,458 EPOCH 10 done: loss 0.0049 - lr: 0.000000
2023-10-13 16:17:43,618 DEV : loss 0.21186788380146027 - f1-score (micro avg) 0.8321
2023-10-13 16:17:43,651 saving best model
2023-10-13 16:17:44,637 ----------------------------------------------------------------------------------------------------
2023-10-13 16:17:44,638 Loading model from best epoch ...
2023-10-13 16:17:46,067 SequenceTagger predicts: Dictionary with 21 tags: O, S-loc, B-loc, E-loc, I-loc, S-pers, B-pers, E-pers, I-pers, S-org, B-org, E-org, I-org, S-time, B-time, E-time, I-time, S-prod, B-prod, E-prod, I-prod
2023-10-13 16:17:51,903 
Results:
- F-score (micro) 0.7943
- F-score (macro) 0.6995
- Accuracy 0.6824

By class:
              precision    recall  f1-score   support

         loc     0.8442    0.8776    0.8606       858
        pers     0.7558    0.7952    0.7750       537
         org     0.5603    0.5985    0.5788       132
        time     0.5538    0.6667    0.6050        54
        prod     0.7222    0.6393    0.6783        61

   micro avg     0.7769    0.8124    0.7943      1642
   macro avg     0.6873    0.7155    0.6995      1642
weighted avg     0.7784    0.8124    0.7947      1642

2023-10-13 16:17:51,904 ----------------------------------------------------------------------------------------------------