2023-10-13 23:27:53,960 ----------------------------------------------------------------------------------------------------
2023-10-13 23:27:53,961 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(32001, 768)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0-11): 12 x BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-13 23:27:53,961 ----------------------------------------------------------------------------------------------------
2023-10-13 23:27:53,961 MultiCorpus: 7936 train + 992 dev + 992 test sentences
 - NER_ICDAR_EUROPEANA Corpus: 7936 train + 992 dev + 992 test sentences - /root/.flair/datasets/ner_icdar_europeana/fr
2023-10-13 23:27:53,961 ----------------------------------------------------------------------------------------------------
2023-10-13 23:27:53,961 Train: 7936 sentences
2023-10-13 23:27:53,961         (train_with_dev=False, train_with_test=False)
2023-10-13 23:27:53,961 ----------------------------------------------------------------------------------------------------
2023-10-13 23:27:53,961 Training Params:
2023-10-13 23:27:53,961  - learning_rate: "5e-05"
2023-10-13 23:27:53,961  - mini_batch_size: "8"
2023-10-13 23:27:53,961  - max_epochs: "10"
2023-10-13 23:27:53,961  - shuffle: "True"
2023-10-13 23:27:53,961 ----------------------------------------------------------------------------------------------------
2023-10-13 23:27:53,961 Plugins:
2023-10-13 23:27:53,961  - LinearScheduler | warmup_fraction: '0.1'
2023-10-13 23:27:53,961 ----------------------------------------------------------------------------------------------------
2023-10-13 23:27:53,961 Final evaluation on model from best epoch (best-model.pt)
2023-10-13 23:27:53,961  - metric: "('micro avg', 'f1-score')"
2023-10-13 23:27:53,961 ----------------------------------------------------------------------------------------------------
2023-10-13 23:27:53,961 Computation:
2023-10-13 23:27:53,961  - compute on device: cuda:0
2023-10-13 23:27:53,962  - embedding storage: none
2023-10-13 23:27:53,962 ----------------------------------------------------------------------------------------------------
2023-10-13 23:27:53,962 Model training base path: "hmbench-icdar/fr-dbmdz/bert-base-historic-multilingual-cased-bs8-wsFalse-e10-lr5e-05-poolingfirst-layers-1-crfFalse-3"
2023-10-13 23:27:53,962 ----------------------------------------------------------------------------------------------------
2023-10-13 23:27:53,962 ----------------------------------------------------------------------------------------------------
2023-10-13 23:28:00,314 epoch 1 - iter 99/992 - loss 1.84420524 - time (sec): 6.35 - samples/sec: 2623.00 - lr: 0.000005 - momentum: 0.000000
2023-10-13 23:28:06,201 epoch 1 - iter 198/992 - loss 1.10827375 - time (sec): 12.24 - samples/sec: 2677.78 - lr: 0.000010 - momentum: 0.000000
2023-10-13 23:28:12,435 epoch 1 - iter 297/992 - loss 0.80875373 - time (sec): 18.47 - samples/sec: 2680.34 - lr: 0.000015 - momentum: 0.000000
2023-10-13 23:28:18,266 epoch 1 - iter 396/992 - loss 0.66063021 - time (sec): 24.30 - samples/sec: 2704.15 - lr: 0.000020 - momentum: 0.000000
2023-10-13 23:28:24,336 epoch 1 - iter 495/992 - loss 0.56248912 - time (sec): 30.37 - samples/sec: 2713.70 - lr: 0.000025 - momentum: 0.000000
2023-10-13 23:28:29,967 epoch 1 - iter 594/992 - loss 0.49232335 - time (sec): 36.00 - samples/sec: 2749.86 - lr: 0.000030 - momentum: 0.000000
2023-10-13 23:28:35,712 epoch 1 - iter 693/992 - loss 0.44224564 - time (sec): 41.75 - samples/sec: 2760.11 - lr: 0.000035 - momentum: 0.000000
2023-10-13 23:28:41,632 epoch 1 - iter 792/992 - loss 0.40489892 - time (sec): 47.67 - samples/sec: 2764.55 - lr: 0.000040 - momentum: 0.000000
2023-10-13 23:28:47,351 epoch 1 - iter 891/992 - loss 0.37416440 - time (sec): 53.39 - samples/sec: 2768.49 - lr: 0.000045 - momentum: 0.000000
2023-10-13 23:28:53,006 epoch 1 - iter 990/992 - loss 0.35114103 - time (sec): 59.04 - samples/sec: 2773.46 - lr: 0.000050 - momentum: 0.000000
2023-10-13 23:28:53,112 ----------------------------------------------------------------------------------------------------
2023-10-13 23:28:53,112 EPOCH 1 done: loss 0.3509 - lr: 0.000050
2023-10-13 23:28:56,206 DEV : loss 0.0934109166264534 - f1-score (micro avg) 0.7066
2023-10-13 23:28:56,226 saving best model
2023-10-13 23:28:56,661 ----------------------------------------------------------------------------------------------------
2023-10-13 23:29:02,883 epoch 2 - iter 99/992 - loss 0.12408130 - time (sec): 6.22 - samples/sec: 2830.88 - lr: 0.000049 - momentum: 0.000000
2023-10-13 23:29:09,014 epoch 2 - iter 198/992 - loss 0.11801383 - time (sec): 12.35 - samples/sec: 2720.96 - lr: 0.000049 - momentum: 0.000000
2023-10-13 23:29:14,757 epoch 2 - iter 297/992 - loss 0.11422384 - time (sec): 18.09 - samples/sec: 2769.68 - lr: 0.000048 - momentum: 0.000000
2023-10-13 23:29:20,479 epoch 2 - iter 396/992 - loss 0.11197551 - time (sec): 23.82 - samples/sec: 2753.22 - lr: 0.000048 - momentum: 0.000000
2023-10-13 23:29:26,220 epoch 2 - iter 495/992 - loss 0.11065679 - time (sec): 29.56 - samples/sec: 2763.91 - lr: 0.000047 - momentum: 0.000000
2023-10-13 23:29:32,096 epoch 2 - iter 594/992 - loss 0.10893379 - time (sec): 35.43 - samples/sec: 2759.56 - lr: 0.000047 - momentum: 0.000000
2023-10-13 23:29:37,819 epoch 2 - iter 693/992 - loss 0.10716282 - time (sec): 41.16 - samples/sec: 2765.86 - lr: 0.000046 - momentum: 0.000000
2023-10-13 23:29:43,750 epoch 2 - iter 792/992 - loss 0.10687508 - time (sec): 47.09 - samples/sec: 2774.79 - lr: 0.000046 - momentum: 0.000000
2023-10-13 23:29:49,391 epoch 2 - iter 891/992 - loss 0.10650512 - time (sec): 52.73 - samples/sec: 2772.57 - lr: 0.000045 - momentum: 0.000000
2023-10-13 23:29:55,474 epoch 2 - iter 990/992 - loss 0.10524813 - time (sec): 58.81 - samples/sec: 2784.28 - lr: 0.000044 - momentum: 0.000000
2023-10-13 23:29:55,580 ----------------------------------------------------------------------------------------------------
2023-10-13 23:29:55,580 EPOCH 2 done: loss 0.1053 - lr: 0.000044
2023-10-13 23:29:58,941 DEV : loss 0.0894036665558815 - f1-score (micro avg) 0.7373
2023-10-13 23:29:58,961 saving best model
2023-10-13 23:29:59,934 ----------------------------------------------------------------------------------------------------
2023-10-13 23:30:05,864 epoch 3 - iter 99/992 - loss 0.06449001 - time (sec): 5.93 - samples/sec: 2757.86 - lr: 0.000044 - momentum: 0.000000
2023-10-13 23:30:11,666 epoch 3 - iter 198/992 - loss 0.06269725 - time (sec): 11.73 - samples/sec: 2753.00 - lr: 0.000043 - momentum: 0.000000
2023-10-13 23:30:17,362 epoch 3 - iter 297/992 - loss 0.06703869 - time (sec): 17.42 - samples/sec: 2775.83 - lr: 0.000043 - momentum: 0.000000
2023-10-13 23:30:23,235 epoch 3 - iter 396/992 - loss 0.06854271 - time (sec): 23.30 - samples/sec: 2803.73 - lr: 0.000042 - momentum: 0.000000
2023-10-13 23:30:29,364 epoch 3 - iter 495/992 - loss 0.07038135 - time (sec): 29.43 - samples/sec: 2792.79 - lr: 0.000042 - momentum: 0.000000
2023-10-13 23:30:35,163 epoch 3 - iter 594/992 - loss 0.07075691 - time (sec): 35.22 - samples/sec: 2788.96 - lr: 0.000041 - momentum: 0.000000
2023-10-13 23:30:41,272 epoch 3 - iter 693/992 - loss 0.07173481 - time (sec): 41.33 - samples/sec: 2773.31 - lr: 0.000041 - momentum: 0.000000
2023-10-13 23:30:47,040 epoch 3 - iter 792/992 - loss 0.07257445 - time (sec): 47.10 - samples/sec: 2771.06 - lr: 0.000040 - momentum: 0.000000
2023-10-13 23:30:52,852 epoch 3 - iter 891/992 - loss 0.07399709 - time (sec): 52.91 - samples/sec: 2775.77 - lr: 0.000039 - momentum: 0.000000
2023-10-13 23:30:58,651 epoch 3 - iter 990/992 - loss 0.07304166 - time (sec): 58.71 - samples/sec: 2787.44 - lr: 0.000039 - momentum: 0.000000
2023-10-13 23:30:58,762 ----------------------------------------------------------------------------------------------------
2023-10-13 23:30:58,762 EPOCH 3 done: loss 0.0730 - lr: 0.000039
2023-10-13 23:31:02,203 DEV : loss 0.10655300319194794 - f1-score (micro avg) 0.7381
2023-10-13 23:31:02,224 saving best model
2023-10-13 23:31:02,734 ----------------------------------------------------------------------------------------------------
2023-10-13 23:31:08,758 epoch 4 - iter 99/992 - loss 0.04574350 - time (sec): 6.02 - samples/sec: 2832.57 - lr: 0.000038 - momentum: 0.000000
2023-10-13 23:31:14,898 epoch 4 - iter 198/992 - loss 0.04764950 - time (sec): 12.16 - samples/sec: 2726.71 - lr: 0.000038 - momentum: 0.000000
2023-10-13 23:31:20,551 epoch 4 - iter 297/992 - loss 0.04943979 - time (sec): 17.81 - samples/sec: 2747.51 - lr: 0.000037 - momentum: 0.000000
2023-10-13 23:31:26,202 epoch 4 - iter 396/992 - loss 0.05038291 - time (sec): 23.46 - samples/sec: 2783.86 - lr: 0.000037 - momentum: 0.000000
2023-10-13 23:31:32,240 epoch 4 - iter 495/992 - loss 0.05076880 - time (sec): 29.50 - samples/sec: 2764.50 - lr: 0.000036 - momentum: 0.000000
2023-10-13 23:31:38,242 epoch 4 - iter 594/992 - loss 0.05156694 - time (sec): 35.50 - samples/sec: 2763.16 - lr: 0.000036 - momentum: 0.000000
2023-10-13 23:31:43,936 epoch 4 - iter 693/992 - loss 0.05246850 - time (sec): 41.19 - samples/sec: 2782.77 - lr: 0.000035 - momentum: 0.000000
2023-10-13 23:31:49,696 epoch 4 - iter 792/992 - loss 0.05316471 - time (sec): 46.95 - samples/sec: 2783.75 - lr: 0.000034 - momentum: 0.000000
2023-10-13 23:31:55,681 epoch 4 - iter 891/992 - loss 0.05363958 - time (sec): 52.94 - samples/sec: 2787.31 - lr: 0.000034 - momentum: 0.000000
2023-10-13 23:32:01,604 epoch 4 - iter 990/992 - loss 0.05300719 - time (sec): 58.86 - samples/sec: 2783.74 - lr: 0.000033 - momentum: 0.000000
2023-10-13 23:32:01,704 ----------------------------------------------------------------------------------------------------
2023-10-13 23:32:01,705 EPOCH 4 done: loss 0.0530 - lr: 0.000033
2023-10-13 23:32:05,121 DEV : loss 0.15282906591892242 - f1-score (micro avg) 0.7405
2023-10-13 23:32:05,142 saving best model
2023-10-13 23:32:05,677 ----------------------------------------------------------------------------------------------------
2023-10-13 23:32:11,818 epoch 5 - iter 99/992 - loss 0.03701588 - time (sec): 6.14 - samples/sec: 2763.56 - lr: 0.000033 - momentum: 0.000000
2023-10-13 23:32:17,742 epoch 5 - iter 198/992 - loss 0.03627021 - time (sec): 12.06 - samples/sec: 2740.35 - lr: 0.000032 - momentum: 0.000000
2023-10-13 23:32:24,414 epoch 5 - iter 297/992 - loss 0.03801493 - time (sec): 18.73 - samples/sec: 2692.51 - lr: 0.000032 - momentum: 0.000000
2023-10-13 23:32:30,313 epoch 5 - iter 396/992 - loss 0.04018940 - time (sec): 24.63 - samples/sec: 2697.06 - lr: 0.000031 - momentum: 0.000000
2023-10-13 23:32:36,272 epoch 5 - iter 495/992 - loss 0.03982197 - time (sec): 30.59 - samples/sec: 2710.43 - lr: 0.000031 - momentum: 0.000000
2023-10-13 23:32:42,145 epoch 5 - iter 594/992 - loss 0.04135012 - time (sec): 36.46 - samples/sec: 2712.40 - lr: 0.000030 - momentum: 0.000000
2023-10-13 23:32:47,697 epoch 5 - iter 693/992 - loss 0.04198723 - time (sec): 42.02 - samples/sec: 2723.82 - lr: 0.000029 - momentum: 0.000000
2023-10-13 23:32:53,463 epoch 5 - iter 792/992 - loss 0.04154554 - time (sec): 47.78 - samples/sec: 2733.14 - lr: 0.000029 - momentum: 0.000000
2023-10-13 23:32:59,465 epoch 5 - iter 891/992 - loss 0.04180375 - time (sec): 53.78 - samples/sec: 2751.31 - lr: 0.000028 - momentum: 0.000000
2023-10-13 23:33:05,084 epoch 5 - iter 990/992 - loss 0.04232641 - time (sec): 59.40 - samples/sec: 2752.10 - lr: 0.000028 - momentum: 0.000000
2023-10-13 23:33:05,307 ----------------------------------------------------------------------------------------------------
2023-10-13 23:33:05,307 EPOCH 5 done: loss 0.0424 - lr: 0.000028
2023-10-13 23:33:08,786 DEV : loss 0.14422008395195007 - f1-score (micro avg) 0.7382
2023-10-13 23:33:08,807 ----------------------------------------------------------------------------------------------------
2023-10-13 23:33:14,515 epoch 6 - iter 99/992 - loss 0.03270406 - time (sec): 5.71 - samples/sec: 2753.34 - lr: 0.000027 - momentum: 0.000000
2023-10-13 23:33:20,341 epoch 6 - iter 198/992 - loss 0.03440711 - time (sec): 11.53 - samples/sec: 2762.06 - lr: 0.000027 - momentum: 0.000000
2023-10-13 23:33:26,151 epoch 6 - iter 297/992 - loss 0.03164332 - time (sec): 17.34 - samples/sec: 2786.75 - lr: 0.000026 - momentum: 0.000000
2023-10-13 23:33:32,289 epoch 6 - iter 396/992 - loss 0.03087269 - time (sec): 23.48 - samples/sec: 2774.22 - lr: 0.000026 - momentum: 0.000000
2023-10-13 23:33:38,140 epoch 6 - iter 495/992 - loss 0.03029381 - time (sec): 29.33 - samples/sec: 2779.29 - lr: 0.000025 - momentum: 0.000000
2023-10-13 23:33:44,152 epoch 6 - iter 594/992 - loss 0.03194290 - time (sec): 35.34 - samples/sec: 2771.66 - lr: 0.000024 - momentum: 0.000000
2023-10-13 23:33:50,524 epoch 6 - iter 693/992 - loss 0.03148955 - time (sec): 41.72 - samples/sec: 2751.98 - lr: 0.000024 - momentum: 0.000000
2023-10-13 23:33:56,555 epoch 6 - iter 792/992 - loss 0.03151560 - time (sec): 47.75 - samples/sec: 2745.14 - lr: 0.000023 - momentum: 0.000000
2023-10-13 23:34:02,346 epoch 6 - iter 891/992 - loss 0.03159422 - time (sec): 53.54 - samples/sec: 2737.07 - lr: 0.000023 - momentum: 0.000000
2023-10-13 23:34:08,209 epoch 6 - iter 990/992 - loss 0.03160206 - time (sec): 59.40 - samples/sec: 2752.33 - lr: 0.000022 - momentum: 0.000000
2023-10-13 23:34:08,344 ----------------------------------------------------------------------------------------------------
2023-10-13 23:34:08,344 EPOCH 6 done: loss 0.0315 - lr: 0.000022
2023-10-13 23:34:11,797 DEV : loss 0.18077921867370605 - f1-score (micro avg) 0.7474
2023-10-13 23:34:11,818 saving best model
2023-10-13 23:34:12,359 ----------------------------------------------------------------------------------------------------
2023-10-13 23:34:18,498 epoch 7 - iter 99/992 - loss 0.02270269 - time (sec): 6.14 - samples/sec: 2795.67 - lr: 0.000022 - momentum: 0.000000
2023-10-13 23:34:24,910 epoch 7 - iter 198/992 - loss 0.02677230 - time (sec): 12.55 - samples/sec: 2687.84 - lr: 0.000021 - momentum: 0.000000
2023-10-13 23:34:30,704 epoch 7 - iter 297/992 - loss 0.02627154 - time (sec): 18.34 - samples/sec: 2745.71 - lr: 0.000021 - momentum: 0.000000
2023-10-13 23:34:36,691 epoch 7 - iter 396/992 - loss 0.02486597 - time (sec): 24.33 - samples/sec: 2749.78 - lr: 0.000020 - momentum: 0.000000
2023-10-13 23:34:42,564 epoch 7 - iter 495/992 - loss 0.02427333 - time (sec): 30.20 - samples/sec: 2755.90 - lr: 0.000019 - momentum: 0.000000
2023-10-13 23:34:48,585 epoch 7 - iter 594/992 - loss 0.02499253 - time (sec): 36.22 - samples/sec: 2771.56 - lr: 0.000019 - momentum: 0.000000
2023-10-13 23:34:54,382 epoch 7 - iter 693/992 - loss 0.02475078 - time (sec): 42.02 - samples/sec: 2755.90 - lr: 0.000018 - momentum: 0.000000
2023-10-13 23:35:00,122 epoch 7 - iter 792/992 - loss 0.02346216 - time (sec): 47.76 - samples/sec: 2761.23 - lr: 0.000018 - momentum: 0.000000
2023-10-13 23:35:05,982 epoch 7 - iter 891/992 - loss 0.02381140 - time (sec): 53.62 - samples/sec: 2764.08 - lr: 0.000017 - momentum: 0.000000
2023-10-13 23:35:11,552 epoch 7 - iter 990/992 - loss 0.02363118 - time (sec): 59.19 - samples/sec: 2764.57 - lr: 0.000017 - momentum: 0.000000
2023-10-13 23:35:11,659 ----------------------------------------------------------------------------------------------------
2023-10-13 23:35:11,659 EPOCH 7 done: loss 0.0236 - lr: 0.000017
2023-10-13 23:35:15,132 DEV : loss 0.19388847053050995 - f1-score (micro avg) 0.7533
2023-10-13 23:35:15,154 saving best model
2023-10-13 23:35:15,701 ----------------------------------------------------------------------------------------------------
2023-10-13 23:35:21,741 epoch 8 - iter 99/992 - loss 0.02606675 - time (sec): 6.03 - samples/sec: 2751.12 - lr: 0.000016 - momentum: 0.000000
2023-10-13 23:35:27,536 epoch 8 - iter 198/992 - loss 0.02087024 - time (sec): 11.82 - samples/sec: 2773.87 - lr: 0.000016 - momentum: 0.000000
2023-10-13 23:35:33,595 epoch 8 - iter 297/992 - loss 0.02083409 - time (sec): 17.88 - samples/sec: 2756.05 - lr: 0.000015 - momentum: 0.000000
2023-10-13 23:35:39,370 epoch 8 - iter 396/992 - loss 0.01959564 - time (sec): 23.66 - samples/sec: 2759.52 - lr: 0.000014 - momentum: 0.000000
2023-10-13 23:35:45,228 epoch 8 - iter 495/992 - loss 0.01772775 - time (sec): 29.51 - samples/sec: 2753.18 - lr: 0.000014 - momentum: 0.000000
2023-10-13 23:35:51,290 epoch 8 - iter 594/992 - loss 0.01876618 - time (sec): 35.58 - samples/sec: 2755.48 - lr: 0.000013 - momentum: 0.000000
2023-10-13 23:35:57,441 epoch 8 - iter 693/992 - loss 0.01844572 - time (sec): 41.73 - samples/sec: 2754.59 - lr: 0.000013 - momentum: 0.000000
2023-10-13 23:36:03,063 epoch 8 - iter 792/992 - loss 0.01837206 - time (sec): 47.35 - samples/sec: 2766.17 - lr: 0.000012 - momentum: 0.000000
2023-10-13 23:36:08,882 epoch 8 - iter 891/992 - loss 0.01808332 - time (sec): 53.17 - samples/sec: 2764.84 - lr: 0.000012 - momentum: 0.000000
2023-10-13 23:36:14,706 epoch 8 - iter 990/992 - loss 0.01781156 - time (sec): 58.99 - samples/sec: 2774.06 - lr: 0.000011 - momentum: 0.000000
2023-10-13 23:36:14,815 ----------------------------------------------------------------------------------------------------
2023-10-13 23:36:14,815 EPOCH 8 done: loss 0.0179 - lr: 0.000011
2023-10-13 23:36:18,231 DEV : loss 0.2065057009458542 - f1-score (micro avg) 0.7489
2023-10-13 23:36:18,252 ----------------------------------------------------------------------------------------------------
2023-10-13 23:36:23,837 epoch 9 - iter 99/992 - loss 0.00810002 - time (sec): 5.58 - samples/sec: 2755.74 - lr: 0.000011 - momentum: 0.000000
2023-10-13 23:36:29,544 epoch 9 - iter 198/992 - loss 0.01228895 - time (sec): 11.29 - samples/sec: 2762.22 - lr: 0.000010 - momentum: 0.000000
2023-10-13 23:36:35,476 epoch 9 - iter 297/992 - loss 0.01142656 - time (sec): 17.22 - samples/sec: 2805.13 - lr: 0.000009 - momentum: 0.000000
2023-10-13 23:36:41,470 epoch 9 - iter 396/992 - loss 0.01157152 - time (sec): 23.22 - samples/sec: 2800.36 - lr: 0.000009 - momentum: 0.000000
2023-10-13 23:36:48,284 epoch 9 - iter 495/992 - loss 0.01250551 - time (sec): 30.03 - samples/sec: 2754.58 - lr: 0.000008 - momentum: 0.000000
2023-10-13 23:36:54,144 epoch 9 - iter 594/992 - loss 0.01181795 - time (sec): 35.89 - samples/sec: 2761.00 - lr: 0.000008 - momentum: 0.000000
2023-10-13 23:36:59,911 epoch 9 - iter 693/992 - loss 0.01134550 - time (sec): 41.66 - samples/sec: 2767.20 - lr: 0.000007 - momentum: 0.000000
2023-10-13 23:37:05,693 epoch 9 - iter 792/992 - loss 0.01157164 - time (sec): 47.44 - samples/sec: 2773.18 - lr: 0.000007 - momentum: 0.000000
2023-10-13 23:37:11,344 epoch 9 - iter 891/992 - loss 0.01261003 - time (sec): 53.09 - samples/sec: 2773.26 - lr: 0.000006 - momentum: 0.000000
2023-10-13 23:37:17,361 epoch 9 - iter 990/992 - loss 0.01216585 - time (sec): 59.11 - samples/sec: 2769.37 - lr: 0.000006 - momentum: 0.000000
2023-10-13 23:37:17,483 ----------------------------------------------------------------------------------------------------
2023-10-13 23:37:17,483 EPOCH 9 done: loss 0.0122 - lr: 0.000006
2023-10-13 23:37:20,901 DEV : loss 0.21828265488147736 - f1-score (micro avg) 0.7511
2023-10-13 23:37:20,922 ----------------------------------------------------------------------------------------------------
2023-10-13 23:37:26,797 epoch 10 - iter 99/992 - loss 0.00487076 - time (sec): 5.87 - samples/sec: 2621.15 - lr: 0.000005 - momentum: 0.000000
2023-10-13 23:37:32,458 epoch 10 - iter 198/992 - loss 0.00499497 - time (sec): 11.53 - samples/sec: 2735.44 - lr: 0.000004 - momentum: 0.000000
2023-10-13 23:37:38,187 epoch 10 - iter 297/992 - loss 0.00669737 - time (sec): 17.26 - samples/sec: 2734.38 - lr: 0.000004 - momentum: 0.000000
2023-10-13 23:37:44,180 epoch 10 - iter 396/992 - loss 0.00712915 - time (sec): 23.26 - samples/sec: 2726.30 - lr: 0.000003 - momentum: 0.000000
2023-10-13 23:37:49,921 epoch 10 - iter 495/992 - loss 0.00806367 - time (sec): 29.00 - samples/sec: 2749.29 - lr: 0.000003 - momentum: 0.000000
2023-10-13 23:37:55,956 epoch 10 - iter 594/992 - loss 0.00780873 - time (sec): 35.03 - samples/sec: 2762.96 - lr: 0.000002 - momentum: 0.000000
2023-10-13 23:38:01,974 epoch 10 - iter 693/992 - loss 0.00776724 - time (sec): 41.05 - samples/sec: 2789.12 - lr: 0.000002 - momentum: 0.000000
2023-10-13 23:38:07,834 epoch 10 - iter 792/992 - loss 0.00771472 - time (sec): 46.91 - samples/sec: 2797.30 - lr: 0.000001 - momentum: 0.000000
2023-10-13 23:38:13,576 epoch 10 - iter 891/992 - loss 0.00813066 - time (sec): 52.65 - samples/sec: 2798.27 - lr: 0.000001 - momentum: 0.000000
2023-10-13 23:38:19,408 epoch 10 - iter 990/992 - loss 0.00863417 - time (sec): 58.48 - samples/sec: 2798.97 - lr: 0.000000 - momentum: 0.000000
2023-10-13 23:38:19,516 ----------------------------------------------------------------------------------------------------
2023-10-13 23:38:19,516 EPOCH 10 done: loss 0.0086 - lr: 0.000000
2023-10-13 23:38:22,986 DEV : loss 0.2138698697090149 - f1-score (micro avg) 0.7601
2023-10-13 23:38:23,006 saving best model
2023-10-13 23:38:23,968 ----------------------------------------------------------------------------------------------------
2023-10-13 23:38:23,969 Loading model from best epoch ...
2023-10-13 23:38:25,336 SequenceTagger predicts: Dictionary with 13 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-13 23:38:28,578 Results:
- F-score (micro) 0.7934
- F-score (macro) 0.7137
- Accuracy 0.6822

By class:
              precision    recall  f1-score   support

         LOC     0.8217    0.8794    0.8496       655
         PER     0.7188    0.8251    0.7683       223
         ORG     0.5636    0.4882    0.5232       127

   micro avg     0.7704    0.8179    0.7934      1005
   macro avg     0.7014    0.7309    0.7137      1005
weighted avg     0.7662    0.8179    0.7903      1005

2023-10-13 23:38:28,579 ----------------------------------------------------------------------------------------------------