2023-10-13 22:32:35,467 ----------------------------------------------------------------------------------------------------
2023-10-13 22:32:35,468 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(32001, 768)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0-11): 12 x BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=13, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-13 22:32:35,468 ----------------------------------------------------------------------------------------------------
2023-10-13 22:32:35,468 MultiCorpus: 7936 train + 992 dev + 992 test sentences
 - NER_ICDAR_EUROPEANA Corpus: 7936 train + 992 dev + 992 test sentences - /root/.flair/datasets/ner_icdar_europeana/fr
2023-10-13 22:32:35,469 ----------------------------------------------------------------------------------------------------
2023-10-13 22:32:35,469 Train: 7936 sentences
2023-10-13 22:32:35,469 (train_with_dev=False, train_with_test=False)
2023-10-13 22:32:35,469 ----------------------------------------------------------------------------------------------------
2023-10-13 22:32:35,469 Training Params:
2023-10-13 22:32:35,469  - learning_rate: "5e-05"
2023-10-13 22:32:35,469  - mini_batch_size: "8"
2023-10-13 22:32:35,469  - max_epochs: "10"
2023-10-13 22:32:35,469  - shuffle: "True"
2023-10-13 22:32:35,469 ----------------------------------------------------------------------------------------------------
2023-10-13 22:32:35,469 Plugins:
2023-10-13 22:32:35,469  - LinearScheduler | warmup_fraction: '0.1'
2023-10-13 22:32:35,469 ----------------------------------------------------------------------------------------------------
2023-10-13 22:32:35,469 Final evaluation on model from best epoch (best-model.pt)
2023-10-13 22:32:35,469  - metric: "('micro avg', 'f1-score')"
2023-10-13 22:32:35,469 ----------------------------------------------------------------------------------------------------
2023-10-13 22:32:35,469 Computation:
2023-10-13 22:32:35,469  - compute on device: cuda:0
2023-10-13 22:32:35,469  - embedding storage: none
2023-10-13 22:32:35,469 ----------------------------------------------------------------------------------------------------
2023-10-13 22:32:35,469 Model training base path: "hmbench-icdar/fr-dbmdz/bert-base-historic-multilingual-cased-bs8-wsFalse-e10-lr5e-05-poolingfirst-layers-1-crfFalse-2"
2023-10-13 22:32:35,469 ----------------------------------------------------------------------------------------------------
2023-10-13 22:32:35,469 ----------------------------------------------------------------------------------------------------
2023-10-13 22:32:42,105 epoch 1 - iter 99/992 - loss 1.97625335 - time (sec): 6.63 - samples/sec: 2619.78 - lr: 0.000005 - momentum: 0.000000
2023-10-13 22:32:48,119 epoch 1 - iter 198/992 - loss 1.20445619 - time (sec): 12.65 - samples/sec: 2622.63 - lr: 0.000010 - momentum: 0.000000
2023-10-13 22:32:53,990 epoch 1 - iter 297/992 - loss 0.90127542 - time (sec): 18.52 - samples/sec: 2645.29 - lr: 0.000015 - momentum: 0.000000
2023-10-13 22:32:59,842 epoch 1 - iter 396/992 - loss 0.73031528 - time (sec): 24.37 - samples/sec: 2664.09 - lr: 0.000020 - momentum: 0.000000
2023-10-13 22:33:05,553 epoch 1 - iter 495/992 - loss 0.61732090 - time (sec): 30.08 - samples/sec: 2695.40 - lr: 0.000025 - momentum: 0.000000
2023-10-13 22:33:11,400 epoch 1 - iter 594/992 - loss 0.53727279 - time (sec): 35.93 - samples/sec: 2719.61 - lr: 0.000030 - momentum: 0.000000
2023-10-13 22:33:17,199 epoch 1 - iter 693/992 - loss 0.48213515 - time (sec): 41.73 - samples/sec: 2724.53 - lr: 0.000035 - momentum: 0.000000
2023-10-13 22:33:23,140 epoch 1 - iter 792/992 - loss 0.43768861 - time (sec): 47.67 - samples/sec: 2731.72 - lr: 0.000040 - momentum: 0.000000
2023-10-13 22:33:29,370 epoch 1 - iter 891/992 - loss 0.40269124 - time (sec): 53.90 - samples/sec: 2727.47 - lr: 0.000045 - momentum: 0.000000
2023-10-13 22:33:35,471 epoch 1 - iter 990/992 - loss 0.37457303 - time (sec): 60.00 - samples/sec: 2724.88 - lr: 0.000050 - momentum: 0.000000
2023-10-13 22:33:35,623 ----------------------------------------------------------------------------------------------------
2023-10-13 22:33:35,624 EPOCH 1 done: loss 0.3735 - lr: 0.000050
2023-10-13 22:33:38,899 DEV : loss 0.10837739706039429 - f1-score (micro avg) 0.6822
2023-10-13 22:33:38,924 saving best model
2023-10-13 22:33:39,350 ----------------------------------------------------------------------------------------------------
2023-10-13 22:33:45,003 epoch 2 - iter 99/992 - loss 0.10295012 - time (sec): 5.65 - samples/sec: 2750.21 - lr: 0.000049 - momentum: 0.000000
2023-10-13 22:33:50,901 epoch 2 - iter 198/992 - loss 0.10973798 - time (sec): 11.55 - samples/sec: 2718.58 - lr: 0.000049 - momentum: 0.000000
2023-10-13 22:33:56,904 epoch 2 - iter 297/992 - loss 0.10543489 - time (sec): 17.55 - samples/sec: 2761.93 - lr: 0.000048 - momentum: 0.000000
2023-10-13 22:34:02,862 epoch 2 - iter 396/992 - loss 0.10381763 - time (sec): 23.51 - samples/sec: 2769.50 - lr: 0.000048 - momentum: 0.000000
2023-10-13 22:34:08,919 epoch 2 - iter 495/992 - loss 0.10311459 - time (sec): 29.57 - samples/sec: 2751.80 - lr: 0.000047 - momentum: 0.000000
2023-10-13 22:34:14,729 epoch 2 - iter 594/992 - loss 0.10171878 - time (sec): 35.38 - samples/sec: 2751.64 - lr: 0.000047 - momentum: 0.000000
2023-10-13 22:34:20,492 epoch 2 - iter 693/992 - loss 0.09979812 - time (sec): 41.14 - samples/sec: 2754.11 - lr: 0.000046 - momentum: 0.000000
2023-10-13 22:34:26,580 epoch 2 - iter 792/992 - loss 0.10186744 - time (sec): 47.23 - samples/sec: 2753.33 - lr: 0.000046 - momentum: 0.000000
2023-10-13 22:34:32,670 epoch 2 - iter 891/992 - loss 0.10186196 - time (sec): 53.32 - samples/sec: 2741.16 - lr: 0.000045 - momentum: 0.000000
2023-10-13 22:34:38,791 epoch 2 - iter 990/992 - loss 0.10164595 - time (sec): 59.44 - samples/sec: 2752.43 - lr: 0.000044 - momentum: 0.000000
2023-10-13 22:34:38,920 ----------------------------------------------------------------------------------------------------
2023-10-13 22:34:38,920 EPOCH 2 done: loss 0.1015 - lr: 0.000044
2023-10-13 22:34:43,263 DEV : loss 0.08593912422657013 - f1-score (micro avg) 0.7422
2023-10-13 22:34:43,287 saving best model
2023-10-13 22:34:43,770 ----------------------------------------------------------------------------------------------------
2023-10-13 22:34:49,664 epoch 3 - iter 99/992 - loss 0.08176039 - time (sec): 5.89 - samples/sec: 2717.35 - lr: 0.000044 - momentum: 0.000000
2023-10-13 22:34:55,620 epoch 3 - iter 198/992 - loss 0.07620139 - time (sec): 11.85 - samples/sec: 2720.74 - lr: 0.000043 - momentum: 0.000000
2023-10-13 22:35:01,359 epoch 3 - iter 297/992 - loss 0.07242783 - time (sec): 17.59 - samples/sec: 2743.33 - lr: 0.000043 - momentum: 0.000000
2023-10-13 22:35:07,179 epoch 3 - iter 396/992 - loss 0.07362963 - time (sec): 23.41 - samples/sec: 2759.74 - lr: 0.000042 - momentum: 0.000000
2023-10-13 22:35:13,060 epoch 3 - iter 495/992 - loss 0.07299823 - time (sec): 29.29 - samples/sec: 2761.91 - lr: 0.000042 - momentum: 0.000000
2023-10-13 22:35:19,048 epoch 3 - iter 594/992 - loss 0.07330654 - time (sec): 35.27 - samples/sec: 2761.44 - lr: 0.000041 - momentum: 0.000000
2023-10-13 22:35:25,090 epoch 3 - iter 693/992 - loss 0.07489880 - time (sec): 41.32 - samples/sec: 2747.94 - lr: 0.000041 - momentum: 0.000000
2023-10-13 22:35:31,036 epoch 3 - iter 792/992 - loss 0.07562313 - time (sec): 47.26 - samples/sec: 2760.10 - lr: 0.000040 - momentum: 0.000000
2023-10-13 22:35:37,046 epoch 3 - iter 891/992 - loss 0.07372307 - time (sec): 53.27 - samples/sec: 2766.49 - lr: 0.000039 - momentum: 0.000000
2023-10-13 22:35:42,895 epoch 3 - iter 990/992 - loss 0.07373060 - time (sec): 59.12 - samples/sec: 2769.61 - lr: 0.000039 - momentum: 0.000000
2023-10-13 22:35:42,996 ----------------------------------------------------------------------------------------------------
2023-10-13 22:35:42,996 EPOCH 3 done: loss 0.0737 - lr: 0.000039
2023-10-13 22:35:46,388 DEV : loss 0.10231217741966248 - f1-score (micro avg) 0.7589
2023-10-13 22:35:46,412 saving best model
2023-10-13 22:35:46,925 ----------------------------------------------------------------------------------------------------
2023-10-13 22:35:52,809 epoch 4 - iter 99/992 - loss 0.05134981 - time (sec): 5.88 - samples/sec: 2723.80 - lr: 0.000038 - momentum: 0.000000
2023-10-13 22:35:59,141 epoch 4 - iter 198/992 - loss 0.05017639 - time (sec): 12.21 - samples/sec: 2716.64 - lr: 0.000038 - momentum: 0.000000
2023-10-13 22:36:05,090 epoch 4 - iter 297/992 - loss 0.04727321 - time (sec): 18.16 - samples/sec: 2718.31 - lr: 0.000037 - momentum: 0.000000
2023-10-13 22:36:10,869 epoch 4 - iter 396/992 - loss 0.04808307 - time (sec): 23.94 - samples/sec: 2749.13 - lr: 0.000037 - momentum: 0.000000
2023-10-13 22:36:16,706 epoch 4 - iter 495/992 - loss 0.04713806 - time (sec): 29.78 - samples/sec: 2768.47 - lr: 0.000036 - momentum: 0.000000
2023-10-13 22:36:22,555 epoch 4 - iter 594/992 - loss 0.04822839 - time (sec): 35.63 - samples/sec: 2769.58 - lr: 0.000036 - momentum: 0.000000
2023-10-13 22:36:28,395 epoch 4 - iter 693/992 - loss 0.04982517 - time (sec): 41.47 - samples/sec: 2764.17 - lr: 0.000035 - momentum: 0.000000
2023-10-13 22:36:34,340 epoch 4 - iter 792/992 - loss 0.05129636 - time (sec): 47.41 - samples/sec: 2765.07 - lr: 0.000034 - momentum: 0.000000
2023-10-13 22:36:39,871 epoch 4 - iter 891/992 - loss 0.05132015 - time (sec): 52.94 - samples/sec: 2782.54 - lr: 0.000034 - momentum: 0.000000
2023-10-13 22:36:46,060 epoch 4 - iter 990/992 - loss 0.05099074 - time (sec): 59.13 - samples/sec: 2765.99 - lr: 0.000033 - momentum: 0.000000
2023-10-13 22:36:46,190 ----------------------------------------------------------------------------------------------------
2023-10-13 22:36:46,190 EPOCH 4 done: loss 0.0510 - lr: 0.000033
2023-10-13 22:36:49,798 DEV : loss 0.13146057724952698 - f1-score (micro avg) 0.7587
2023-10-13 22:36:49,822 ----------------------------------------------------------------------------------------------------
2023-10-13 22:36:55,595 epoch 5 - iter 99/992 - loss 0.03914982 - time (sec): 5.77 - samples/sec: 2761.13 - lr: 0.000033 - momentum: 0.000000
2023-10-13 22:37:01,451 epoch 5 - iter 198/992 - loss 0.03993438 - time (sec): 11.63 - samples/sec: 2786.06 - lr: 0.000032 - momentum: 0.000000
2023-10-13 22:37:07,215 epoch 5 - iter 297/992 - loss 0.03848388 - time (sec): 17.39 - samples/sec: 2811.92 - lr: 0.000032 - momentum: 0.000000
2023-10-13 22:37:13,113 epoch 5 - iter 396/992 - loss 0.03928207 - time (sec): 23.29 - samples/sec: 2821.97 - lr: 0.000031 - momentum: 0.000000
2023-10-13 22:37:19,287 epoch 5 - iter 495/992 - loss 0.04014864 - time (sec): 29.46 - samples/sec: 2803.85 - lr: 0.000031 - momentum: 0.000000
2023-10-13 22:37:25,177 epoch 5 - iter 594/992 - loss 0.04251426 - time (sec): 35.35 - samples/sec: 2787.71 - lr: 0.000030 - momentum: 0.000000
2023-10-13 22:37:30,988 epoch 5 - iter 693/992 - loss 0.04162004 - time (sec): 41.17 - samples/sec: 2781.73 - lr: 0.000029 - momentum: 0.000000
2023-10-13 22:37:36,743 epoch 5 - iter 792/992 - loss 0.04073969 - time (sec): 46.92 - samples/sec: 2780.60 - lr: 0.000029 - momentum: 0.000000
2023-10-13 22:37:43,244 epoch 5 - iter 891/992 - loss 0.04252912 - time (sec): 53.42 - samples/sec: 2758.70 - lr: 0.000028 - momentum: 0.000000
2023-10-13 22:37:48,877 epoch 5 - iter 990/992 - loss 0.04211590 - time (sec): 59.05 - samples/sec: 2768.82 - lr: 0.000028 - momentum: 0.000000
2023-10-13 22:37:49,038 ----------------------------------------------------------------------------------------------------
2023-10-13 22:37:49,038 EPOCH 5 done: loss 0.0421 - lr: 0.000028
2023-10-13 22:37:52,595 DEV : loss 0.15390822291374207 - f1-score (micro avg) 0.7563
2023-10-13 22:37:52,618 ----------------------------------------------------------------------------------------------------
2023-10-13 22:37:58,397 epoch 6 - iter 99/992 - loss 0.02447859 - time (sec): 5.78 - samples/sec: 2857.09 - lr: 0.000027 - momentum: 0.000000
2023-10-13 22:38:04,483 epoch 6 - iter 198/992 - loss 0.02281979 - time (sec): 11.86 - samples/sec: 2814.99 - lr: 0.000027 - momentum: 0.000000
2023-10-13 22:38:10,375 epoch 6 - iter 297/992 - loss 0.02449871 - time (sec): 17.76 - samples/sec: 2836.72 - lr: 0.000026 - momentum: 0.000000
2023-10-13 22:38:16,330 epoch 6 - iter 396/992 - loss 0.02562274 - time (sec): 23.71 - samples/sec: 2816.53 - lr: 0.000026 - momentum: 0.000000
2023-10-13 22:38:22,242 epoch 6 - iter 495/992 - loss 0.02613520 - time (sec): 29.62 - samples/sec: 2780.83 - lr: 0.000025 - momentum: 0.000000
2023-10-13 22:38:28,192 epoch 6 - iter 594/992 - loss 0.02619509 - time (sec): 35.57 - samples/sec: 2768.90 - lr: 0.000024 - momentum: 0.000000
2023-10-13 22:38:33,861 epoch 6 - iter 693/992 - loss 0.02665766 - time (sec): 41.24 - samples/sec: 2754.02 - lr: 0.000024 - momentum: 0.000000
2023-10-13 22:38:39,952 epoch 6 - iter 792/992 - loss 0.02787193 - time (sec): 47.33 - samples/sec: 2770.66 - lr: 0.000023 - momentum: 0.000000
2023-10-13 22:38:45,783 epoch 6 - iter 891/992 - loss 0.02982103 - time (sec): 53.16 - samples/sec: 2766.54 - lr: 0.000023 - momentum: 0.000000
2023-10-13 22:38:51,599 epoch 6 - iter 990/992 - loss 0.02967950 - time (sec): 58.98 - samples/sec: 2772.64 - lr: 0.000022 - momentum: 0.000000
2023-10-13 22:38:51,724 ----------------------------------------------------------------------------------------------------
2023-10-13 22:38:51,724 EPOCH 6 done: loss 0.0296 - lr: 0.000022
2023-10-13 22:38:55,149 DEV : loss 0.19599080085754395 - f1-score (micro avg) 0.7572
2023-10-13 22:38:55,170 ----------------------------------------------------------------------------------------------------
2023-10-13 22:39:00,912 epoch 7 - iter 99/992 - loss 0.02376152 - time (sec): 5.74 - samples/sec: 2826.61 - lr: 0.000022 - momentum: 0.000000
2023-10-13 22:39:06,655 epoch 7 - iter 198/992 - loss 0.01958953 - time (sec): 11.48 - samples/sec: 2810.22 - lr: 0.000021 - momentum: 0.000000
2023-10-13 22:39:12,687 epoch 7 - iter 297/992 - loss 0.01962767 - time (sec): 17.52 - samples/sec: 2831.96 - lr: 0.000021 - momentum: 0.000000
2023-10-13 22:39:18,688 epoch 7 - iter 396/992 - loss 0.01948507 - time (sec): 23.52 - samples/sec: 2822.98 - lr: 0.000020 - momentum: 0.000000
2023-10-13 22:39:24,654 epoch 7 - iter 495/992 - loss 0.01985791 - time (sec): 29.48 - samples/sec: 2810.77 - lr: 0.000019 - momentum: 0.000000
2023-10-13 22:39:30,455 epoch 7 - iter 594/992 - loss 0.02068738 - time (sec): 35.28 - samples/sec: 2793.89 - lr: 0.000019 - momentum: 0.000000
2023-10-13 22:39:36,232 epoch 7 - iter 693/992 - loss 0.02241871 - time (sec): 41.06 - samples/sec: 2796.58 - lr: 0.000018 - momentum: 0.000000
2023-10-13 22:39:41,978 epoch 7 - iter 792/992 - loss 0.02181208 - time (sec): 46.81 - samples/sec: 2804.38 - lr: 0.000018 - momentum: 0.000000
2023-10-13 22:39:47,996 epoch 7 - iter 891/992 - loss 0.02217841 - time (sec): 52.83 - samples/sec: 2783.90 - lr: 0.000017 - momentum: 0.000000
2023-10-13 22:39:54,016 epoch 7 - iter 990/992 - loss 0.02172978 - time (sec): 58.84 - samples/sec: 2783.19 - lr: 0.000017 - momentum: 0.000000
2023-10-13 22:39:54,115 ----------------------------------------------------------------------------------------------------
2023-10-13 22:39:54,115 EPOCH 7 done: loss 0.0217 - lr: 0.000017
2023-10-13 22:39:57,955 DEV : loss 0.18885478377342224 - f1-score (micro avg) 0.765
2023-10-13 22:39:57,976 saving best model
2023-10-13 22:39:58,428 ----------------------------------------------------------------------------------------------------
2023-10-13 22:40:04,318 epoch 8 - iter 99/992 - loss 0.02286923 - time (sec): 5.88 - samples/sec: 2805.77 - lr: 0.000016 - momentum: 0.000000
2023-10-13 22:40:10,386 epoch 8 - iter 198/992 - loss 0.01729121 - time (sec): 11.95 - samples/sec: 2800.86 - lr: 0.000016 - momentum: 0.000000
2023-10-13 22:40:16,196 epoch 8 - iter 297/992 - loss 0.01585346 - time (sec): 17.76 - samples/sec: 2818.05 - lr: 0.000015 - momentum: 0.000000
2023-10-13 22:40:22,131 epoch 8 - iter 396/992 - loss 0.01479878 - time (sec): 23.70 - samples/sec: 2838.92 - lr: 0.000014 - momentum: 0.000000
2023-10-13 22:40:27,927 epoch 8 - iter 495/992 - loss 0.01559774 - time (sec): 29.49 - samples/sec: 2832.36 - lr: 0.000014 - momentum: 0.000000
2023-10-13 22:40:33,638 epoch 8 - iter 594/992 - loss 0.01479709 - time (sec): 35.20 - samples/sec: 2821.11 - lr: 0.000013 - momentum: 0.000000
2023-10-13 22:40:39,365 epoch 8 - iter 693/992 - loss 0.01447454 - time (sec): 40.93 - samples/sec: 2808.13 - lr: 0.000013 - momentum: 0.000000
2023-10-13 22:40:45,325 epoch 8 - iter 792/992 - loss 0.01516714 - time (sec): 46.89 - samples/sec: 2803.50 - lr: 0.000012 - momentum: 0.000000
2023-10-13 22:40:51,152 epoch 8 - iter 891/992 - loss 0.01551720 - time (sec): 52.72 - samples/sec: 2801.05 - lr: 0.000012 - momentum: 0.000000
2023-10-13 22:40:56,972 epoch 8 - iter 990/992 - loss 0.01557435 - time (sec): 58.54 - samples/sec: 2798.83 - lr: 0.000011 - momentum: 0.000000
2023-10-13 22:40:57,071 ----------------------------------------------------------------------------------------------------
2023-10-13 22:40:57,072 EPOCH 8 done: loss 0.0159 - lr: 0.000011
2023-10-13 22:41:00,599 DEV : loss 0.19485069811344147 - f1-score (micro avg) 0.7664
2023-10-13 22:41:00,622 saving best model
2023-10-13 22:41:01,122 ----------------------------------------------------------------------------------------------------
2023-10-13 22:41:06,817 epoch 9 - iter 99/992 - loss 0.00725184 - time (sec): 5.69 - samples/sec: 2598.12 - lr: 0.000011 - momentum: 0.000000
2023-10-13 22:41:12,615 epoch 9 - iter 198/992 - loss 0.01324789 - time (sec): 11.49 - samples/sec: 2671.94 - lr: 0.000010 - momentum: 0.000000
2023-10-13 22:41:18,618 epoch 9 - iter 297/992 - loss 0.01262387 - time (sec): 17.49 - samples/sec: 2676.85 - lr: 0.000009 - momentum: 0.000000
2023-10-13 22:41:24,527 epoch 9 - iter 396/992 - loss 0.01078576 - time (sec): 23.40 - samples/sec: 2700.15 - lr: 0.000009 - momentum: 0.000000
2023-10-13 22:41:30,687 epoch 9 - iter 495/992 - loss 0.01086785 - time (sec): 29.56 - samples/sec: 2708.13 - lr: 0.000008 - momentum: 0.000000
2023-10-13 22:41:36,631 epoch 9 - iter 594/992 - loss 0.01119434 - time (sec): 35.50 - samples/sec: 2722.13 - lr: 0.000008 - momentum: 0.000000
2023-10-13 22:41:42,383 epoch 9 - iter 693/992 - loss 0.01095970 - time (sec): 41.26 - samples/sec: 2736.96 - lr: 0.000007 - momentum: 0.000000
2023-10-13 22:41:48,299 epoch 9 - iter 792/992 - loss 0.01070482 - time (sec): 47.17 - samples/sec: 2753.00 - lr: 0.000007 - momentum: 0.000000
2023-10-13 22:41:54,240 epoch 9 - iter 891/992 - loss 0.01101838 - time (sec): 53.11 - samples/sec: 2759.58 - lr: 0.000006 - momentum: 0.000000
2023-10-13 22:42:00,495 epoch 9 - iter 990/992 - loss 0.01099219 - time (sec): 59.37 - samples/sec: 2755.51 - lr: 0.000006 - momentum: 0.000000
2023-10-13 22:42:00,600 ----------------------------------------------------------------------------------------------------
2023-10-13 22:42:00,600 EPOCH 9 done: loss 0.0110 - lr: 0.000006
2023-10-13 22:42:04,597 DEV : loss 0.20399770140647888 - f1-score (micro avg) 0.767
2023-10-13 22:42:04,619 saving best model
2023-10-13 22:42:05,118 ----------------------------------------------------------------------------------------------------
2023-10-13 22:42:11,036 epoch 10 - iter 99/992 - loss 0.00694056 - time (sec): 5.92 - samples/sec: 2777.84 - lr: 0.000005 - momentum: 0.000000
2023-10-13 22:42:17,345 epoch 10 - iter 198/992 - loss 0.00836868 - time (sec): 12.23 - samples/sec: 2802.73 - lr: 0.000004 - momentum: 0.000000
2023-10-13 22:42:22,986 epoch 10 - iter 297/992 - loss 0.00886989 - time (sec): 17.87 - samples/sec: 2785.15 - lr: 0.000004 - momentum: 0.000000
2023-10-13 22:42:28,741 epoch 10 - iter 396/992 - loss 0.00862895 - time (sec): 23.62 - samples/sec: 2804.19 - lr: 0.000003 - momentum: 0.000000
2023-10-13 22:42:35,030 epoch 10 - iter 495/992 - loss 0.00902308 - time (sec): 29.91 - samples/sec: 2785.89 - lr: 0.000003 - momentum: 0.000000
2023-10-13 22:42:40,907 epoch 10 - iter 594/992 - loss 0.00870118 - time (sec): 35.79 - samples/sec: 2776.25 - lr: 0.000002 - momentum: 0.000000
2023-10-13 22:42:46,703 epoch 10 - iter 693/992 - loss 0.00845402 - time (sec): 41.58 - samples/sec: 2756.78 - lr: 0.000002 - momentum: 0.000000
2023-10-13 22:42:52,747 epoch 10 - iter 792/992 - loss 0.00817124 - time (sec): 47.63 - samples/sec: 2751.49 - lr: 0.000001 - momentum: 0.000000
2023-10-13 22:42:58,806 epoch 10 - iter 891/992 - loss 0.00784789 - time (sec): 53.69 - samples/sec: 2751.57 - lr: 0.000001 - momentum: 0.000000
2023-10-13 22:43:04,469 epoch 10 - iter 990/992 - loss 0.00753226 - time (sec): 59.35 - samples/sec: 2757.97 - lr: 0.000000 - momentum: 0.000000
2023-10-13 22:43:04,580 ----------------------------------------------------------------------------------------------------
2023-10-13 22:43:04,580 EPOCH 10 done: loss 0.0075 - lr: 0.000000
2023-10-13 22:43:07,991 DEV : loss 0.20772945880889893 - f1-score (micro avg) 0.7698
2023-10-13 22:43:08,013 saving best model
2023-10-13 22:43:08,849 ----------------------------------------------------------------------------------------------------
2023-10-13 22:43:08,850 Loading model from best epoch ...
2023-10-13 22:43:10,240 SequenceTagger predicts: Dictionary with 13 tags: O, S-PER, B-PER, E-PER, I-PER, S-LOC, B-LOC, E-LOC, I-LOC, S-ORG, B-ORG, E-ORG, I-ORG
2023-10-13 22:43:13,645 Results:
- F-score (micro) 0.7805
- F-score (macro) 0.6917
- Accuracy 0.6599

By class:
              precision    recall  f1-score   support

         LOC     0.8191    0.8641    0.8410       655
         PER     0.7171    0.8296    0.7692       223
         ORG     0.4912    0.4409    0.4647       127

   micro avg     0.7592    0.8030    0.7805      1005
   macro avg     0.6758    0.7116    0.6917      1005
weighted avg     0.7550    0.8030    0.7775      1005
2023-10-13 22:43:13,646 ----------------------------------------------------------------------------------------------------
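The `lr:` column above follows the `LinearScheduler` plugin with `warmup_fraction: '0.1'`: the learning rate rises linearly over the first 10% of the 9,920 total optimizer steps (992 batches per epoch × 10 epochs) to the peak of 5e-05, then decays linearly to zero. A minimal sketch of that schedule (the function name and step accounting are my own reconstruction, not Flair's internals, and off-by-one details may differ):

```python
PEAK_LR = 5e-05                        # learning_rate: "5e-05"
STEPS_PER_EPOCH = 992                  # 7936 train sentences / mini_batch_size 8
TOTAL_STEPS = STEPS_PER_EPOCH * 10     # max_epochs: "10"
WARMUP_STEPS = int(0.1 * TOTAL_STEPS)  # warmup_fraction: '0.1'

def linear_schedule_lr(step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to 0."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR * (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS)

# Rounded to 6 decimals this reproduces the logged values, e.g.
# epoch 1, iter 99  (step 99)   -> 0.000005
# epoch 1, iter 990 (step 990)  -> 0.000050
# epoch 2, iter 990 (step 1982) -> 0.000044
```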
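The micro average in the final table pools entity counts across classes, while the macro average weights each class equally (which is why the weak ORG class drags the macro F-score down to 0.6917). The per-class counts below are back-solved from the reported precision, recall, and support figures; they are a hypothetical reconstruction for illustration, not values taken from the log:

```python
# (true positives, predicted spans, gold spans) per class,
# back-solved from the table: e.g. LOC recall 0.8641 * 655 gold = 566 TP
counts = {"LOC": (566, 691, 655), "PER": (185, 258, 223), "ORG": (56, 114, 127)}

tp = sum(c[0] for c in counts.values())    # total true positives
pred = sum(c[1] for c in counts.values())  # total predicted spans
gold = sum(c[2] for c in counts.values())  # total gold spans (support 1005)

micro_p = tp / pred                                      # -> 0.7592
micro_r = tp / gold                                      # -> 0.8030
micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r)   # -> 0.7805

def span_f1(t, p, g):
    # F1 = 2*TP / (predicted + gold), equivalent to 2PR/(P+R)
    return 2 * t / (p + g)

macro_f1 = sum(span_f1(*c) for c in counts.values()) / len(counts)  # -> 0.6917
```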
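The 13-tag dictionary uses the BIOES scheme: `S-` marks a single-token entity, while `B-`/`I-`/`E-` mark the beginning, inside, and end of a multi-token one. A minimal decoder sketch of how such tag sequences map to entity spans (illustrative only, not Flair's implementation):

```python
def bioes_spans(tags):
    """Extract (start, end_exclusive, label) spans from a BIOES tag sequence."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        if prefix == "S":                       # single-token entity
            spans.append((i, i + 1, label))
            start = None
        elif prefix == "B":                     # open a multi-token entity
            start = i
        elif prefix == "E" and start is not None:
            spans.append((start, i + 1, label))  # close the open entity
            start = None
        elif prefix != "I":                     # "O" (or malformed) resets state
            start = None
    return spans

# e.g. ["S-LOC", "O", "B-PER", "I-PER", "E-PER"]
#   -> [(0, 1, "LOC"), (2, 5, "PER")]
```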