metadata

license: mit
tags:
  - generated_from_trainer
metrics:
  - precision
  - recall
  - f1
  - accuracy
model_index:
  - name: bert-portuguese-ner-archive
    results:
      - task:
          name: Token Classification
          type: token-classification
        metric:
          name: Accuracy
          type: accuracy
          value: 0.9700325118974698

bert-portuguese-ner-archive

This model is a fine-tuned version of neuralmind/bert-base-portuguese-cased. It achieves the following results on the evaluation set:

Loss: 0.1140
Precision: 0.9147
Recall: 0.9483
F1: 0.9312
Accuracy: 0.9700
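
As a quick consistency check on the numbers above, F1 is the harmonic mean of precision and recall. The short sketch below recomputes it from the reported values; the function name `f1_score` is only illustrative and is not taken from the training code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported for the evaluation set in this card.
precision, recall = 0.9147, 0.9483
f1 = f1_score(precision, recall)
print(round(f1, 4))  # 0.9312, matching the reported F1
```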

Model description

This model was fine-tuned for a token classification task (NER) on Portuguese archival documents. The annotated labels are: Date, Profession, Person, Place, and Organization.
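
A minimal usage sketch with the Transformers `pipeline` API follows. Several things here are assumptions rather than facts from this card: `MODEL_ID` is a placeholder that should be replaced with the actual Hub repository id or a local checkpoint path, the helper names are hypothetical, and the label strings in `sample` simply reuse the label names listed above (the strings stored in the model configuration may differ). The `sample` list is hand-written to show the output shape and is not real model output.

```python
from collections import defaultdict

# Assumption: placeholder identifier; replace with the real Hub id or a local path.
MODEL_ID = "bert-portuguese-ner-archive"


def load_ner_pipeline(model_id: str = MODEL_ID):
    """Build a token-classification pipeline that merges word pieces into entity spans."""
    from transformers import pipeline  # imported lazily so the helpers below work without it

    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def group_by_label(entities):
    """Collect the entity texts returned by the pipeline under their predicted label."""
    grouped = defaultdict(list)
    for entity in entities:
        grouped[entity["entity_group"]].append(entity["word"])
    return dict(grouped)


# Hand-written example in the shape the pipeline returns; not real model output.
sample = [
    {"entity_group": "Person", "word": "João da Silva", "score": 0.99, "start": 0, "end": 13},
    {"entity_group": "Profession", "word": "lavrador", "score": 0.98, "start": 15, "end": 23},
    {"entity_group": "Place", "word": "Braga", "score": 0.99, "start": 38, "end": 43},
]

print(group_by_label(sample))
```

With the model available, `group_by_label(load_ner_pipeline()(text))` would apply the same grouping to real predictions for a string `text`.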

Datasets

All training and evaluation data are available at: http://ner.epl.di.uminho.pt/

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 4
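
For readers who want to reproduce a similar setup, the values above could be expressed as Transformers `TrainingArguments` roughly as sketched below. This is not the original training script. It assumes a single device, so that `train_batch_size` and `eval_batch_size` map to the per-device arguments, and the `output_dir` default and the function name are hypothetical.

```python
# Hyperparameters from this card, keyed by the matching TrainingArguments names.
# Assumption: single-device training, so batch sizes map to the per-device arguments.
HYPERPARAMETERS = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 16,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 4,
}


def build_training_arguments(output_dir: str = "bert-portuguese-ner-archive"):
    """Create TrainingArguments from the values above (output_dir is a hypothetical default)."""
    from transformers import TrainingArguments  # lazy import; requires transformers and torch

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMETERS)
```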

Training results

Training Loss	Epoch	Step	Validation Loss	Precision	Recall	F1	Accuracy
No log	1.0	192	0.1438	0.8917	0.9392	0.9148	0.9633
0.2454	2.0	384	0.1222	0.8985	0.9417	0.9196	0.9671
0.0526	3.0	576	0.1098	0.9150	0.9481	0.9312	0.9698
0.0372	4.0	768	0.1140	0.9147	0.9483	0.9312	0.9700

Framework versions

Transformers 4.10.0.dev0
Pytorch 1.9.0+cu111
Datasets 1.10.2
Tokenizers 0.10.3

Citation

@InProceedings{10.1007/978-3-031-04819-7_33,
  author="da Costa Cunha, Lu{\'i}s Filipe and Ramalho, Jos{\'e} Carlos",
  editor="Rocha, Alvaro and Adeli, Hojjat and Dzemyda, Gintautas and Moreira, Fernando",
  title="NER in Archival Finding Aids: Next Level",
  booktitle="Information Systems and Technologies",
  year="2022",
  publisher="Springer International Publishing",
  address="Cham",
  pages="333--342",
  isbn="978-3-031-04819-7"
}