---
license: mit
base_model: google-bert/bert-base-german-cased
tags:
- generated_from_trainer
model-index:
- name: bert-mapa-german
  results: []
language:
- de
---

# bert-mapa-german

This model is a fine-tuned version of [google-bert/bert-base-german-cased](https://huggingface.co/google-bert/bert-base-german-cased) on the German portion of the MAPA dataset. Its purpose is to detect private information in German texts.

It achieves the following results on the test set:

| Category      | Precision | Recall | F1     | Number |
|---------------|-----------|--------|--------|--------|
| Address       | 0.5882    | 0.6667 | 0.625  | 15     |
| Age           | 0.0       | 0.0    | 0.0    | 3      |
| Amount        | 1.0       | 1.0    | 1.0    | 1      |
| Date          | 0.9455    | 0.9455 | 0.9455 | 55     |
| Name          | 0.7       | 0.9545 | 0.8077 | 22     |
| Organisation  | 0.5405    | 0.6452 | 0.5882 | 31     |
| Person        | 0.5385    | 0.5    | 0.5185 | 14     |
| Role          | 0.0       | 0.0    | 0.0    | 1      |
| Overall       | 0.7255    | 0.7817 | 0.7525 |        |

- Loss: 0.0325
- Overall Accuracy: 0.9912

## Intended uses & limitations

This model is intended for detecting private information in German texts. Its training corpus comprises only 1744 example sentences, so its predictions contain a comparatively high number of errors.

## Training and evaluation data

The German portion of the MAPA dataset was randomly split into 80% training, 10% validation, and 10% test data.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 4

### Training results

| Training Loss | Epoch | Step | Validation Loss | Overall Precision | Overall Recall | Overall F1 | Overall Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|:--------------:|:----------:|:----------------:|
| No log        | 1.0   | 218  | 0.0607          | 0.6527            | 0.7786         | 0.7101     | 0.9859           |
| No log        | 2.0   | 436  | 0.0479          | 0.7355            | 0.8143         | 0.7729     | 0.9896           |
| 0.116         | 3.0   | 654  | 0.0414          | 0.7712            | 0.8429         | 0.8055     | 0.9908           |
| 0.116         | 4.0   | 872  | 0.0421          | 0.7857            | 0.8643         | 0.8231     | 0.9917           |

### Framework versions

- Transformers 4.40.0
- Pytorch 2.1.0+cu121
- Datasets 2.19.0
- Tokenizers 0.19.1
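
## How to use

As a token-classification model, it can be loaded with the standard `transformers` pipeline. The sketch below is a minimal illustration: the repo id `bert-mapa-german` is a placeholder for the model's actual Hugging Face Hub id (or a local checkpoint path), and the example sentence is invented.

```python
from transformers import pipeline

# Load the fine-tuned token-classification model.
# Replace "bert-mapa-german" with the actual Hub repo id or a local path.
ner = pipeline(
    "token-classification",
    model="bert-mapa-german",
    aggregation_strategy="simple",  # merge subword pieces into whole entity spans
)

# Hypothetical example sentence containing a name, a date, and an address.
text = "Max Mustermann wurde am 12. März 1985 in der Hauptstraße 5 in Berlin geboren."

for entity in ner(text):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```

Each prediction holds the entity category (e.g. `Name`, `Date`, `Address`), the matched text span, and a confidence score; `aggregation_strategy="simple"` groups subword tokens belonging to the same entity into one span.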