metadata

base_model: klue/roberta-small
tags:
  - generated_from_trainer
  - korean
  - klue
widget:
  - text: >-
      저는 서울특별시 강남대로에 삽니다. 전화번호는 010-1234-5678이고 주민등록번호는 123456-1234567입니다. 메일주소는
      hugging@face.com입니다.
metrics:
  - precision
  - recall
  - f1
  - accuracy
model-index:
  - name: klue_roberta_small_ner_identified
    results: []
language:
  - ko
pipeline_tag: token-classification

klue_roberta_small_ner_identified

This model is a fine-tuned version of klue/roberta-small on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 0.0212
Precision: 0.9803
Recall: 1.0
F1: 0.9901
Accuracy: 0.9980
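
As a quick consistency check, the reported F1 follows from the reported precision and recall through the standard harmonic-mean definition. The sketch below uses only the numbers listed above; the helper name `f1_score` is illustrative and is not part of the training code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported for the evaluation set above.
reported_precision = 0.9803
reported_recall = 1.0
reported_f1 = 0.9901

computed_f1 = f1_score(reported_precision, reported_recall)
print(round(computed_f1, 4))  # 0.9901
```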

Model description

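The model provides named entity recognition for the following items.

Person name [PS] - low recognition rate
Address (old lot-number addresses and road-name addresses) [AD]
Card number [CN]
Bank account number [BN]
Driver's license number [DN]
Resident registration number [RN]
Passport number [PN]
Phone number [PH]
Email address [EM]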
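
When consuming the model's output it can help to keep the tags above in a lookup table. The following is a minimal sketch; the names `ENTITY_LABELS` and `describe` are hypothetical helpers introduced here, and the English descriptions are translations of the list above.

```python
# Tag -> description, taken from the list above.
ENTITY_LABELS = {
    "PS": "person name",
    "AD": "address",
    "CN": "card number",
    "BN": "bank account number",
    "DN": "driver's license number",
    "RN": "resident registration number",
    "PN": "passport number",
    "PH": "phone number",
    "EM": "email address",
}


def describe(entity_group: str) -> str:
    """Return a readable name for a tag, or the tag itself if it is unknown."""
    return ENTITY_LABELS.get(entity_group, entity_group)


print(describe("RN"))  # resident registration number
```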

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 64
eval_batch_size: 64
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 8
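
To make `lr_scheduler_type: linear` concrete, here is a minimal sketch of a linear decay from the listed learning rate down to zero. It assumes no warmup steps (none are listed) and takes the total of 120 optimizer steps from the final row of the training results table; `linear_lr` is an illustrative helper, not the Trainer's own code.

```python
BASE_LR = 5e-05     # learning_rate from the list above
TOTAL_STEPS = 120   # assumption: 8 epochs x 15 steps, per the training results
WARMUP_STEPS = 0    # assumption: no warmup is listed in this card


def linear_lr(step: int) -> float:
    """Learning rate at a given optimizer step under linear decay."""
    if step < WARMUP_STEPS:
        # Linear ramp-up; unused when WARMUP_STEPS is 0.
        return BASE_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


for step in (0, 60, 120):
    print(step, linear_lr(step))
```

Under these assumptions the rate is halved by the end of epoch 4 and reaches zero at step 120.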

Training results

Training Loss	Epoch	Step	Validation Loss	Precision	Recall	F1	Accuracy
No log	1.0	15	0.2866	0.1199	0.2739	0.1668	0.9287
No log	2.0	30	0.1369	0.6599	0.7996	0.7231	0.9654
No log	3.0	45	0.0629	0.8088	0.9042	0.8538	0.9915
No log	4.0	60	0.0381	0.9760	0.9978	0.9868	0.9969
No log	5.0	75	0.0276	0.9781	0.9955	0.9868	0.9981
No log	6.0	90	0.0238	0.9803	1.0	0.9901	0.9979
No log	7.0	105	0.0224	0.9803	1.0	0.9901	0.9979
No log	8.0	120	0.0212	0.9803	1.0	0.9901	0.9980

Framework versions

Transformers 4.40.2
Pytorch 2.3.0+cu118
Datasets 2.19.1
Tokenizers 0.19.1

Use

from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

tokenizer = AutoTokenizer.from_pretrained("vitus9988/klue-roberta-small-ner-identified")
model = AutoModelForTokenClassification.from_pretrained("vitus9988/klue-roberta-small-ner-identified")

nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
example = """
저는 서울특별시 강남대로 56길 100호에 삽니다. 전화번호는 010-1234-5678이고 주민등록번호는 123456-1234567입니다. 메일주소는 hugging@face.com입니다.
"""

ner_results = nlp(example)
for i in ner_results:
    print(i)

#{'entity_group': 'AD', 'score': 0.79996574, 'word': '서울특별시 강남대로 56길 100호', 'start': 4, 'end': 23}
#{'entity_group': 'PH', 'score': 0.948794, 'word': '010 - 1234 - 5678', 'start': 36, 'end': 49}
#{'entity_group': 'RN', 'score': 0.90686846, 'word': '123456 - 1234567', 'start': 60, 'end': 74}
#{'entity_group': 'EM', 'score': 0.935588, 'word': 'hugging @ face. com', 'start': 85, 'end': 101}
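
The `start` and `end` offsets in the output make it straightforward to redact the detected spans. The sketch below is one possible post-processing step, not something shipped with the model: `mask_entities` is a hypothetical helper, the entity list is copied from the sample output above rather than produced by running the model, and the email address in the text is reconstructed from the `word` field of that output.

```python
def mask_entities(text: str, entities: list[dict]) -> str:
    """Replace each detected span with its tag in square brackets."""
    # Work right to left so earlier offsets stay valid after each replacement.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        tag = f"[{ent['entity_group']}]"
        text = text[:ent["start"]] + tag + text[ent["end"]:]
    return text


# Same input as above, including the leading and trailing newline.
example = (
    "\n저는 서울특별시 강남대로 56길 100호에 삽니다. "
    "전화번호는 010-1234-5678이고 주민등록번호는 123456-1234567입니다. "
    "메일주소는 hugging@face.com입니다.\n"
)

# Offsets and tags copied from the sample output above.
entities = [
    {"entity_group": "AD", "start": 4, "end": 23},
    {"entity_group": "PH", "start": 36, "end": 49},
    {"entity_group": "RN", "start": 60, "end": 74},
    {"entity_group": "EM", "start": 85, "end": 101},
]

masked = mask_entities(example, entities)
print(masked)
```

Replacing from the last span backwards avoids having to recompute offsets after each substitution.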