CKIP BERT Base Han Chinese POS

This model provides part-of-speech (POS) tagging for ancient Chinese. The training dataset covers four eras of the Chinese language.

Homepage

ckiplab/han-transformers

Training Datasets

The copyright of the datasets belongs to the Institute of Linguistics, Academia Sinica.

Contributors

Chin-Tung Lin at CKIP

Usage

Using our model in your script

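from transformers import (
  AutoTokenizer,
  AutoModelForTokenClassification,
)

# Load the tokenizer and the model together with its POS tagging head.
# Plain AutoModel would load only the encoder, without the tag classifier.
tokenizer = AutoTokenizer.from_pretrained("ckiplab/bert-base-han-chinese-pos")
model = AutoModelForTokenClassification.from_pretrained("ckiplab/bert-base-han-chinese-pos")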
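
Once the model is loaded with a token-classification head, it yields one row of scores per token, and each row has to be mapped to a tag name. The following is a minimal sketch of that last step only. The helper name decode_tags and the toy values are illustrative assumptions, not part of this repository, and the code deliberately avoids loading the model so that it runs on its own.

```python
from typing import Dict, List, Sequence


def decode_tags(scores: Sequence[Sequence[float]], id2label: Dict[int, str]) -> List[str]:
    """Pick the highest-scoring label for each token position."""
    tags = []
    for row in scores:
        # Index of the largest score in this row.
        best = max(range(len(row)), key=row.__getitem__)
        tags.append(id2label[best])
    return tags


# Toy values for illustration only; these are not real model output.
toy_id2label = {0: "NB1", 1: "VG"}
toy_scores = [[2.1, -0.3], [0.2, 1.7]]
print(decode_tags(toy_scores, toy_id2label))  # ['NB1', 'VG']
```

With the real model, assuming it was loaded through AutoModelForTokenClassification, the scores would typically come from model(**tokenizer(text, return_tensors="pt")).logits[0].tolist() and the mapping from model.config.id2label. The rows for the special tokens that the tokenizer adds around the input would need to be dropped.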

Using our model for inference

>>> from transformers import pipeline
>>> classifier = pipeline("token-classification", model="ckiplab/bert-base-han-chinese-pos")
>>> classifier("帝堯曰放勳")

[{'entity': 'NB1',
  'score': 0.99410427,
  'index': 1,
  'word': '帝',
  'start': 0,
  'end': 1},
 {'entity': 'NB1',
  'score': 0.98874336,
  'index': 2,
  'word': '堯',
  'start': 1,
  'end': 2},
 {'entity': 'VG',
  'score': 0.97059363,
  'index': 3,
  'word': '曰',
  'start': 2,
  'end': 3},
 {'entity': 'NB1',
  'score': 0.9864504,
  'index': 4,
  'word': '放',
  'start': 3,
  'end': 4},
 {'entity': 'NB1',
  'score': 0.9543974,
  'index': 5,
  'word': '勳',
  'start': 4,
  'end': 5}]
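
The pipeline returns one dictionary per character, which is verbose for downstream use. As a hedged sketch, the helper below condenses that output into (character, tag) pairs and optionally drops low-confidence predictions. The name to_tagged_pairs and the min_score parameter are hypothetical, introduced here for illustration; the sample data is copied from the first three entries of the output above.

```python
from typing import Dict, List, Tuple


def to_tagged_pairs(predictions: List[Dict], min_score: float = 0.0) -> List[Tuple[str, str]]:
    """Condense pipeline output into (character, tag) pairs in text order."""
    ordered = sorted(predictions, key=lambda p: p["start"])
    return [(p["word"], p["entity"]) for p in ordered if p["score"] >= min_score]


# First three entries of the pipeline output shown above.
sample = [
    {"entity": "NB1", "score": 0.99410427, "index": 1, "word": "帝", "start": 0, "end": 1},
    {"entity": "NB1", "score": 0.98874336, "index": 2, "word": "堯", "start": 1, "end": 2},
    {"entity": "VG", "score": 0.97059363, "index": 3, "word": "曰", "start": 2, "end": 3},
]

print(to_tagged_pairs(sample))
# [('帝', 'NB1'), ('堯', 'NB1'), ('曰', 'VG')]
print(to_tagged_pairs(sample, min_score=0.98))
# [('帝', 'NB1'), ('堯', 'NB1')]
```

In practice the list returned by classifier(...) can be passed to the helper directly in place of sample.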