CKIP BERT Base Han Chinese POS

This model performs part-of-speech (POS) tagging on ancient Chinese text. Its training data spans four eras of the Chinese language.

Homepage

Training Datasets

The copyright of the datasets belongs to the Institute of Linguistics, Academia Sinica.

Contributors

  • Chin-Tung Lin at CKIP

Usage

  • Using our model in your script

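    from transformers import (
      AutoTokenizer,
      AutoModelForTokenClassification,
    )
    
    # Load the tokenizer and the model together with its POS classification head.
    tokenizer = AutoTokenizer.from_pretrained("ckiplab/bert-base-han-chinese-pos")
    model = AutoModelForTokenClassification.from_pretrained("ckiplab/bert-base-han-chinese-pos")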
    
  • Using our model for inference

    >>> from transformers import pipeline
    >>> classifier = pipeline("token-classification", model="ckiplab/bert-base-han-chinese-pos")
    >>> classifier("帝堯曰放勳")
    
    [{'entity': 'NB1',
      'score': 0.99410427,
      'index': 1,
      'word': '帝',
      'start': 0,
      'end': 1},
     {'entity': 'NB1',
      'score': 0.98874336,
      'index': 2,
      'word': '堯',
      'start': 1,
      'end': 2},
     {'entity': 'VG',
      'score': 0.97059363,
      'index': 3,
      'word': '曰',
      'start': 2,
      'end': 3},
     {'entity': 'NB1',
      'score': 0.9864504,
      'index': 4,
      'word': '放',
      'start': 3,
      'end': 4},
     {'entity': 'NB1',
      'score': 0.9543974,
      'index': 5,
      'word': '勳',
      'start': 4,
      'end': 5}]
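
The pipeline returns one dictionary per character, which can be verbose for downstream use. The following is a minimal sketch of reducing that output to character/tag pairs. The helper names `to_tagged_pairs` and `format_tagged` are hypothetical and not part of the model or of `transformers`; the sample list is copied from the output shown above so the snippet runs without downloading the model.

```python
from typing import Dict, List, Tuple


def to_tagged_pairs(predictions: List[Dict]) -> List[Tuple[str, str]]:
    """Reduce pipeline output to (character, POS tag) pairs in sentence order."""
    # Sort by character offset so the pairs follow the input text.
    ordered = sorted(predictions, key=lambda p: p["start"])
    return [(p["word"], p["entity"]) for p in ordered]


def format_tagged(predictions: List[Dict], sep: str = " ") -> str:
    """Render the predictions as 'character/TAG' tokens joined by `sep`."""
    return sep.join(f"{word}/{tag}" for word, tag in to_tagged_pairs(predictions))


# Sample predictions copied from the pipeline output for "帝堯曰放勳".
sample = [
    {"entity": "NB1", "score": 0.99410427, "index": 1, "word": "帝", "start": 0, "end": 1},
    {"entity": "NB1", "score": 0.98874336, "index": 2, "word": "堯", "start": 1, "end": 2},
    {"entity": "VG", "score": 0.97059363, "index": 3, "word": "曰", "start": 2, "end": 3},
    {"entity": "NB1", "score": 0.9864504, "index": 4, "word": "放", "start": 3, "end": 4},
    {"entity": "NB1", "score": 0.9543974, "index": 5, "word": "勳", "start": 4, "end": 5},
]

print(format_tagged(sample))  # 帝/NB1 堯/NB1 曰/VG 放/NB1 勳/NB1
```

In practice, `sample` would be replaced by the return value of `classifier(...)` from the example above.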
    