Swahili Named Entity Recognition

  • TUS-NER-sw is a fine-tuned BERT model that is ready to use for Named Entity Recognition in Swahili and achieves state-of-the-art performance 😀
  • Finetuned from model: eolang/SW-v1

Intended uses & limitations

How to use

You can use this model with the Transformers pipeline for NER.

from transformers import pipeline, AutoTokenizer, AutoModelForTokenClassification

# Load the fine-tuned tokenizer and token-classification model
tokenizer = AutoTokenizer.from_pretrained("eolang/SW-NER-v1")
model = AutoModelForTokenClassification.from_pretrained("eolang/SW-NER-v1")

# Build an NER pipeline and run it on a Swahili sentence
nlp = pipeline("ner", model=model, tokenizer=tokenizer)
# "We have made important changes to our privacy and cookie policies"
example = "Tumefanya mabadiliko muhimu katika sera zetu za faragha na vidakuzi"

ner_results = nlp(example)
print(ner_results)
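
By default, the pipeline returns one prediction per sub-word token. If you prefer entities grouped into whole words, you can pass an aggregation strategy. The snippet below is an optional sketch that continues from the code above; the aggregation_strategy parameter is standard Transformers pipeline usage and is not something this model card prescribes.

# Optional: merge sub-word tokens into whole-word entity spans
nlp_grouped = pipeline(
    "ner",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple",
)
print(nlp_grouped(example))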

Training data

This model was fine-tuned on the Swahili portion of the dataset from the MasakhaNER project. MasakhaNER is a collection of Named Entity Recognition (NER) datasets for 10 African languages: Amharic, Hausa, Igbo, Kinyarwanda, Luganda, Luo, Nigerian-Pidgin, Swahili, Wolof, and Yorùbá.
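
If you want to inspect the training data yourself, the Swahili split can be loaded with the 🤗 Datasets library. The sketch below assumes the Hub identifier masakhaner with the swa configuration; the exact copy used for fine-tuning may differ.

from datasets import load_dataset

# Swahili ("swa") configuration of MasakhaNER (assumed Hub identifier)
masakhaner_sw = load_dataset("masakhaner", "swa")
print(masakhaner_sw["train"][0]["tokens"])
print(masakhaner_sw["train"][0]["ner_tags"])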

Training procedure

This model was trained on a single NVIDIA RTX 3090 GPU with the hyperparameters recommended in the original BERT paper.
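
For reference, a minimal fine-tuning sketch along these lines is shown below. It is not the author's exact training script: the dataset identifier, the label-alignment helper, and the concrete hyperparameter values (picked from the ranges the BERT paper suggests: learning rate in {5e-5, 3e-5, 2e-5}, batch size in {16, 32}, 2-4 epochs) are illustrative assumptions.

from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForTokenClassification,
    DataCollatorForTokenClassification,
    TrainingArguments,
    Trainer,
)

# Assumed dataset identifier and config; see "Training data" above
dataset = load_dataset("masakhaner", "swa")
label_list = dataset["train"].features["ner_tags"].feature.names

# Start from the base Swahili model this card says it was fine-tuned from
tokenizer = AutoTokenizer.from_pretrained("eolang/SW-v1")  # fast tokenizer needed for word_ids()
model = AutoModelForTokenClassification.from_pretrained(
    "eolang/SW-v1", num_labels=len(label_list)
)

def tokenize_and_align(batch):
    # Tokenize pre-split words and align NER tags to the resulting sub-words
    tokenized = tokenizer(batch["tokens"], truncation=True, is_split_into_words=True)
    all_labels = []
    for i, tags in enumerate(batch["ner_tags"]):
        word_ids = tokenized.word_ids(batch_index=i)
        previous = None
        label_ids = []
        for word_id in word_ids:
            if word_id is None:
                label_ids.append(-100)          # special tokens are ignored by the loss
            elif word_id != previous:
                label_ids.append(tags[word_id])  # label only the first sub-word of each word
            else:
                label_ids.append(-100)
            previous = word_id
        all_labels.append(label_ids)
    tokenized["labels"] = all_labels
    return tokenized

tokenized_dataset = dataset.map(tokenize_and_align, batched=True)

args = TrainingArguments(
    output_dir="sw-ner",
    learning_rate=3e-5,              # BERT paper range: {5e-5, 3e-5, 2e-5}
    per_device_train_batch_size=32,  # BERT paper range: {16, 32}
    num_train_epochs=3,              # BERT paper range: 2-4
    weight_decay=0.01,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    data_collator=DataCollatorForTokenClassification(tokenizer),
    tokenizer=tokenizer,
)
trainer.train()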
