language:
- ar
- fr
tags:
- ner
- token-classification
- Arabic-NER
metrics:
- accuracy
- f1
- precision
- recall
widget:
- text: النجم محمد صلاح لاعب المنتخب المصري يعيش في مصر بالتحديد من نجريج, الشرقية
example_title: Mohamed Salah
- text: انا ساكن في حدايق الزتون و بدرس في جامعه عين شمس
example_title: Egyptian Dialect
- text: يقع نهر الأمازون في قارة أمريكا الجنوبية
example_title: Standard Arabic
datasets:
- Fine-grained-Arabic-Named-Entity-Corpora
pipeline_tag: token-classification
Arabic Named Entity Recognition
This project aims to enrich Arabic Named Entity Recognition (ANER). Arabic is a difficult language to process and poses many challenges. We built a model based on AraBERT that supports 50 entity types.
Paper
Here's the paper that contains all the details for our model, our approach, and the training results
Dataset
Evaluation results
The model achieves the following results:
Dataset | Recall | Precision | F1 |
---|---|---|---|
WikiFANE Gold | 87.0 | 90.5 | 88.7 |
NewsFANE Gold | 78.1 | 77.4 | 77.7 |
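As a quick sanity check on the table above, the following sketch recomputes F1 from the reported precision and recall. It assumes the reported F1 is the harmonic mean of the reported precision and recall; the model card does not state the averaging scheme, so treat this as an illustration rather than the authors' evaluation code. The function name `f1_score` is our own.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same scale as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the table above, as (recall, precision, reported F1).
reported = {
    "WikiFANE Gold": (87.0, 90.5, 88.7),
    "NewsFANE Gold": (78.1, 77.4, 77.7),
}

for name, (recall, precision, f1) in reported.items():
    computed = f1_score(precision, recall)
    print(f"{name}: computed F1 = {computed:.2f}, reported F1 = {f1}")
```

Under that assumption, the computed values agree with the reported ones to within rounding.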
Usage
The model is available on the Hugging Face model hub under the name boda/ANER. At this time, checkpoints are available only in PyTorch.
Use in Python:
from transformers import AutoTokenizer, AutoModelForTokenClassification
tokenizer = AutoTokenizer.from_pretrained("boda/ANER")
model = AutoModelForTokenClassification.from_pretrained("boda/ANER")
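To go from the loaded checkpoint to actual predictions, one option is the `transformers` token-classification pipeline. The sketch below is a minimal example under stated assumptions, not the authors' inference code: the helper names `filter_entities` and `run_example` and the `min_score` threshold are hypothetical, and the label names returned depend on the label set stored in the model's configuration.

```python
from typing import Dict, List, Tuple


def filter_entities(results: List[Dict], min_score: float = 0.5) -> List[Tuple[str, str]]:
    """Keep (text, label) pairs whose confidence is at least min_score.

    `results` is expected in the format the pipeline returns when an
    aggregation strategy is set: dicts with "word", "entity_group", "score".
    """
    return [(r["word"], r["entity_group"]) for r in results if r["score"] >= min_score]


def run_example(text: str) -> List[Tuple[str, str]]:
    """Run the boda/ANER checkpoint on `text` and return filtered entities."""
    # Imported lazily so filter_entities can be used without loading the model.
    from transformers import pipeline

    ner = pipeline(
        "token-classification",
        model="boda/ANER",
        aggregation_strategy="simple",  # merge word pieces into entity spans
    )
    return filter_entities(ner(text))
```

Calling `run_example("يقع نهر الأمازون في قارة أمريكا الجنوبية")` downloads the checkpoint on first use and returns a list of `(text, label)` pairs. The 0.5 threshold is an arbitrary illustration and may need tuning.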
Acknowledgments
Thanks to AraBERT for providing the Arabic BERT model, which we used as the base model for our work.
We would also like to thank Prof. Fahd Saleh S Alotaibi of the Faculty of Computing and Information Technology, King Abdulaziz University, for providing the dataset we used to train our model.
Contacts
Abdelrahman Atef