# MorrBERT
MorrBERT is a Transformer-based language model designed specifically for the Moroccan dialect, developed by Moussaoui Otman and El Younoussi Yacine.
## About MorrBERT
MorrBERT is tailored to the Moroccan dialect and uses the same architecture as BERT-Base. It was trained on a corpus of six million Moroccan-dialect sentences, totaling 71 billion tokens; training on the full set took approximately 120 hours for 12 epochs.
For more information, please visit our paper: https://mendel-journal.org/index.php/mendel/article/view/223/200
## Usage
The model weights can be loaded with the `transformers` library by Hugging Face:
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("otmangi/MorrBERT")
model = AutoModel.from_pretrained("otmangi/MorrBERT")
```
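Once loaded, the encoder's `last_hidden_state` can be turned into a fixed-size sentence embedding by mean-pooling over non-padding tokens, which is a common way to use a BERT-style encoder. The sketch below shows that pooling step on dummy NumPy arrays (the shapes and values are illustrative only, not outputs of MorrBERT):

```python
import numpy as np

def mean_pool(last_hidden_state, attention_mask):
    """Average token vectors, ignoring padding positions.

    last_hidden_state: (batch, seq_len, hidden)
    attention_mask:    (batch, seq_len), 1 for real tokens, 0 for padding
    """
    mask = attention_mask[..., None].astype(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(axis=1)
    counts = mask.sum(axis=1).clip(min=1e-9)  # avoid division by zero
    return summed / counts

# Dummy batch: 2 sentences, 4 token positions, hidden size 8
hidden = np.ones((2, 4, 8))
mask = np.array([[1, 1, 1, 0],
                 [1, 1, 0, 0]])
embeddings = mean_pool(hidden, mask)
print(embeddings.shape)  # (2, 8)
```

With real model outputs, `last_hidden_state` would come from `model(**tokenizer(sentences, return_tensors="pt", padding=True)).last_hidden_state` converted to NumPy, and `attention_mask` from the tokenizer output.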
## Acknowledgments
This research was supported by the computational resources of HPC-MARWAN (www.marwan.ma/hpc), provided by the National Center for Scientific and Technical Research (CNRST), Rabat, Morocco.
## How to cite
```bibtex
@article{Moussaoui_El Younnoussi_2023,
  title   = {Pre-training Two BERT-Like Models for Moroccan Dialect: MorRoBERTa and MorrBERT},
  volume  = {29},
  url     = {https://mendel-journal.org/index.php/mendel/article/view/223},
  DOI     = {10.13164/mendel.2023.1.055},
  number  = {1},
  journal = {MENDEL},
  author  = {Moussaoui, Otman and El Younnoussi, Yacine},
  year    = {2023},
  month   = {Jun.},
  pages   = {55-61}
}
```
## Contact
For any inquiries, feedback, or requests, please feel free to reach out to: