Update README.md
README.md
CHANGED
@@ -4,7 +4,7 @@ widget:
 - text: "للوقايه من انتشار [MASK]"
 ---
 # arabert_c19: An Arabert model pretrained on 1.5 million COVID-19 multi-dialect Arabic tweets
-**mBERT COVID-19** is a pretrained (fine-tuned) version of the mBERT model (https://huggingface.co/bert-base-multilingual-cased). The pretraining was done using 1.5 million multi-dialect Arabic tweets regarding the COVID-19 pandemic from the “Large Arabic Twitter Dataset on COVID-19” (https://arxiv.org/abs/2004.04315).
+**mBERT COVID-19** [Arxiv URL](https://arxiv.org/pdf/2105.03143.pdf) is a pretrained (fine-tuned) version of the mBERT model (https://huggingface.co/bert-base-multilingual-cased). The pretraining was done using 1.5 million multi-dialect Arabic tweets regarding the COVID-19 pandemic from the “Large Arabic Twitter Dataset on COVID-19” (https://arxiv.org/abs/2004.04315).
 The model can achieve better results for the tasks that deal with multi-dialect Arabic tweets in relation to the COVID-19 pandemic.
 
 # Classification results for multiple tasks including fake-news and hate speech detection when using arabert_c19 and mbert_ar_c19:
@@ -25,5 +25,21 @@ arabert_prep = ArabertPreprocessor(model_name=model_name)
 text = "للوقايه من عدم انتشار كورونا عليك اولا غسل اليدين بالماء والصابون وتكون عملية الغسل دقيقه تشمل راحة اليد الأصابع التركيز على الإبهام"
 arabert_prep.preprocess(text)
 ```
+
+# Citation
+
+Please cite as:
+
+``` bibtex
+@misc{ameur2021aracovid19mfh,
+title={AraCOVID19-MFH: Arabic COVID-19 Multi-label Fake News and Hate Speech Detection Dataset},
+author={Mohamed Seghir Hadj Ameur and Hassina Aliane},
+year={2021},
+eprint={2105.03143},
+archivePrefix={arXiv},
+primaryClass={cs.CL}
+}
+```
+
 # Contacts
 **Hadj Ameur**: [Github](https://github.com/MohamedHadjAmeur) | <[email protected]> | <[email protected]>