	---
	language: ar
	widget:
	- text: "للوقايه من عدم انتشار [MASK]"
	---
	# arabert_c19: An AraBERT model pretrained on 1.5 million COVID-19 multi-dialect Arabic tweets
	ARABERT COVID-19 is a version of the AraBERT v2 model (https://huggingface.co/aubmindlab/bert-base-arabertv02) with additional pretraining on 1.5 million multi-dialect Arabic tweets about the COVID-19 pandemic, taken from the “Large Arabic Twitter Dataset on COVID-19” (https://arxiv.org/abs/2004.04315).
	The model is intended for tasks on multi-dialect Arabic tweets about the COVID-19 pandemic, where it can achieve better results than the base AraBERT model (see the classification results below).

	# Classification results for multiple fake-news detection tasks with and without arabert_c19
	For more details, refer to the paper (link).

	In both tables, arabert, mbert, and distilbert-multi are the baseline models; arabert Cov19 and mbert Cov19 are the pretrained Covid-19 models. Bold marks the best score in each row.

	**Without fine-tuning**

	| Task | arabert | mbert | distilbert-multi | arabert Cov19 | mbert Cov19 |
	|---|---|---|---|---|---|
	| Contains hate | 0.8346 | 0.6675 | 0.7145 | **0.8649** | 0.8492 |
	| Talk about a cure | 0.8193 | 0.7406 | 0.7127 | 0.9055 | **0.9176** |
	| Give advice | 0.8287 | 0.6865 | 0.6974 | **0.9035** | 0.8948 |
	| Rise moral | 0.8398 | 0.7075 | 0.7049 | **0.8903** | 0.8838 |
	| News or opinion | 0.8987 | 0.8332 | 0.8099 | **0.9163** | 0.9116 |
	| Dialect | 0.7533 | 0.558 | 0.5433 | **0.8230** | 0.7682 |
	| Blame and negative speech | 0.7426 | 0.597 | 0.6221 | **0.7997** | 0.7794 |
	| Factual | 0.9217 | 0.8427 | 0.8383 | 0.9575 | **0.9608** |
	| Worth fact-checking | 0.7731 | 0.5298 | 0.5413 | 0.8265 | **0.8383** |
	| Contains fake information | 0.6415 | 0.5428 | 0.4743 | **0.7739** | 0.7228 |

	**With fine-tuning**

	| Task | arabert | mbert | distilbert-multi | arabert Cov19 | mbert Cov19 |
	|---|---|---|---|---|---|
	| Contains hate | 0.9809 | 0.97 | 0.9736 | **0.9858** | 0.9809 |
	| Talk about a cure | 0.99 | 0.9854 | 0.9774 | **0.9930** | 0.9904 |
	| Give advice | 0.9793 | 0.9664 | 0.9764 | 0.9824 | **0.9862** |
	| Rise moral | 0.9618 | 0.9663 | 0.9618 | 0.97 | **0.9712** |
	| News or opinion | 0.9552 | 0.9409 | 0.9529 | **0.9627** | 0.9594 |
	| Dialect | 0.9266 | 0.9137 | 0.9102 | 0.9281 | **0.9317** |
	| Blame and negative speech | 0.9607 | 0.9476 | 0.9587 | **0.9653** | 0.9633 |
	| Factual | 0.9958 | 0.9917 | 0.9925 | 0.995 | **0.9967** |
	| Worth fact-checking | 0.9885 | 0.9824 | 0.9763 | **0.9907** | 0.9891 |
	| Contains fake information | 0.9417 | 0.9353 | 0.9288 | **0.9578** | 0.9491 |
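To make the comparison between the base arabert model and arabert Cov19 concrete, the per-task differences can be computed directly from the scores reported above. The snippet below is only an illustrative sketch: the dictionary and function names (`WITHOUT_FT`, `WITH_FT`, `gains`, `mean_gain`) are made up for this example, and the numbers are copied from the arabert and arabert Cov19 columns of the results.

```python
# Scores copied from the results above: task -> (arabert, arabert Cov19).
# The variable and function names are illustrative, not part of any library.
WITHOUT_FT = {
    "Contains hate": (0.8346, 0.8649),
    "Talk about a cure": (0.8193, 0.9055),
    "Give advice": (0.8287, 0.9035),
    "Rise moral": (0.8398, 0.8903),
    "News or opinion": (0.8987, 0.9163),
    "Dialect": (0.7533, 0.8230),
    "Blame and negative speech": (0.7426, 0.7997),
    "Factual": (0.9217, 0.9575),
    "Worth fact-checking": (0.7731, 0.8265),
    "Contains fake information": (0.6415, 0.7739),
}
WITH_FT = {
    "Contains hate": (0.9809, 0.9858),
    "Talk about a cure": (0.99, 0.9930),
    "Give advice": (0.9793, 0.9824),
    "Rise moral": (0.9618, 0.97),
    "News or opinion": (0.9552, 0.9627),
    "Dialect": (0.9266, 0.9281),
    "Blame and negative speech": (0.9607, 0.9653),
    "Factual": (0.9958, 0.995),
    "Worth fact-checking": (0.9885, 0.9907),
    "Contains fake information": (0.9417, 0.9578),
}


def gains(scores):
    """Return task -> (arabert Cov19 score minus arabert score), rounded to 4 places."""
    return {task: round(cov19 - base, 4) for task, (base, cov19) in scores.items()}


def mean_gain(scores):
    """Average the per-task gains over all tasks."""
    per_task = gains(scores)
    return sum(per_task.values()) / len(per_task)


for label, scores in (("without fine-tuning", WITHOUT_FT), ("with fine-tuning", WITH_FT)):
    print(f"Mean gain {label}: {mean_gain(scores):.4f}")
```

On these numbers, arabert Cov19 gains about 0.061 on average over arabert without fine-tuning and about 0.005 with fine-tuning. The largest gain without fine-tuning is on "Contains fake information", and the only task where arabert scores slightly higher is "Factual" with fine-tuning.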



	# Preprocessing

	```python
	# Requires the arabert package (pip install arabert)
	from arabert.preprocess import ArabertPreprocessor
	model_name = "moha/arabert_c19"
	arabert_prep = ArabertPreprocessor(model_name=model_name)
	text = "للوقايه من عدم انتشار كورونا عليك اولا غسل اليدين بالماء والصابون وتكون عملية الغسل دقيقه تشمل راحة اليد الأصابع التركيز على الإبهام"
	# Apply the AraBERT preprocessing and keep the result for tokenization
	clean_text = arabert_prep.preprocess(text)
	print(clean_text)
	```
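The model card's widget uses a `[MASK]` prompt, which suggests the checkpoint can be queried for masked-token prediction. The following is a minimal sketch of that, assuming the checkpoint loads with the Hugging Face `transformers` fill-mask pipeline; the helper names `summarize_predictions` and `fill_mask` are hypothetical and only serve this example.

```python
def summarize_predictions(predictions):
    """Reduce fill-mask pipeline output to (token, score) pairs, best first.

    `predictions` is a list of dicts with "token_str" and "score" keys,
    which is what the fill-mask pipeline returns for a single-mask input.
    """
    pairs = [(p["token_str"], round(float(p["score"]), 4)) for p in predictions]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def fill_mask(text, model_name="moha/arabert_c19", top_k=5):
    """Return the top_k candidate tokens for the single [MASK] in `text`.

    Assumption: the checkpoint includes a masked-language-modeling head.
    """
    # Imported lazily so summarize_predictions works without transformers installed.
    from transformers import pipeline

    unmasker = pipeline("fill-mask", model=model_name)
    return summarize_predictions(unmasker(text, top_k=top_k))


# Example call (downloads the model on first use):
# print(fill_mask("للوقايه من عدم انتشار [MASK]"))
```

The example prompt is the one from the widget metadata. Whether the input should first go through `ArabertPreprocessor` is left open here; the sketch passes the text as written.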


	# Contacts
	Hadj Ameur: [GitHub](https://github.com/MohamedHadjAmeur)