Update README.md
# AraT5-base

# AraT5: Text-to-Text Transformers for Arabic Language Generation

<img src="AraT5_CR_new.png" alt="AraT5" width="55%" height="45%" align="right"/>

This is the repository accompanying our paper [AraT5: Text-to-Text Transformers for Arabic Language Understanding and Generation](https://arxiv.org/abs/2109.12068). In this repository we:
* Introduce **AraT5<sub>MSA</sub>**, **AraT5<sub>Tweet</sub>**, and **AraT5**: three powerful Arabic-specific text-to-text Transformer-based models;
* Introduce **ARGEN**: a new benchmark for Arabic language generation evaluation covering seven Arabic NLP tasks, namely ```machine translation```, ```summarization```, ```news title generation```, ```question generation```, ```paraphrasing```, ```transliteration```, and ```code-switched translation```;
* Evaluate ```AraT5``` models on ```ARGEN``` and compare them against available language models.
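All of these tasks share the text-to-text format used by T5-style models: every example is a plain (input string, target string) pair, usually with a task prefix prepended to the input. Below is a minimal sketch of that framing; the prefixes and helper are illustrative assumptions, not the exact prompts used in the paper.

```python
# Sketch of T5-style text-to-text framing for generation tasks.
# NOTE: the task prefixes here are illustrative assumptions, not the
# official AraT5 prompts.

TASK_PREFIXES = {
    "news_title_generation": "generate title: ",
    "summarization": "summarize: ",
    "machine_translation": "translate English to Arabic: ",
}

def to_text2text(task: str, source: str, target: str) -> tuple:
    """Turn a raw (source, target) pair into a prefixed text-to-text example."""
    return TASK_PREFIXES[task] + source, target

inp, out = to_text2text("summarization", "long article body ...", "short summary")
print(inp)  # -> "summarize: long article body ..."
```

Because every task reduces to string-to-string mapping, the same encoder-decoder model and training loop can be reused across all seven ARGEN tasks; only the data formatting changes.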
Our models establish new state-of-the-art (SOTA) results on several publicly available datasets.
Our language models are publicly available for research (see below).

The rest of this repository provides more information about our new language models, benchmark, and experiments.

---
# How to use AraT5 models

Below is an example of fine-tuning **AraT5-base** for News Title Generation on the Aranews dataset.

```bash