Commit · 89e59b0
1 Parent(s): a2387e4
added citation
README.md CHANGED
@@ -65,7 +65,7 @@ Follow the guide linked [here](https://towardsdatascience.com/fine-tuning-gpt2-o
 
 ## Finetuning using our code with TF 1.15.4:
 
-
+Create the Training TFRecords:
 ```bash
 python create_pretraining_data.py
 --input_file=<RAW TEXT FILE with documents/article sperated by an empty line>
@@ -73,7 +73,7 @@ python create_pretraining_data.py
 --tokenizer_dir=<Directory with the GPT2 Tokenizer files>
 ```
 
-
+Finetuning:
 ```bash
 python3 run_pretraining.py \
 --input_file="gs://<GS_BUCKET>/pretraining_data/*" \
@@ -137,13 +137,18 @@ For the new dataset we added the unshuffled OSCAR corpus, after we thoroughly fi
 # If you used this model please cite us as :
 
 ```
-@
-
-
-
-
-
-
+@inproceedings{antoun-etal-2021-aragpt2,
+title = "{A}ra{GPT}2: Pre-Trained Transformer for {A}rabic Language Generation",
+author = "Antoun, Wissam and
+Baly, Fady and
+Hajj, Hazem",
+booktitle = "Proceedings of the Sixth Arabic Natural Language Processing Workshop",
+month = apr,
+year = "2021",
+address = "Kyiv, Ukraine (Virtual)",
+publisher = "Association for Computational Linguistics",
+url = "https://www.aclweb.org/anthology/2021.wanlp-1.21",
+pages = "196--207",
 }
 ```
 
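The `--input_file` placeholder in the diff above describes a raw text file whose documents are separated by an empty line. As a minimal sketch of how such a file could be prepared, the following uses two hypothetical helpers, `write_raw_text` and `read_raw_text`, which are not part of the repository. It assumes UTF-8 text and that a single blank line is the only separator; how `create_pretraining_data.py` actually parses the file is not shown in this commit.

```python
from pathlib import Path
from typing import Iterable, List


def write_raw_text(documents: Iterable[str], path: Path) -> None:
    """Write documents to one UTF-8 file, separated by a single empty line.

    Hypothetical helper: the separator convention is taken from the README
    placeholder, not from the preprocessing script itself.
    """
    cleaned = []
    for doc in documents:
        # Drop blank lines inside a document so they are not read as separators.
        lines = [line.strip() for line in doc.splitlines() if line.strip()]
        if lines:
            cleaned.append("\n".join(lines))
    path.write_text("\n\n".join(cleaned) + "\n", encoding="utf-8")


def read_raw_text(path: Path) -> List[str]:
    """Split the file back into documents on empty lines (sanity check only)."""
    text = path.read_text(encoding="utf-8")
    return [block.strip() for block in text.split("\n\n") if block.strip()]
```

Reading the file back with `read_raw_text` before running the preprocessing step is a cheap way to confirm that the document count matches what was written.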