---
language: "ar"
tags:
- text-generation
datasets:
- APCD
widget:
 - text: "."
 - text: "عيد بأية حال"
 - text: "يا قدس"
 - text: "ألا ليت"
---

# GPT2-Arabic-Poetry-2023

## Model description

A text-generation model for Arabic poetry, fine-tuned on the APCD Arabic poetry dataset starting from [aragpt2-medium](https://huggingface.co/aubmindlab/aragpt2-medium).

## Intended uses & limitations

#### How to use

Try this [HF Space](https://huggingface.co/spaces/akhooli/poetry), or run it from a script:
```python
from transformers import pipeline

# Load the fine-tuned model and its tokenizer (device=-1 runs on CPU; use device=0 for a GPU)
pipe = pipeline('text-generation', framework='pt', device=-1,
                model='akhooli/ap2023', tokenizer='akhooli/ap2023')

prompt = "عيد بأية حال"  # example prompt (one of the widget examples above)
gen = pipe(prompt, max_length=96, temperature=0.95, repetition_penalty=1.05,
           num_beams=3, num_return_sequences=2, do_sample=True,
           top_p=1.0, top_k=50, return_full_text=True)[0]["generated_text"]

# Keep only the text up to the last '.', dropping any trailing partial segment
poetry = ""
for line in gen.split('.')[:-1]:
    poetry += line
print(poetry)
```

#### Limitations and bias

Both GPT2-small-arabic (trained on Arabic Wikipedia) and this model have several limitations in terms of coverage and training performance.
Use them as demonstrations or proofs of concept, not in production.

## Training data

The model was fine-tuned on poems from several eras, around 1.4M lines of verse in total (1.25M of which were used for training), starting from the [aragpt2-medium](https://huggingface.co/aubmindlab/aragpt2-medium) transformer model.

## Training procedure

Training was done with the Hugging Face `Trainer` API on a free Kaggle GPU.
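
The training script is not part of this card, but a run of that kind could look roughly like the following minimal sketch. The file name `poems.txt`, the hyperparameters, and the tokenization length are assumptions for illustration, not the author's actual settings.

```python
# Minimal sketch of fine-tuning aragpt2-medium with the HF Trainer.
# All file names and hyperparameters below are assumptions, not the
# values actually used for akhooli/ap2023.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("aubmindlab/aragpt2-medium")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 tokenizers define no pad token
model = AutoModelForCausalLM.from_pretrained("aubmindlab/aragpt2-medium")

# "poems.txt": placeholder for the APCD-derived lines of verse
dataset = load_dataset("text", data_files={"train": "poems.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

args = TrainingArguments(
    output_dir="ap2023",
    per_device_train_batch_size=8,  # assumed; sized for a free Kaggle GPU
    num_train_epochs=1,             # assumed
    fp16=True,
    logging_steps=500,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```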

## Eval results 
Final perplexity reached 52 (eval_loss = 3.9513, eval_accuracy = 0.3704).
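
As a sanity check, the reported perplexity is simply the exponential of the evaluation cross-entropy loss:

```python
import math

eval_loss = 3.9513
print(round(math.exp(eval_loss), 1))  # ~52.0, matching the reported perplexity
```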

### BibTeX entry and citation info

```bibtex
@misc{khooli2023ap2023,
  author = {Abed Khooli},
  title  = {GPT2-Arabic-Poetry-2023},
  year   = {2023}
}
```