monsoon-nlp committed
Commit d5bd443 · 1 parent: 0c4cd12
basic example

README.md CHANGED
@@ -14,6 +14,27 @@ Intended Examples:
 
 People's names, gender pronouns, gendered words (father, mother), and many other values are currently unchanged by this model. Future versions may be trained on more data.
 
+## Sample Code
+
+```
+import torch
+from transformers import AutoTokenizer, EncoderDecoderModel
+
+model = EncoderDecoderModel.from_encoder_decoder_pretrained(
+    "monsoon-nlp/ar-seq2seq-gender-encoder",
+    "monsoon-nlp/ar-seq2seq-gender-decoder",
+    min_length=40
+)
+tokenizer = AutoTokenizer.from_pretrained('monsoon-nlp/ar-seq2seq-gender-decoder') # same as MARBERT original
+
+input_ids = torch.tensor(tokenizer.encode("أنا سعيدة")).unsqueeze(0)
+generated = model.generate(input_ids, decoder_start_token_id=model.config.decoder.pad_token_id)
+tokenizer.decode(generated.tolist()[0][1 : len(input_ids[0]) - 1])
+> 'انا سعيد'
+```
+
+https://colab.research.google.com/drive/1S0kE_2WiV82JkqKik_sBW-0TUtzUVmrV?usp=sharing
+
 ## Training
 
 I originally developed
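
The slice in the added `tokenizer.decode(...)` call is easy to misread, so here is a minimal sketch of it in isolation. It assumes a BERT-style tokenizer that wraps the input in two special tokens, and that the useful output has as many tokens as the input; the commit states neither. The helper name `trim_generated` and all token ids are hypothetical, chosen only for illustration.

```python
from typing import List


def trim_generated(generated_ids: List[int], input_len: int) -> List[int]:
    """Apply the same slice as the README sample: [1 : input_len - 1].

    Index 0 of the generated sequence is the decoder start token, and
    input_len counts the input including its two special tokens, so the
    slice keeps (input_len - 2) generated tokens.
    """
    return generated_ids[1 : input_len - 1]


# Hypothetical ids, not real vocabulary entries.
PAD, CLS, SEP = 0, 2, 3
input_ids = [CLS, 101, 102, SEP]            # two content tokens plus specials
generated = [PAD, 201, 202, SEP, PAD, PAD]  # start token, output, trailing filler

kept = trim_generated(generated, len(input_ids))
print(kept)
```

The trailing tokens are dropped by position rather than by token type, so an output that is longer or shorter than the input would be cut off or padded with unrelated tokens under this slice.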