Update README.md
README.md CHANGED

@@ -1,5 +1,6 @@
 ---
 license: mit
+language: es
 tags:
 - generated_from_trainer
 model-index:
@@ -38,13 +39,42 @@ poema:
   texto: Todos fueron a verle pasar
 ```
 
-
-
+### How to use
+
+You can use this model directly for text-to-text generation:
+
+```python
+from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+model_name = 'hackathon-pln-es/poem-gen-spanish-t5-small'
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
+
+author, sentiment, word, start_text = 'Pablo Neruda', 'positivo', 'cielo', 'Todos fueron a la plaza'
+input_text = f"""poema: estilo: {author} && sentimiento: {sentiment} && palabras: {word} && texto: {start_text} """
+inputs = tokenizer(input_text, return_tensors="pt")
+
+outputs = model.generate(inputs["input_ids"],
+                         do_sample=True,
+                         max_length=30,
+                         repetition_penalty=20.0,
+                         top_k=50,
+                         top_p=0.92)
+detok_outputs = [tokenizer.decode(x, skip_special_tokens=True) for x in outputs]
+res = detok_outputs[0]
+```
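The block above calls `model.generate` directly; the same prompt format can also be run through the Transformers `pipeline` API. A minimal sketch, assuming the standard `text2text-generation` task and the same sampling settings:

```python
from transformers import pipeline

# Text2text pipeline over the same checkpoint; generation kwargs are
# forwarded to model.generate().
generator = pipeline("text2text-generation",
                     model="hackathon-pln-es/poem-gen-spanish-t5-small")

prompt = ("poema: estilo: Pablo Neruda && sentimiento: positivo && "
          "palabras: cielo && texto: Todos fueron a la plaza ")

out = generator(prompt,
                do_sample=True,
                max_length=30,
                repetition_penalty=20.0,
                top_k=50,
                top_p=0.92)
print(out[0]["generated_text"])
```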
 
 ## Training and evaluation data
 
-
+The original dataset has the columns `author`, `content` and `title`.
+For each poem we generate new examples:
+- content: *line_i*, generated: *line_i+1*
+- content: *concatenate(line_i, line_i+1)*, generated: *line_i+2*
+- content: *concatenate(line_i, line_i+1, line_i+2)*, generated: *line_i+3*
+
+The resulting dataset has the columns `author`, `content`, `title` and `generated`.
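This sliding-window construction can be sketched in a few lines; the sketch is illustrative only, assuming each poem's `content` splits into verse lines and that consecutive lines are joined with a newline (the separator is not specified in the card):

```python
# Illustrative sketch of the example construction described above.
# `poems` is assumed to be an iterable of dicts with the original columns
# `author`, `content` and `title`.
def build_examples(poems, max_window=3):
    examples = []
    for poem in poems:
        lines = [l.strip() for l in poem["content"].splitlines() if l.strip()]
        for window in range(1, max_window + 1):
            for i in range(len(lines) - window):
                examples.append({
                    "author": poem["author"],
                    "title": poem["title"],
                    # line_i .. line_{i+window-1} concatenated
                    "content": "\n".join(lines[i:i + window]),
                    # the following line is the generation target
                    "generated": lines[i + window],
                })
    return examples
```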
+
+For each example we compute the sentiment of the `generated` column and extract its nouns. For sentiment we used the model `mrm8488/electricidad-small-finetuned-restaurant-sentiment-analysis`, and for noun extraction we used spaCy.
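A sketch of that per-example annotation, assuming the sentiment checkpoint named above is loaded through a `text-classification` pipeline and that nouns come from a spaCy Spanish model (`es_core_news_sm` is an assumption; the card does not name the exact spaCy model):

```python
import spacy
from transformers import pipeline

# Sentiment model named in the card; the spaCy model choice is assumed.
sentiment = pipeline(
    "text-classification",
    model="mrm8488/electricidad-small-finetuned-restaurant-sentiment-analysis")
nlp = spacy.load("es_core_news_sm")

def annotate(example):
    text = example["generated"]
    example["sentiment"] = sentiment(text)[0]["label"]  # predicted sentiment label
    example["nouns"] = [tok.text for tok in nlp(text) if tok.pos_ == "NOUN"]
    return example
```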
+
 
 ## Training procedure
 