Iker committed on
Commit 975d075 · verified · 1 Parent(s): 7e155bc

Update README.md

Files changed (1)
  1. README.md +90 -9
README.md CHANGED
@@ -9,6 +9,7 @@ metrics:
- rouge
library_name: transformers
pipeline_tag: text-generation
+ base_model: NousResearch/Nous-Hermes-2-SOLAR-10.7B
tags:
- clickbait
- noticia
@@ -68,12 +69,28 @@ If you are looking for a smaller model, check out [ClickbaitFighter-2B](https://
- 🔌 Online Demo: [https://iker-clickbaitfighter.hf.space/](https://iker-clickbaitfighter.hf.space/)


+ # Evaluation Results
+ <table>
+ <tr>
+ <td style="width:100%"><img src="https://github.com/ikergarcia1996/NoticIA/raw/main/results/Results.png" align="right" width="100%"> </td>
+ </tr>
+ </table>
+
+
# Usage example:
+
+ ## Summarize a web article
```python
import torch # pip install torch
- from datasets import load_dataset # pip install datasets
+ from newspaper import Article # pip3 install newspaper3k
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig # pip install transformers

+ article_url = "https://www.huffingtonpost.es/virales/le-compra-abrigo-abuela-97nos-reaccion-fantasia.html"
+ article = Article(article_url)
+ article.download()
+ article.parse()
+ headline = article.title
+ body = article.text

def prompt(
    headline: str,
@@ -107,10 +124,81 @@ def prompt(
        f"{body}\n"
    )

+ prompt = prompt(headline=headline, body=body)
+
+ tokenizer = AutoTokenizer.from_pretrained("Iker/ClickbaitFighter-10B")
+ model = AutoModelForCausalLM.from_pretrained(
+     "Iker/ClickbaitFighter-10B", torch_dtype=torch.bfloat16, device_map="auto"
+ )
+
+ formatted_prompt = tokenizer.apply_chat_template(
+     [{"role": "user", "content": prompt}],
+     tokenize=False,
+     add_generation_prompt=True,
+ )
+
+ model_inputs = tokenizer(
+     [formatted_prompt], return_tensors="pt", add_special_tokens=False
+ )
+
+ model_output = model.generate(**model_inputs.to(model.device), generation_config=GenerationConfig(
+     max_new_tokens=32,
+     min_new_tokens=1,
+     do_sample=False,
+     num_beams=1,
+     use_cache=True
+ ))
+
+ summary = tokenizer.batch_decode(model_output,skip_special_tokens=True)[0]
+
+ print(summary.strip().split("\n")[-1]) # Get only the summary, without the prompt.
+ ```
+
+ ## Run inference on the NoticIA dataset
+ ```python
+ import torch # pip install torch
+ from newspaper import Article # pip3 install newspaper3k
+ from datasets import load_dataset # pip install datasets
+ from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig # pip install transformers
+
dataset = load_dataset("Iker/NoticIA")
example = dataset["test"][0]
+ headline = example["web_headline"]
+ body = example["web_text"]

- prompt = prompt(headline=example["web_headline"], body=example["web_text"])
+ def prompt(
+     headline: str,
+     body: str,
+ ) -> str:
+     """
+     Generate the prompt for the model.
+
+     Args:
+         headline (`str`):
+             The headline of the article.
+         body (`str`):
+             The body of the article.
+     Returns:
+         `str`: The formatted prompt.
+     """
+
+     return (
+         f"Ahora eres una Inteligencia Artificial experta en desmontar titulares sensacionalistas o clickbait. "
+         f"Tu tarea consiste en analizar noticias con titulares sensacionalistas y "
+         f"generar un resumen de una sola frase que revele la verdad detrás del titular.\n"
+         f"Este es el titular de la noticia: {headline}\n"
+         f"El titular plantea una pregunta o proporciona información incompleta. "
+         f"Debes buscar en el cuerpo de la noticia una frase que responda lo que se sugiere en el título. "
+         f"Siempre que puedas cita el texto original, especialmente si se trata de una frase que alguien ha dicho. "
+         f"Si citas una frase que alguien ha dicho, usa comillas para indicar que es una cita. "
+         f"Usa siempre las mínimas palabras posibles. No es necesario que la respuesta sea una oración completa. "
+         f"Puede ser sólo el foco de la pregunta. "
+         f"Recuerda responder siempre en Español.\n"
+         f"Este es el cuerpo de la noticia:\n"
+         f"{body}\n"
+     )
+
+ prompt = prompt(headline=headline, body=body)

tokenizer = AutoTokenizer.from_pretrained("Iker/ClickbaitFighter-10B")
model = AutoModelForCausalLM.from_pretrained(
@@ -140,13 +228,6 @@ summary = tokenizer.batch_decode(model_output,skip_special_tokens=True)[0]
print(summary.strip().split("\n")[-1]) # Get only the summary, without the prompt.
```

- # Evaluation Results
- <table>
- <tr>
- <td style="width:100%"><img src="https://github.com/ikergarcia1996/NoticIA/raw/main/results/Results.png" align="right" width="100%"> </td>
- </tr>
- </table>
-

# Citation