import streamlit as st
import matplotlib.pyplot as plt
import numpy as np
import spacy
import transformers
import os
from spacy.lang.en import English
from transformers import AutoModel, AutoTokenizer
from utils.utils import *

transformers.utils.logging.disable_progress_bar()
# Download the spaCy English model that spacy.load('en_core_web_sm') expects below.
os.system("python3 -m spacy download en_core_web_sm")
st.markdown("""### TL;DR: give me the keywords!
Здесь вы можете получить отранжированный список ключевых слов по названию и аннотации статьи.
Единственным поддерживаемым языком является английский.""")
st.markdown("<p style=\"text-align:center\"><img width=100% src='https://c.tenor.com/IKt-6tAk9CUAAAAd/thats-a-lot-of-words-lots-of-words.gif'></p>", unsafe_allow_html=True)
#from transformers import pipeline
#pipe = pipeline("ner", "Davlan/distilbert-base-multilingual-cased-ner-hrl")
#st.markdown("#### Title:")
title = st.text_area("Title:", value="How to cook a neural network", height=16, help="Paper title")
abstract = st.text_area("Abstract:",
value="""My dad fits hellish models in general.
Well, this is about an average recipe, because there are a lot of variations.
The model is taken, it is not finetuned, finetuning is not about my dad.
He takes this model, dumps it into the tensorboard and starts frying it.
Adds a huge amount of noise, convolutions, batch and spectral normalization, DROPOUT! for regularization, maxpooling on top.
All this is fitted to smoke.
Then the computer is removed from the fire and cools on the balcony.
Then dad brings it in and generously sprinkles it with crossvalidation and starts predicting.
At the same time, he gets data from the web, scraping it with a fork.
Predicts and sentences in a half-whisper oh god.
At the same time, he has sweat on his forehead.
Kindly offers me sometimes, but I refuse.
Do I even need to say how wild the overfitting gets then?
The overfitting is such that the val loss peels off the walls.
""",
height=512, help="Paper abstract")
# spaCy pipeline (cached so the model is loaded only once per session).
@st.cache(hash_funcs={English: lambda _: None})
def get_nlp(nlp_name):
    return spacy.load(nlp_name)

# Ideally we should find a pipeline tuned for scientific text,
# but we will deal with that later if there is time.
nlp_name = 'en_core_web_sm'
main_nlp = get_nlp(nlp_name)
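# st.cache cannot hash spaCy Language objects, so hash_funcs maps instances of
# English to a constant, effectively excluding the pipeline from cache hashing.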
# Loading the embedding model and tokenizer.
#@st.cache(hash_funcs={transformers.tokenizers.Tokenizer: lambda _: None})
def get_model_and_tokenizer(model_name):
    model = AutoModel.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    return model, tokenizer

model_name = "distilroberta-base"
main_model, main_tokenizer = get_model_and_tokenizer(model_name)
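# Unlike get_nlp, the loader above is not cached (its decorator is commented
# out), so the transformer weights are re-read from the local Hugging Face
# cache on every Streamlit rerun.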
# Text processing.
text = preprocess([title + ". " + abstract])[0]
if text is not None and len(text) > 0:
    #keywords = get_candidates(text, main_nlp)
    keywords = get_keywords(text, main_nlp, main_model, main_tokenizer)
    labels = [kw[0].replace(' ', '\n') for kw in keywords]
    scores = [kw[1] for kw in keywords]
    #st.markdown(f"{keywords}")
    # Top 5 keywords.
    top = 5
    top = min(len(labels), top)
    st.markdown("Top %d keywords: **%s**" % (top, ', '.join(labels[0:top])))
    # Keyword importance chart.
    fig, ax = plt.subplots(figsize=(8, len(labels)))
    ax.set_title("95% most important keywords")
    ax.grid(color='#000000', alpha=0.15, linestyle='-', linewidth=1, which='major')
    ax.grid(color='#000000', alpha=0.1, linestyle='-', linewidth=0.5, which='minor')
    bar_width = 0.75
    indexes = -np.arange(len(labels))
    ax.barh(indexes, scores, bar_width)
    plt.yticks(indexes, labels=labels)
    st.pyplot(fig)
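    # Note: indexes are negated above so that the first (highest-scoring)
    # keyword is drawn at the top of the horizontal bar chart.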
else:
    st.markdown("Please try to enter something.")