import streamlit as st
import pandas as pd
import torch
from backend import inference
from backend.config import MODELS_ID, QA_MODELS_ID, SEARCH_MODELS_ID
from backend.utils import load_gender_data
st.title('Demo using Flax-Sentence-Transformers')
st.sidebar.image("./hf-sbert.jpg")
st.sidebar.title('Tasks')
menu = st.sidebar.radio("", options=["Contributions & Evaluation", "Sentence Similarity", "Asymmetric QA", "Search / Cluster",
"Gender Bias Evaluation"], index=0)
st.markdown('''
Hi! This is the demo for the [flax sentence embeddings](https://huggingface.co/flax-sentence-embeddings) created for the **Flax/JAX community week 🤗**.
We trained three general-purpose flax-sentence-embeddings models: a distilroberta base, an mpnet base and a minilm-l6. They were
trained in a **Siamese network** configuration on the
[1 Billion+ training corpus](https://huggingface.co/flax-sentence-embeddings/all_datasets_v4_MiniLM-L6#training-data) with the v3 setup.
In addition, we trained [20 models](https://huggingface.co/flax-sentence-embeddings) focused on general-purpose embeddings, question answering and code search, and **achieved SOTA on multiple benchmarks.**
We also uploaded [8 datasets](https://huggingface.co/flax-sentence-embeddings) specialized for Question Answering, Sentence Similarity and Gender Evaluation.
You can view our models and datasets [here](https://huggingface.co/flax-sentence-embeddings).
''')
if menu == "Contributions & Evaluation":
    st.markdown('''
## Contributions
- **20 Sentence Embedding models** that can be used for Sentence Similarity / Asymmetric QA / Search & Clustering.
- **8 Datasets** from StackExchange and StackOverflow, PAWS, and Gender Evaluation, uploaded to the Hugging Face Hub.
- **Achieved SOTA** on multiple general-purpose Sentence Similarity evaluation tasks by utilizing large TPU memory to maximize
customized Contrastive Loss. [Full Evaluation here](https://docs.google.com/spreadsheets/d/1vXJrIg38cEaKjOG5y4I4PQwAQFUmCkohbViJ9zj_Emg/edit#gid=1809754143).
- **Gender Bias demonstration** that explores inherent bias in general purpose datasets.
- **Search / Clustering demonstration** that showcases real-world use-cases for Sentence Embeddings.
## Model Evaluations
| Model | [FullEvaluation](https://docs.google.com/spreadsheets/d/1vXJrIg38cEaKjOG5y4I4PQwAQFUmCkohbViJ9zj_Emg/edit#gid=1809754143) Average | 20Newsgroups Clustering | StackOverflow DupQuestions | Twitter SemEval2015 |
|-----------|---------------------------------------|-------|-------|-------|
| paraphrase-mpnet-base-v2 (previous SOTA) | 67.97 | 47.79 | 49.03 | 72.36 |
| **all_datasets_v3_roberta-large (400k steps)** | **70.22** | 50.12 | 52.18 | 75.28 |
| **all_datasets_v3_mpnet-base (440k steps)** | **70.01** | 50.22 | 52.24 | 76.27 |
''')
elif menu == "Sentence Similarity":
    st.header('Sentence Similarity')
    st.markdown('''
**Instructions**: You can compare the similarity of the main text with other texts of your choice. In the background,
we'll create an embedding for each text, and then we'll use the cosine similarity function to calculate a similarity
metric between our main sentence and the others.
For more cool information on sentence embeddings, see the [sBert project](https://www.sbert.net/examples/applications/computing-embeddings/README.html).
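
As a rough illustration of the comparison described above (a minimal sketch, not the code this demo runs), cosine similarity between an anchor embedding and each candidate can be computed as follows. The three-dimensional vectors and the `cosine_similarity` helper are hypothetical stand-ins for real model output.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy "embeddings" (assumed values); real sentence embeddings have hundreds of dimensions.
anchor_vec = [1.0, 0.0, 1.0]
candidate_vecs = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0]]

# One score per candidate: higher means closer in direction to the anchor.
scores = [cosine_similarity(anchor_vec, vec) for vec in candidate_vecs]
```

A score near 1 means the two texts point in almost the same direction in embedding space, and a score near 0 means they are unrelated.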
''')
    select_models = st.multiselect("Choose models", options=list(MODELS_ID), default=list(MODELS_ID)[0])
    anchor = st.text_input(
        'Please enter here the main text you want to compare:',
        value="That is a happy person"
    )
    n_texts = st.number_input(
        f'''How many texts do you want to compare with: '{anchor}'?''',
        value=3,
        min_value=2)
    inputs = []
    defaults = ["That is a happy dog", "That is a very happy person", "Today is a sunny day"]
    for i in range(int(n_texts)):
        # Named `text` rather than `input` to avoid shadowing the builtin.
        text = st.text_input(f'Text {i + 1}:', value=defaults[i] if i < len(defaults) else "")
        inputs.append(text)
    if st.button('Tell me the similarity.'):
        results = {model: inference.text_similarity(anchor, inputs, model, MODELS_ID) for model in select_models}
        # Label each row with its position and the first 15 characters of the text.
        index = [f"{idx + 1}:{text[:15]}..." for idx, text in enumerate(inputs)]
        df_total = pd.DataFrame(index=index)
        for key, value in results.items():
            # Softmax over the candidates (dim=0) normalizes each model's scores to sum to 1.
            scores = torch.from_numpy(value['score'].values)
            df_total[key] = torch.nn.functional.softmax(scores, dim=0).tolist()
        st.write('Here are the results for selected models:')
        st.write(df_total)
        st.write('Visualize the results of each model:')
        st.line_chart(df_total)
elif menu == "Asymmetric QA":
    st.header('Asymmetric QA')
    st.markdown('''
**Instructions**: You can compare how likely each of your candidate answers is to answer a given query. In the
background, we'll create an embedding for the query and for each answer, and then we'll use the cosine similarity function to calculate a
similarity metric between the query and each answer.
The `mpnet_asymmetric_qa` model works best for hard-negative answers or for distinguishing similar queries, because it uses separate models
to encode questions and answers.
For more cool information on sentence embeddings, see the [sBert project](https://www.sbert.net/examples/applications/computing-embeddings/README.html).
''')
    select_models = st.multiselect("Choose models", options=list(QA_MODELS_ID), default=list(QA_MODELS_ID)[0])
    anchor = st.text_input(
        'Please enter here the query you want to compare with given answers:',
        value="What is the weather in Paris?"
    )
    n_texts = st.number_input(
        f'''How many answers do you want to compare with: '{anchor}'?''',
        value=3,
        min_value=2)
    inputs = []
    defaults = ["It is raining in Paris right now with 70 F temperature.", "What is the weather in Berlin?", "I have 3 brothers."]
    for i in range(int(n_texts)):
        # Named `text` rather than `input` to avoid shadowing the builtin.
        text = st.text_input(f'Answer {i + 1}:', value=defaults[i] if i < len(defaults) else "")
        inputs.append(text)
    if st.button('Tell me Answer likeliness.'):
        results = {model: inference.text_similarity(anchor, inputs, model, QA_MODELS_ID) for model in select_models}
        # Label each row with its position and the first 15 characters of the answer.
        index = [f"{idx + 1}:{text[:15]}..." for idx, text in enumerate(inputs)]
        df_total = pd.DataFrame(index=index)
        for key, value in results.items():
            # Softmax over the candidates (dim=0) normalizes each model's scores to sum to 1.
            scores = torch.from_numpy(value['score'].values)
            df_total[key] = torch.nn.functional.softmax(scores, dim=0).tolist()
        st.write('Here are the results for selected models:')
        st.write(df_total)
        st.write('Visualize the results of each model:')
        st.line_chart(df_total)
elif menu == "Search / Cluster":
    st.header('Search / Cluster')
    st.markdown('''
**Instructions**: Enter a query about anything related to "Python" and the model will return the nearest answers, ranked by dot product.
For more cool information on sentence embeddings, see the [sBert project](https://www.sbert.net/examples/applications/computing-embeddings/README.html).
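
To illustrate what ranking by dot product means (a minimal sketch under assumed inputs, not the search backend used by this demo), the helper below scores every corpus vector against the query and keeps the `k` best. The function name `top_k_by_dot_product` and the two-dimensional vectors are hypothetical.

```python
def top_k_by_dot_product(query_vec, corpus_vecs, k):
    """Return (index, score) pairs for the k corpus vectors with the largest dot product."""
    scores = [sum(q * c for q, c in zip(query_vec, vec)) for vec in corpus_vecs]
    # Sort corpus positions by score, best first, and keep the top k.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in ranked[:k]]


# Toy corpus of precomputed "embeddings" (assumed values).
corpus = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.7]]
hits = top_k_by_dot_product([1.0, 0.0], corpus, k=2)
```

Unlike cosine similarity, the dot product is sensitive to vector length, so longer vectors can rank higher unless the embeddings are normalized.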
''')
    select_models = st.multiselect("Choose models", options=list(SEARCH_MODELS_ID), default=list(SEARCH_MODELS_ID)[0])
    anchor = st.text_input(
        'Please enter here your query about "Python", we will look for similar ones:',
        value="How do I sort a dataframe by column"
    )
    n_texts = st.number_input(
        'How many similar queries do you want?',
        value=5,
        min_value=2)
    if st.button('Give me my search.'):
        # The models are chosen from SEARCH_MODELS_ID, so look them up in the same mapping.
        results = {model: inference.text_search(anchor, n_texts, model, SEARCH_MODELS_ID) for model in select_models}
        # Only the first selected model's results are displayed.
        st.table(pd.DataFrame(results[select_models[0]]).T)
    if st.button('3D Clustering of 1000 search results using T-SNE on generated embeddings'):
        st.write("Currently this only works locally due to the Spaces / plotly integration.")
        st.write("Demonstration : https://gyazo.com/1ff0aa438ae533de3b3c63382af7fe80")
        # fig = inference.text_cluster(anchor, 1000, select_models[0], QA_MODELS_ID)
        # fig.show()
elif menu == "Gender Bias Evaluation":
    st.header("Gender Bias Evaluation")
    st.markdown('''
**Instructions**: Here we can observe **inherent gender bias** in the training set by sampling sentences.
Input 3 texts: one that names the target occupation without any mention of gender, and 2 others that use gendered pronouns.
We hope the evaluation performed here can help improve the gender neutrality of datasets.
For more cool information on sentence embeddings, see the [sBert project](https://www.sbert.net/examples/applications/computing-embeddings/README.html).
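
The comparison can be sketched as follows (an illustrative approximation, not the exact code behind this page): the two similarity scores are passed through a softmax, and the result is labelled neutral when the two probabilities are within a small tolerance. The helper names and the example scores are hypothetical, and the tolerance of 1e-3 is an assumption.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def gender_label(male_score, female_score, tolerance=1e-3):
    """Label which gendered sentence the neutral sentence sits closer to."""
    p_male, p_female = softmax([male_score, female_score])
    # Treat near-ties as neutral before comparing the two probabilities.
    if abs(p_male - p_female) < tolerance:
        return "neutral"
    return "male" if p_male > p_female else "female"


# Hypothetical similarity scores of the neutral text against the male and female texts.
label = gender_label(0.62, 0.55)
```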
''')
    select_models = st.multiselect("Choose models", options=list(MODELS_ID), default=list(MODELS_ID))
    samples = st.radio("Samples", options=["President of United States", "Professor", "Nurse", "Custom"])
    if samples == "President of United States":
        base_text = st.text_input("Gender Neutral Text", "President of the United States promised relief to Hurricane survivors.")
        male_text = st.text_input("Male-assumed Text", "He promised relief to Hurricane survivors.")
        female_text = st.text_input("Female-assumed Text", "She promised relief to Hurricane survivors.")
    elif samples == "Professor":
        base_text = st.text_input("Gender Neutral Text", "Professor ended the class earlier than usual.")
        male_text = st.text_input("Male-assumed Text", "He ended the class earlier than usual.")
        female_text = st.text_input("Female-assumed Text", "She ended the class earlier than usual.")
    elif samples == "Nurse":
        base_text = st.text_input("Gender Neutral Text", "Nurse administered the vaccine and rubbed alcohol.")
        male_text = st.text_input("Male-assumed Text", "He administered the vaccine and rubbed alcohol.")
        female_text = st.text_input("Female-assumed Text", "She administered the vaccine and rubbed alcohol.")
    else:
        base_text = st.text_input("Gender Neutral Text")
        male_text = st.text_input("Male-assumed Text")
        female_text = st.text_input("Female-assumed Text")
    enter = st.button("Compare")
    if enter:
        results = {model: inference.text_similarity(base_text, [male_text, female_text], model, MODELS_ID) for model in select_models}
        index = ["male", "female", "gender_bias"]
        df_total = pd.DataFrame(index=index)
        for key, value in results.items():
            # Softmax over the two candidates (dim=0) gives the male and female probabilities.
            scores = torch.from_numpy(value['score'].values)
            probs = [round(p, 4) for p in torch.nn.functional.softmax(scores, dim=0).tolist()]
            # Check for a near-tie first; otherwise the "neutral" case is unreachable
            # whenever the male score is even marginally higher.
            if abs(probs[0] - probs[1]) < 1e-3:
                gender = "neutral"
            elif probs[0] > probs[1]:
                gender = "male"
            else:
                gender = "female"
            df_total[key] = probs + [gender]
        st.write('Here are the results for selected models:')
        st.write(df_total)