import streamlit as st
from transformers import pipeline
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np
import re
import pkg_resources
from symspellpy import SymSpell, Verbosity
# Initialize SymSpell
sym_spell = SymSpell(max_dictionary_edit_distance=2, prefix_length=7)
# Load the English frequency dictionary bundled with symspellpy,
# then a custom dictionary with project-specific terms
dictionary_path = pkg_resources.resource_filename(
    "symspellpy", "frequency_dictionary_en_82_765.txt"
)
sym_spell.load_dictionary(dictionary_path, term_index=0, count_index=1)
sym_spell.load_dictionary("./custom_dictionary.txt", term_index=0, count_index=1)
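
# Cache the generator so Streamlit does not reload the model on every rerun
@st.cache_resource
def load_qa_pipeline():
    return pipeline("text2text-generation", model="google/flan-t5-large")


qa_pipeline = load_qa_pipeline()
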
documents = [
"""
Biodata or about ginni as name : GINNI GARG, email : [email protected], phone : +91-8295954475, Date of Birth - 1st January 1998.
""",
"""
Ginni completed his Graduation B.Tech in Computer Engineering from National Institute of Technology, Kurukshetra in between 2016 -2020 with cgpa 9.65
""",
"""
Father name of ginni is DharamPal Garg. He is Director JSS Sirsa. Mother Name is Rajni Garg, She is Housewife. Wife name of ginni is Ekta, She is Bank Manager. His Sister name is Arti Garg, She is Computer Engineer by Profession.
""",
"""
Ginni hobbies are reading books, Badminton, Yoga, Running, Walking, Exercise, GYM etc.
""",
"""
Ginni Favourite Books are Atomic Habits, IKigai, Biography of Swami Viveknand, Jeevan Amrit by OSHO etc.
""",
"""
Ginni Domain expertise is Software Engineering, specifically Backend Engineering.
""",
"""
ginni completed Schooling both 10th (2012-2013) with cgpa 10, and 12th (2014-2015) with 91% from D.A.V. Public School, Kalanwali.
""",
"""
all companies where ginni worked/experience as follow CDOT, SirionLabs, Otipy and Arcesium.
""",
"""
Gate qualified in 2020 with All India Rank 2562, gate score 562 and GATE Marks as 46.67/100. JEE Main Qualified in 2016 with All India Rank 8123, JEE Marks as 231/360 and JEE Percentile 99.3%
""",
"""
social media of ginni as follow - 'linkedin : www.linkedin.com/in/ginni-garg', 'github : https://github.com/GinniIndia'
""",
"""
All academic Achievements of ginni:
1.Received Award of Academic Excellence for Department Topper in First year.
2.Received Award of Academic Excellence for Securing Third Rank among all Departments in First Year.
3. Member of Institution Innovation Council under the aegis of MHRD’s Innovation Cell established at NIT, Kurukshetra for academic year 2018-2019.
4. Department Rank 4 (Computer Engineering Graduation) and University Rank 5.
5. In National Level Science Talent Search Examination and secured 252 rank at National Level.
""",
"""
List of all Publications or research papers of ginni as :
1. Ginni Garg and Ritu Garg. “Brain Tumor Detection and Classification using Hybrid Ensemble Classifier”.
International Journal of Healthcare Information Systems and Informatics (IJHISI), IGI Global, Clarivate Analytics
indexed, scopus indexed.
arxiv Link: https://arxiv.org/abs/2101.00216
2. Ginni Garg and Mantosh Biswas. “Improved Neural Network Based Plant Disease Identification” in First
International Conference on Advanced Communication & Computational Technology (ICACCT) 2019, Scopus
Index, LNEE Format.
arxiv Link: https://arxiv.org/abs/2101.00215
3. Ginni Garg and Ritu Garg. “A Hybrid MLP-SVM based classification using spatial-spectral features on Hyper-
spectral Images”. International Conference Futuristic Trends in Networks and Computing Technologies, FTNCT-
2020 Approved by CCIS, Springer (Indexed by Scopus and DBLP), Southern Federal University, Russia.
arxiv Link: https://arxiv.org/abs/2101.00214
"""
]


@st.cache_resource
def load_data():
    model = SentenceTransformer("all-MiniLM-L6-v2")
    doc_embeddings = model.encode(documents)
    hnsw_index = faiss.IndexHNSWFlat(doc_embeddings.shape[1], 32)
    hnsw_index.add(np.array(doc_embeddings).astype("float32"))
    return model, hnsw_index


model, hnsw_index = load_data()
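

# Illustrative sketch, not part of the original app: with only a handful of
# documents, an exact scan is a simple way to sanity-check what the
# approximate HNSW index returns. IndexHNSWFlat ranks by squared L2 distance
# by default; `exact_nearest_sketch` is a hypothetical helper that does the
# same over plain Python sequences of floats.
def exact_nearest_sketch(query, vectors):
    """Return (index, squared L2 distance) of the vector closest to query."""
    best_index, best_distance = -1, float("inf")
    for i, vector in enumerate(vectors):
        distance = sum((q - v) ** 2 for q, v in zip(query, vector))
        if distance < best_distance:
            best_index, best_distance = i, distance
    return best_index, best_distance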


def preprocess_text(text):
    # Split into runs of word characters (letters, digits, underscore) and
    # runs of other non-space characters; digits are already covered by \w+
    tokens = re.findall(r'\w+|\S+', text)
    return tokens


# Correct spelling while preserving numbers and punctuation
def correct_spelling(text):
    tokens = preprocess_text(text)
    corrected_tokens = []
    for token in tokens:
        # Keep tokens that start with a digit, '.' or ',', or that contain no
        # letters at all (e.g. "2020", ".65", "?"), exactly as typed
        if re.match(r'[\d.,]', token) or not any(ch.isalpha() for ch in token):
            corrected_tokens.append(token)
        else:
            # Otherwise, look up the closest dictionary terms with SymSpell
            suggestions = sym_spell.lookup(
                token, Verbosity.CLOSEST, max_edit_distance=2
            )
            if suggestions:
                corrected_token = suggestions[0].term  # Use the best suggestion
            else:
                corrected_token = token  # No suggestion: keep the original token
            corrected_tokens.append(corrected_token)
    # Join the corrected tokens into a sentence
    return " ".join(corrected_tokens)
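

# Illustrative sketch, not part of the original app: correct_spelling() joins
# tokens with single spaces, so "ginni?" comes back as "ginni ?". Assuming
# that spacing is unwanted, a hypothetical helper like this could reattach
# punctuation before the question is embedded.
import re  # already imported at the top of this file; repeated so the sketch stands alone


def detokenize_sketch(tokens):
    """Join tokens with spaces, then drop the space before . , ? ! ; :"""
    text = " ".join(tokens)
    return re.sub(r"\s+([.,?!;:])", r"\1", text)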


def rag_qa(question):
    question = correct_spelling(question)
    print(f'correct_question : {question}')
    question_embedding = model.encode([question])
    distances, retrieved_indices = hnsw_index.search(np.array(question_embedding).astype("float32"), k=1)
    retrieved_doc = documents[retrieved_indices[0][0]]
    prompt = f"Context: {retrieved_doc}\n\nQ: {question}\nA: If the answer is unclear from the context, respond with 'I don't know'"
    response = qa_pipeline(prompt, max_length=500)
    answer = response[0]['generated_text']
    print(f'generated_answer : {answer}')
    return answer
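

# Illustrative sketch, not part of the original app: rag_qa() retrieves a
# single document (k=1). Assuming a larger k is wanted, FAISS pads the result
# with -1 when it finds fewer than k neighbors, so those slots need to be
# skipped before indexing into `documents`. `build_context_sketch` is a
# hypothetical helper name.
def build_context_sketch(retrieved_indices, docs):
    """Join the retrieved documents, ignoring FAISS's -1 'no result' slots."""
    picked = [docs[i].strip() for i in retrieved_indices if 0 <= i < len(docs)]
    return "\n".join(picked)


# Possible use inside rag_qa() after searching with k=3:
#   context = build_context_sketch(retrieved_indices[0], documents)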


# Streamlit UI
st.title("🧠 Ask anything about Ginni!")
question = st.text_input("Ask your question:")
if st.button("Get Answer"):
    if question.strip():
        answer = rag_qa(question)
        st.success(f"**Answer:** {answer}")
    else:
        st.warning("Please enter a valid question.")