{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Load the dataset\n", "We combine the Description and Patient text into a single string. The model encodes this combined text and outputs a single embedding vector." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "To run this notebook, you will need to install pandas, openai (a pre-1.0 release, since `openai.embeddings_utils` was removed in 1.0), tiktoken, transformers, plotly, matplotlib, scikit-learn, torch (a dependency of transformers), torchvision, and scipy." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "# imports\n", "import pandas as pd\n", "import tiktoken\n", "from openai.embeddings_utils import get_embedding\n", "import time" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "# embedding model parameters\n", "embedding_model = \"text-embedding-ada-002\"\n", "embedding_encoding = \"cl100k_base\" # this is the encoding for text-embedding-ada-002\n", "max_tokens = 8000 # the maximum for text-embedding-ada-002 is 8191" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# load & inspect dataset\n", "df = pd.read_csv(\"../2-Data/dialogues.csv\", sep = '\\t')\n", "df = df.dropna()#.head(1000)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "df.rename(columns = {'Description':'Question',\"Doctor\":\"Answer\"}, inplace = True)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | Question | Patient | Answer |
|---|---|---|---|
| 0 | Q. What does abutment of the nerve root mean? | Hi doctor,I am just wondering what is abutting... | Hi. I have gone through your query with dilige... |
| 1 | Q. What should I do to reduce my weight gained... | Hi doctor, I am a 22-year-old female who was d... | Hi. You have really done well with the hypothy... |
| 2 | Q. I have started to get lots of acne on my fa... | Hi doctor! I used to have clear skin but since... | Hi there Acne has multifactorial etiology. Onl... |
| 3 | Q. Why do I have uncomfortable feeling between... | Hello doctor,I am having an uncomfortable feel... | Hello. The popping and discomfort what you fel... |
| 4 | Q. My symptoms after intercourse threatns me e... | Hello doctor,Before two years had sex with a c... | Hello. The HIV test uses a finger prick blood ... |
| ... | ... | ... | ... |
| 256911 | Why is hair fall increasing while using Bontre... | I am suffering from excessive hairfall. My doc... | Hello Dear Thanks for writing to us, we are he... |
| 256912 | Why was I asked to discontinue Androanagen whi... | Hi Doctor, I have been having severe hair fall... | hello, hair4u is combination of minoxid... |
| 256913 | Can Mintop 5% Lotion be used by women for seve... | Hi..i hav sever hair loss problem so consulted... | HI I have evaluated your query thoroughly you... |
| 256914 | Is Minoxin 5% lotion advisable instead of Foli... | Hi, i am 25 year old girl, i am having massive... | Hello and Welcome to ‘Ask A Doctor’ service.I ... |
| 256915 | Are Biotin supplements need to reduce severe h... | iam having hairfall for a decade.. but fews we... | you did'nt mention about thyroid problem ...us... |

256916 rows × 3 columns

| | Question | Patient | Answer | combined |
|---|---|---|---|---|
| 0 | Q. What does abutment of the nerve root mean? | Hi doctor,I am just wondering what is abutting... | Hi. I have gone through your query with dilige... | Question: Q. What does abutment of the nerve r... |
| 1 | Q. What should I do to reduce my weight gained... | Hi doctor, I am a 22-year-old female who was d... | Hi. You have really done well with the hypothy... | Question: Q. What should I do to reduce my wei... |

| | Description | Patient | Doctor | combined | n_tokens |
|---|---|---|---|---|---|
| 0 | Q. What does abutment of the nerve root mean? | Hi doctor,I am just wondering what is abutting... | Hi. I have gone through your query with dilige... | Description: Q. What does abutment of the nerv... | 95 |
| 1 | Q. What should I do to reduce my weight gained... | Hi doctor, I am a 22-year-old female who was d... | Hi. You have really done well with the hypothy... | Description: Q. What should I do to reduce my ... | 519 |
| 2 | Q. I have started to get lots of acne on my fa... | Hi doctor! I used to have clear skin but since... | Hi there Acne has multifactorial etiology. Onl... | Description: Q. I have started to get lots of ... | 285 |
| 3 | Q. Why do I have uncomfortable feeling between... | Hello doctor,I am having an uncomfortable feel... | Hello. The popping and discomfort what you fel... | Description: Q. Why do I have uncomfortable fe... | 324 |
| 4 | Q. My symptoms after intercourse threatns me e... | Hello doctor,Before two years had sex with a c... | Hello. The HIV test uses a finger prick blood ... | Description: Q. My symptoms after intercourse ... | 442 |
| ... | ... | ... | ... | ... | ... |
| 256911 | Why is hair fall increasing while using Bontre... | I am suffering from excessive hairfall. My doc... | Hello Dear Thanks for writing to us, we are he... | Description: Why is hair fall increasing while... | 211 |
| 256912 | Why was I asked to discontinue Androanagen whi... | Hi Doctor, I have been having severe hair fall... | hello, hair4u is combination of minoxid... | Description: Why was I asked to discontinue An... | 154 |
| 256913 | Can Mintop 5% Lotion be used by women for seve... | Hi..i hav sever hair loss problem so consulted... | HI I have evaluated your query thoroughly you... | Description: Can Mintop 5% Lotion be used by w... | 191 |
| 256914 | Is Minoxin 5% lotion advisable instead of Foli... | Hi, i am 25 year old girl, i am having massive... | Hello and Welcome to ‘Ask A Doctor’ service.I ... | Description: Is Minoxin 5% lotion advisable in... | 232 |
| 256915 | Are Biotin supplements need to reduce severe h... | iam having hairfall for a decade.. but fews we... | you did'nt mention about thyroid problem ...us... | Description: Are Biotin supplements need to re... | 213 |

256916 rows × 5 columns

| | Question | Patient | Answer | combined | n_tokens | embedding |
|---|---|---|---|---|---|---|
| 0 | Q. What does abutment of the nerve root mean? | Hi doctor,I am just wondering what is abutting... | Hi. I have gone through your query with dilige... | Question: Q. What does abutment of the nerve r... | 95 | [-0.109211065, -0.17469415, 0.18996556, 0.0599... |
| 1 | Q. What should I do to reduce my weight gained... | Hi doctor, I am a 22-year-old female who was d... | Hi. You have really done well with the hypothy... | Question: Q. What should I do to reduce my wei... | 519 | [-0.014065318, 0.0440334, 0.26095688, 0.086799... |
| 2 | Q. I have started to get lots of acne on my fa... | Hi doctor! I used to have clear skin but since... | Hi there Acne has multifactorial etiology. Onl... | Question: Q. I have started to get lots of acn... | 285 | [-0.39175138, -0.025890486, -0.024644196, -0.0... |
| 3 | Q. Why do I have uncomfortable feeling between... | Hello doctor,I am having an uncomfortable feel... | Hello. The popping and discomfort what you fel... | Question: Q. Why do I have uncomfortable feeli... | 324 | [-0.29406005, -0.31878802, 0.27588362, 0.09649... |
| 4 | Q. My symptoms after intercourse threatns me e... | Hello doctor,Before two years had sex with a c... | Hello. The HIV test uses a finger prick blood ... | Question: Q. My symptoms after intercourse thr... | 442 | [-0.36187398, 0.18491694, -0.3090741, -0.30197... |
| ... | ... | ... | ... | ... | ... | ... |
| 995 | Q. My lax les is 38 cm with inflamed gastric f... | Hello doctor, My lax les is 38 cm with inflame... | Hello. Gastritis is an inflammation of stomach... | Question: Q. My lax les is 38 cm with inflamed... | 214 | [-0.1555396, -0.44157797, -0.15364785, 0.25760... |
| 996 | Q. I am suffering from mood swings. Kindly adv... | Hello doctor,I want to get some information re... | Hello. Let me answer your questions via some b... | Question: Q. I am suffering from mood swings. ... | 491 | [-0.2296337, 0.119730674, 0.37153018, 0.062901... |
| 997 | Q. I am having swollen lymph node in my neck. ... | Hello doctor, I went to the chiropractor and g... | Hello. I do not think that because of chiropra... | Question: Q. I am having swollen lymph node in... | 395 | [-0.10149522, -0.33532476, 0.40812746, -0.2713... |
| 998 | Q. How good is Albenza for a raccoon roundworm... | Hello doctor,I am concerned about a possible r... | Hello. Albendazole 400 mg single star dose is ... | Question: Q. How good is Albenza for a raccoon... | 240 | [-0.06408733, 0.17669381, 0.09132431, -0.09456... |
| 999 | Q. Will Kalarchikai cure multiple ovarian cyst... | Hello doctor, I have multiple small cysts in b... | Hello. I just read your query. See Kalarachi K... | Question: Q. Will Kalarchikai cure multiple ov... | 309 | [0.03657364, 0.24297515, 0.09555141, 0.0270566... |

1000 rows × 6 columns