{ "cells": [ { "cell_type": "markdown", "id": "d086c9ff-22b8-4e97-8572-808c48096136", "metadata": {}, "source": [ "# Part 2 - Data Creation for Free Doctor" ] }, { "cell_type": "markdown", "id": "4ad4b91a-2cdb-4361-b1a8-5f4e6cd1ce6d", "metadata": {}, "source": [ "In this section we create the dataset: we download the raw dialogue files, clean them, and build a pandas DataFrame." ] }, { "cell_type": "markdown", "id": "a5ac32e1-c7bc-4897-a51e-5724c4b31425", "metadata": {}, "source": [ "First, let us download the raw dataset files that we will work with." ] }, { "cell_type": "markdown", "id": "203aa753-7fb3-4598-ab99-576e4ac471ca", "metadata": {}, "source": [ "The MedDialog dataset (English) contains conversations between doctors and patients. It has about 0.26 million dialogues. The data is continuously growing and more dialogues will be added. The raw dialogues are from healthcaremagic.com and icliniq.com. All copyrights of the data belong to healthcaremagic.com and icliniq.com." ] }, { "cell_type": "code", "execution_count": null, "id": "05371826-f8bc-45c5-88db-ebd87c7a84d4", "metadata": {}, "outputs": [], "source": [ "# pathlib is part of the standard library; gdown is the package that must be installed\n", "#!pip install gdown" ] }, { "cell_type": "code", "execution_count": 6, "id": "8610028a-9fe1-4ec1-a1e7-5bb40533ac32", "metadata": {}, "outputs": [], "source": [ "import gdown" ] }, { "cell_type": "code", "execution_count": 7, "id": "f0bd7bd0-1974-43e9-baa3-e2e55cb9c21d", "metadata": {}, "outputs": [], "source": [ "url=\"https://drive.google.com/drive/folders/1-5mQW2gNj_kcBobllL9EpbJcUcT5aFpE?usp=sharing\"" ] }, { "cell_type": "code", "execution_count": 8, "id": "2e0b364b-eb38-4e45-ba4e-6ec21708c857", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['C:\\\\Users\\\\rusla\\\\Dropbox\\\\23-GITHUB\\\\Projects\\\\Free-Doctor-with-Artificial-Intelligence\\\\2-Data\\\\Medical-Dialogue-System\\\\dialogue_0.txt',\n", " 
'C:\\\\Users\\\\rusla\\\\Dropbox\\\\23-GITHUB\\\\Projects\\\\Free-Doctor-with-Artificial-Intelligence\\\\2-Data\\\\Medical-Dialogue-System\\\\dialogue_1.txt',\n", " 'C:\\\\Users\\\\rusla\\\\Dropbox\\\\23-GITHUB\\\\Projects\\\\Free-Doctor-with-Artificial-Intelligence\\\\2-Data\\\\Medical-Dialogue-System\\\\dialogue_2.txt',\n", " 'C:\\\\Users\\\\rusla\\\\Dropbox\\\\23-GITHUB\\\\Projects\\\\Free-Doctor-with-Artificial-Intelligence\\\\2-Data\\\\Medical-Dialogue-System\\\\dialogue_3.txt',\n", " 'C:\\\\Users\\\\rusla\\\\Dropbox\\\\23-GITHUB\\\\Projects\\\\Free-Doctor-with-Artificial-Intelligence\\\\2-Data\\\\Medical-Dialogue-System\\\\dialogue_4.txt']" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gdown.download_folder(url, quiet=True, use_cookies=False)" ] }, { "cell_type": "markdown", "id": "bc9ef2a2-9398-470d-a85f-86df74f7ceaf", "metadata": {}, "source": [ "There are 5 raw dialogs that we are going to process to create the dataset to work." ] }, { "cell_type": "markdown", "id": "7dcea4e3-2b6c-4d92-97d6-9c0b4fad5388", "metadata": {}, "source": [ "We are going to create a Dataset with the following schema:\n", "\n", "- Description\t - String\n", "- Patient - String\t\n", "- Doctor - String\t\n", "\n", "The conversion of text to json.\n", "Then we will create the pandas dataframes" ] }, { "cell_type": "code", "execution_count": 57, "id": "baaef232-7a75-454c-bf55-b8d4bdbef1ec", "metadata": {}, "outputs": [], "source": [ "#importing modules\n", "import os\n", "from pathlib import Path\n", "import pandas as pd\n", "import json\n", "import re\n", "import json" ] }, { "cell_type": "code", "execution_count": 14, "id": "0d2678a8-dd10-4489-a0a6-684c5ddc2968", "metadata": {}, "outputs": [], "source": [ "from tqdm import tqdm\n", "from tools import timer\n", "t = timer.Timer()" ] }, { "cell_type": "code", "execution_count": 2, "id": "ee7b90d5-0372-4e73-96ee-ed53bc02f1bd", "metadata": {}, "outputs": [], "source": [ "def 
split_content(filename):\n", "    '''\n", "    Split a raw dialogue file into one text file per dialogue.\n", "\n", "    filename: The filename must be txt format and stored in the\n", "    ./2-Data/Medical-Dialogue-System/ folder\n", "    res: The output is the list of per-dialogue file names written to ./data/<name>/\n", "    '''\n", "    # get the current working directory\n", "    path = os.getcwd()\n", "    file = os.path.join(path, \"Medical-Dialogue-System\", filename)\n", "    subdirectory = filename.replace(\".txt\", \"\")\n", "    # create the output directory ./data/<name>/\n", "    out_dir = os.path.join(path, \"data\", subdirectory)\n", "    Path(out_dir).mkdir(parents=True, exist_ok=True)\n", "    out_n = 0\n", "    done = False\n", "    try:\n", "        with open(file, encoding=\"utf-8\") as in_file:\n", "            while not done:  # loop over output file names\n", "                name = f\"out{out_n}.txt\"\n", "                file_tmp = os.path.join(out_dir, name)\n", "                with open(file_tmp, \"w\", encoding=\"utf-8\") as out_file:\n", "                    while not done:  # copy input lines into the current output file\n", "                        try:\n", "                            line = next(in_file).strip()  # strip whitespace for consistency\n", "                        except StopIteration:\n", "                            done = True\n", "                            break\n", "                        if \"id=\" in line:  # a line containing \"id=\" starts the next dialogue\n", "                            break\n", "                        else:\n", "                            out_file.write(line + '\\n')  # add back the newline that strip() removed\n", "                out_n += 1  # increment the output file counter\n", "\n", "    except Exception as error:\n", "        print(\"An error occurred while splitting the dialogue file:\", error)\n", "    # list the names of the files written to the output directory\n", "    res = []\n", "    for _, _, file_names in os.walk(out_dir):\n", "        res.extend(file_names)\n", "    return res" ] }, { "cell_type": "code", "execution_count": 3, "id": "f13e8e5d-769c-4281-813d-1d6e62d6f9ed", "metadata": {}, "outputs": [], "source": [ "\n", "def 
findword(text, word):\n", "    # return the first regex match of word in text, or None when there is none\n", "    return re.search(word, text)" ] }, { "cell_type": "code", "execution_count": 4, "id": "5dd0de51-8ea1-45e9-a004-0f823a86e9b2", "metadata": {}, "outputs": [], "source": [ "def create_dataframe(text_as_string, name_partial):\n", "    # remove URLs, then split the text wherever a line starts with a field keyword\n", "    string = re.sub(r'https?://\\S+', '', text_as_string)\n", "    keywords = {'Description', 'Dialogue', 'Patient:', 'Doctor:'}\n", "    chunks = re.split(r'\\n(?=Description|Dialogue|Patient|Doctor)', string)\n", "    updated_dic = {}\n", "    for chunk in chunks:\n", "        # skip chunks that contain none of the keywords\n", "        if not any(findword(chunk, word) for word in keywords):\n", "            continue\n", "        try:\n", "            # the first token is the field name, the rest is its content\n", "            command, content = chunk.strip().split(None, 1)\n", "        except ValueError:\n", "            # the chunk holds a single token only (e.g. a bare 'Dialogue' line)\n", "            continue\n", "        command = command.replace(\":\", \"\")\n", "        content = content.strip().replace(\"\\n\", \" \")\n", "        # a repeated field overwrites the earlier value, so the last turn is kept\n", "        updated_dic[command] = content\n", "    # one row per dialogue, indexed by the name of the split file\n", "    df = pd.DataFrame(updated_dic, index=[name_partial])\n", "    return df" ] }, { "cell_type": "code", "execution_count": 5, "id": "7a95d607-d1e7-4e4e-91a6-ec6fc45d4be0", "metadata": {}, "outputs": [], "source": [ "def create(filename):\n", "    '''\n", "    filename: The filename must be txt format and stored in the\n", "    ./2-Data/Medical-Dialogue-System/ folder\n", "    df: The output is a dataframe\n", "    '''\n", "    # get the current working directory\n", "    path = os.getcwd()\n", "    res = split_content(filename)\n", "    # 
collect one single-row DataFrame per split dialogue file\n", "    frames = []\n", "    subdirectory = filename.replace(\".txt\", \"\")\n", "    for name_partial in res:\n", "        file_partial = os.path.join(path, \"data\", subdirectory, name_partial)\n", "        with open(file_partial, encoding=\"utf-8\") as f:\n", "            text_as_string = f.read()\n", "        frames.append(create_dataframe(text_as_string, name_partial))\n", "    # concatenate once at the end; the index holds the split file names\n", "    df = pd.concat(frames) if frames else pd.DataFrame()\n", "    return df" ] }, { "cell_type": "code", "execution_count": 6, "id": "2beb7bea-abfa-4f4f-a4ff-bfe056d5c580", "metadata": {}, "outputs": [], "source": [ "def create_csv(filename):\n", "    print(\"Creating dataframe ...\")\n", "    dfa = create(filename)\n", "    # move the file-name index into a regular 'Filename' column\n", "    dfa = dfa.reset_index(names=\"Filename\")\n", "    file_name = filename.replace(\".txt\", \".csv\")\n", "    path = os.getcwd()\n", "    out_dir = os.path.join(path, \"data\", \"csv\")\n", "    out_file = os.path.join(out_dir, file_name)\n", "    Path(out_dir).mkdir(parents=True, exist_ok=True)\n", "    # write a tab-separated file, then read it back as the returned dataframe\n", "    dfa.to_csv(out_file, sep='\\t', encoding='utf-8', index=False)\n", "    df = pd.read_csv(out_file, sep='\\t')\n", "    print(\"File created: \", out_file)\n", "    return df" ] }, { "cell_type": "code", "execution_count": 7, "id": "28194608-e120-46d3-88ac-69d7b00a22aa", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Creating dataframe ...\n", "File created: C:\Users\rusla\Dropbox\23-GITHUB\Projects\Free-Doctor-with-Artificial-Intelligence\2-Data\data\csv\test.csv\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " Filename Description \\\n", "0 out0.txt NaN \n", "1 out1.txt Q. What does abutment of the nerve root mean? \n", "2 out2.txt Q. Every time I eat spicy food, I poop blood. ... \n", "3 out3.txt Q. Will Nano-Leo give permanent solution for e... \n", "\n", " Patient \\\n", "0 NaN \n", "1 Hi doctor,I am just wondering what is abutting... \n", "2 Hi doctor, I am a 26 year old male. I am 5 fee... \n", "3 Hello doctor, I am 48 years old. I am experien... \n", "\n", " Doctor \n", "0 NaN \n", "1 Hi. I have gone through your query with dilige... \n", "2 Hello. I have gone through your information an... \n", "3 Hi. For further doubts consult a sexologist on... " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "filename=\"test.txt\"\n", "#filename=\"dialogue_0.txt\"\n", "create_csv(filename)" ] }, { "cell_type": "markdown", "id": "8e46a13d-2128-439b-bcfa-57d2df2307b2", "metadata": {}, "source": [ "We select the list of documents to create dataframes" ] }, { "cell_type": "code", "execution_count": 17, "id": "9c39f514-ca47-4878-8e8d-a2c3e02f7b16", "metadata": {}, "outputs": [], "source": [ "filenames=[\"dialogue_0.txt\",\n", " \"dialogue_1.txt\",\n", " \"dialogue_2.txt\",\n", " \"dialogue_3.txt\",\n", " \"dialogue_4.txt\"]\n", "#filenames=[filename]" ] }, { "cell_type": "markdown", "id": "6ab9621e-e44d-4bab-a213-067db63fa55e", "metadata": {}, "source": [ "We perform the creation of dataframes" ] }, { "cell_type": "code", "execution_count": 18, "id": "a4d71845-4f53-47e2-8e66-c8a36165cf86", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ " 0%| | 0/5 [00:00\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", 
" \n", "
\n", "" ], "text/plain": [ " Filename Description \\\n", "0 out0.txt NaN \n", "1 out1.txt Q. What does abutment of the nerve root mean? \n", "2 out10.txt Q. What should I do to reduce my weight gained... \n", "3 out100.txt Q. I have started to get lots of acne on my fa... \n", "4 out1000.txt Q. Can vitamin D3 deficiency cause inflammatio... \n", "\n", " Patient \\\n", "0 NaN \n", "1 Hi doctor,I am just wondering what is abutting... \n", "2 Hi doctor, I am a 22-year-old female who was d... \n", "3 Hi doctor! I used to have clear skin but since... \n", "4 Vitamin d3 deficiency (11 units).....consuming... \n", "\n", " Doctor \n", "0 NaN \n", "1 Hi. I have gone through your query with dilige... \n", "2 Hi. You have really done well with the hypothy... \n", "3 Hi there Acne has multifactorial etiology. Onl... \n", "4 NaN " ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "code", "execution_count": 68, "id": "e6186dc0-d230-42ba-840c-107755034f85", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['iam having hairfall for a decade.. but fews weeks its getting worse.. recently taken blood test in which my iron and D3 are low... doctor has prescribed me with D3 60000iu once in a week and Livogen. i would like to know if biotin supplements are required to stop hair fall. 
if so pls recommned the brand names also.'],\n", " dtype=object)" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.tail(1)['Patient'].values" ] }, { "cell_type": "code", "execution_count": 69, "id": "ad558039-ceef-48d2-a356-bddca5a2d59b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([\"you did'nt mention about thyroid problem ...usually iron deficiency can cause hairloss ...also not mentioning about dandruff ...so keep your scalp clean ...avoid dandruff take iron tab ...takee mor iron rich foods like leafy vegetables..better reduce spicy and salty food ...take only soft food ..dont use hot water in hair...take less oil but maximum massage ...our oil neelibhringadi is good for growing hair ...do protein treatment also ...dont use hair colours ,regular use of shampoo avoid...thankyou\"],\n", " dtype=object)" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.tail(1)['Doctor'].values" ] }, { "cell_type": "markdown", "id": "fb1ee110-b679-44ec-bd62-88595501bfff", "metadata": {}, "source": [ "# Cleaning the DataFrame\n" ] }, { "cell_type": "markdown", "id": "0b8e0852-c1f2-4bb8-b483-5ce61f662299", "metadata": {}, "source": [ "In this part we separate the rows that contain NaN values from the rows that will form the training dataset." ] }, { "cell_type": "code", "execution_count": 104, "id": "cb7a6d23-9806-4556-9311-1881302a8957", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 True\n", "1 False\n", "2 False\n", "3 False\n", "4 True\n", " ... 
\n", "257487 False\n", "257488 False\n", "257489 False\n", "257490 False\n", "257491 False\n", "Length: 257492, dtype: bool" ] }, "execution_count": 104, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.isnull().any(axis=1)" ] }, { "cell_type": "code", "execution_count": 108, "id": "0fae7672-25e9-4bf9-a598-4682366f0687", "metadata": {}, "outputs": [], "source": [ "df2= df[df.isnull().any(axis=1)]" ] }, { "cell_type": "code", "execution_count": 110, "id": "5be4b110-4822-45f2-b743-9b4ce689851a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Filename Description \\\n", "0 out0.txt NaN \n", "4 out1000.txt Q. Can vitamin D3 deficiency cause inflammatio... \n", "225 out102.txt Q. Why has my father's swollen ankle turned da... \n", "1214 out1109.txt Q. I have run out of Seroflo 250 inhaler that ... \n", "1292 out1116.txt Q. My mother has severe heart problem, and her... \n", "... ... ... \n", "255610 out8304.txt Suggest ways to obtain a flawless skin \n", "255907 out8572.txt Is Melas cream effective for acne scars? \n", "255986 out8643.txt NaN \n", "256061 out8710.txt Chicken pox scars on face, body. Taking Vitami... \n", "256368 out8988.txt Side effects of melacare cream \n", "\n", " Patient \\\n", "0 NaN \n", "4 Vitamin d3 deficiency (11 units).....consuming... \n", "225 My father, Male, 77 years old with swollen ank... \n", "1214 Hi, firstly i would like to thank for this won... \n", "1292 Age: 62 years My mother has severe heart probl... \n", "... ... \n", "255610 NaN \n", "255907 NaN \n", "255986 Hi Doctor,I am taking Kaya's treatment for alm... \n", "256061 NaN \n", "256368 NaN \n", "\n", " Doctor \n", "0 NaN \n", "4 NaN \n", "225 NaN \n", "1214 NaN \n", "1292 NaN \n", "... ... \n", "255610 Hello. Thank you for writing to usThis cream i... \n", "255907 Hello and welcome to healthcaremagic.Melas cre... \n", "255986 Hi, Welcome to HCM. you should have followed y... \n", "256061 hello and welcome to HCM forum dilusreni, I am... \n", "256368 hi you have done mistake by applying it for to... \n", "\n", "[576 rows x 4 columns]" ] }, "execution_count": 110, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2" ] }, { "cell_type": "code", "execution_count": 111, "id": "a20364f9-1798-45da-99d5-84bf2254f9fa", "metadata": {}, "outputs": [], "source": [ "null_mask = df.isnull().any(axis=1)\n", "null_rows = df[null_mask]" ] }, { "cell_type": "code", "execution_count": 112, "id": "1ce5bf9a-dc1e-46be-9f7e-18357af33b43", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Filename Description \\\n", "0 out0.txt NaN \n", "4 out1000.txt Q. Can vitamin D3 deficiency cause inflammatio... \n", "225 out102.txt Q. Why has my father's swollen ankle turned da... \n", "1214 out1109.txt Q. I have run out of Seroflo 250 inhaler that ... \n", "1292 out1116.txt Q. My mother has severe heart problem, and her... \n", "... ... ... \n", "255610 out8304.txt Suggest ways to obtain a flawless skin \n", "255907 out8572.txt Is Melas cream effective for acne scars? \n", "255986 out8643.txt NaN \n", "256061 out8710.txt Chicken pox scars on face, body. Taking Vitami... \n", "256368 out8988.txt Side effects of melacare cream \n", "\n", " Patient \\\n", "0 NaN \n", "4 Vitamin d3 deficiency (11 units).....consuming... \n", "225 My father, Male, 77 years old with swollen ank... \n", "1214 Hi, firstly i would like to thank for this won... \n", "1292 Age: 62 years My mother has severe heart probl... \n", "... ... \n", "255610 NaN \n", "255907 NaN \n", "255986 Hi Doctor,I am taking Kaya's treatment for alm... \n", "256061 NaN \n", "256368 NaN \n", "\n", " Doctor \n", "0 NaN \n", "4 NaN \n", "225 NaN \n", "1214 NaN \n", "1292 NaN \n", "... ... \n", "255610 Hello. Thank you for writing to usThis cream i... \n", "255907 Hello and welcome to healthcaremagic.Melas cre... \n", "255986 Hi, Welcome to HCM. you should have followed y... \n", "256061 hello and welcome to HCM forum dilusreni, I am... \n", "256368 hi you have done mistake by applying it for to... 
\n", "\n", "[576 rows x 4 columns]" ] }, "execution_count": 112, "metadata": {}, "output_type": "execute_result" } ], "source": [ "null_rows" ] }, { "cell_type": "code", "execution_count": 113, "id": "daaddc10-c235-4cf7-a821-c972abd2970b", "metadata": {}, "outputs": [], "source": [ "not_null_mask = df.notnull().all(axis=1)\n", "not_null_rows = df[not_null_mask]" ] }, { "cell_type": "code", "execution_count": 114, "id": "6ed42229-728f-4954-8ffa-5cdfe02e417d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Filename Description \\\n", "1 out1.txt Q. What does abutment of the nerve root mean? \n", "2 out10.txt Q. What should I do to reduce my weight gained... \n", "3 out100.txt Q. I have started to get lots of acne on my fa... \n", "5 out10000.txt Q. Why do I have uncomfortable feeling between... \n", "6 out10001.txt Q. My symptoms after intercourse threatns me e... \n", "... ... ... \n", "257487 out9995.txt Why is hair fall increasing while using Bontre... \n", "257488 out9996.txt Why was I asked to discontinue Androanagen whi... \n", "257489 out9997.txt Can Mintop 5% Lotion be used by women for seve... \n", "257490 out9998.txt Is Minoxin 5% lotion advisable instead of Foli... \n", "257491 out9999.txt Are Biotin supplements need to reduce severe h... \n", "\n", " Patient \\\n", "1 Hi doctor,I am just wondering what is abutting... \n", "2 Hi doctor, I am a 22-year-old female who was d... \n", "3 Hi doctor! I used to have clear skin but since... \n", "5 Hello doctor,I am having an uncomfortable feel... \n", "6 Hello doctor,Before two years had sex with a c... \n", "... ... \n", "257487 I am suffering from excessive hairfall. My doc... \n", "257488 Hi Doctor, I have been having severe hair fall... \n", "257489 Hi..i hav sever hair loss problem so consulted... \n", "257490 Hi, i am 25 year old girl, i am having massive... \n", "257491 iam having hairfall for a decade.. but fews we... \n", "\n", " Doctor \n", "1 Hi. I have gone through your query with dilige... \n", "2 Hi. You have really done well with the hypothy... \n", "3 Hi there Acne has multifactorial etiology. Onl... \n", "5 Hello. The popping and discomfort what you fel... \n", "6 Hello. The HIV test uses a finger prick blood ... \n", "... ... \n", "257487 Hello Dear Thanks for writing to us, we are he... \n", "257488 hello, hair4u is combination of minoxid... \n", "257489 HI I have evaluated your query thoroughly you... \n", "257490 Hello and Welcome to ‘Ask A Doctor’ service.I ... 
\n", "257491 you did'nt mention about thyroid problem ...us... \n", "\n", "[256916 rows x 4 columns]" ] }, "execution_count": 114, "metadata": {}, "output_type": "execute_result" } ], "source": [ "not_null_rows" ] }, { "cell_type": "code", "execution_count": 115, "id": "e496afcf-c7af-4fb6-a77a-cec2d4c81078", "metadata": {}, "outputs": [], "source": [ "# drop the helper column by reassigning, which avoids the SettingWithCopyWarning\n", "# raised by an in-place drop on a filtered dataframe\n", "not_null_rows = not_null_rows.drop(columns='Filename')" ] }, { "cell_type": "code", "execution_count": 116, "id": "bf4e9921-c1f0-429f-89a2-4d59afa96134", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Description \\\n", "1 Q. What does abutment of the nerve root mean? \n", "2 Q. What should I do to reduce my weight gained... \n", "3 Q. I have started to get lots of acne on my fa... \n", "5 Q. Why do I have uncomfortable feeling between... \n", "6 Q. My symptoms after intercourse threatns me e... \n", "... ... \n", "257487 Why is hair fall increasing while using Bontre... \n", "257488 Why was I asked to discontinue Androanagen whi... \n", "257489 Can Mintop 5% Lotion be used by women for seve... \n", "257490 Is Minoxin 5% lotion advisable instead of Foli... \n", "257491 Are Biotin supplements need to reduce severe h... \n", "\n", " Patient \\\n", "1 Hi doctor,I am just wondering what is abutting... \n", "2 Hi doctor, I am a 22-year-old female who was d... \n", "3 Hi doctor! I used to have clear skin but since... \n", "5 Hello doctor,I am having an uncomfortable feel... \n", "6 Hello doctor,Before two years had sex with a c... \n", "... ... \n", "257487 I am suffering from excessive hairfall. My doc... \n", "257488 Hi Doctor, I have been having severe hair fall... \n", "257489 Hi..i hav sever hair loss problem so consulted... \n", "257490 Hi, i am 25 year old girl, i am having massive... \n", "257491 iam having hairfall for a decade.. but fews we... \n", "\n", " Doctor \n", "1 Hi. I have gone through your query with dilige... \n", "2 Hi. You have really done well with the hypothy... \n", "3 Hi there Acne has multifactorial etiology. Onl... \n", "5 Hello. The popping and discomfort what you fel... \n", "6 Hello. The HIV test uses a finger prick blood ... \n", "... ... \n", "257487 Hello Dear Thanks for writing to us, we are he... \n", "257488 hello, hair4u is combination of minoxid... \n", "257489 HI I have evaluated your query thoroughly you... \n", "257490 Hello and Welcome to ‘Ask A Doctor’ service.I ... \n", "257491 you did'nt mention about thyroid problem ...us... 
\n", "\n", "[256916 rows x 3 columns]" ] }, "execution_count": 116, "metadata": {}, "output_type": "execute_result" } ], "source": [ "not_null_rows" ] }, { "cell_type": "markdown", "id": "4e889c22-15b1-4844-954b-6d4c87714c77", "metadata": {}, "source": [ "We save the rows without null values; they are the input to the third step, modeling." ] }, { "cell_type": "code", "execution_count": 117, "id": "7876de11-29c1-49ca-a999-8ba565db8da7", "metadata": {}, "outputs": [], "source": [ "not_null_rows.to_csv(\"dialogues.csv\", sep='\\t', encoding='utf-8', index=False)" ] } ], "metadata": { "kernelspec": { "display_name": "Python3 (GPT)", "language": "python", "name": "gpt" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.11" } }, "nbformat": 4, "nbformat_minor": 5 }