{ "cells": [ { "cell_type": "code", "execution_count": 1, "id": "99d33a41", "metadata": {}, "outputs": [], "source": [ "from langchain_core.documents import Document\n" ] }, { "cell_type": "code", "execution_count": 6, "id": "d449b423", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Document(metadata={'source': 'web', 'pages': 1, 'date_created': '2026-10-30', 'Author': 'MaskMan'}, page_content='Hi, the rag_demo is working.')" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "doc = Document(\n", "    page_content=\"Hi, the rag_demo is working.\",\n", "    metadata={\n", "        \"source\": \"web\",\n", "        \"pages\": 1,\n", "        \"date_created\": \"2026-10-30\",\n", "        \"Author\": \"MaskMan\",\n", "    },\n", ")\n", "doc" ] }, { "cell_type": "code", "execution_count": 7, "id": "75dbf292", "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "os.makedirs(\"../data/text_files\", exist_ok=True)" ] }, { "cell_type": "code", "execution_count": 37, "id": "cc0a31f5", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "txt created sucessfully......!\n" ] } ], "source": [ "sample_texts= {\n", "    \"../data/text_files/python_intro.txt\" : \"\"\"Python is a high-level, interpreted, general-purpose programming language. Created by Guido van Rossum and first released in 1991, it is known for its emphasis on code readability and its use of significant indentation. The name \"Python\" was inspired by the British comedy series Monty Python's Flying Circus. \n", "Key Characteristics and Features:\n", "Readability: Python's syntax is designed to be clear and concise, often described as English-like, making it easier to learn and understand compared to many other languages.\n", "Interpreted: Python code does not need to be compiled before execution. 
An interpreter runs the code directly, allowing for rapid development and testing.\n", "Dynamically Typed: Variable types are determined at runtime, meaning you don't need to explicitly declare the type of a variable when you create it.\n", "High-Level Language: Python abstracts away many low-level details of computer hardware, allowing developers to focus on higher-level problem-solving.\n", "Multiple Programming Paradigms: It supports various programming styles, including object-oriented, imperative, and functional programming.\n", "Extensive Standard Library: Python comes with a large collection of modules and packages that provide pre-written code for a wide range of tasks, reducing the need to write everything from scratch.\n", "Cross-Platform Compatibility: Python can run on various operating systems, including Windows, macOS, and Linux, without requiring significant code changes.\n", "Free and Open-Source: Python is freely available for use and distribution, and its source code is open for modification and improvement by a global community.\n", "Common Applications:\n", "Python is widely used in diverse fields, including:\n", "Web Development: Frameworks like Django and Flask facilitate building web applications.\n", "Data Science and Machine Learning: Libraries such as NumPy, Pandas, Scikit-learn, TensorFlow, and PyTorch are essential for data analysis, visualization, and building machine learning models.\n", "Automation and Scripting: Its simplicity makes it ideal for automating repetitive tasks and system administration.\n", "Software Development: Used for building desktop applications and integrating with other systems.\n", "Artificial Intelligence: A popular choice for developing AI algorithms and applications.\"\"\"\n", "}\n", "\n", "# Write each sample text to its target path.\n", "for path, content in sample_texts.items():\n", "    with open(path, 'w', encoding=\"utf-8\") as f:\n", "        f.write(content)\n", "print(\"txt created successfully!\")\n", "\n" ] }, { "cell_type": "code", 
"execution_count": null, "id": "a935ee66", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "txt created sucessfully......!\n" ] } ], "source": [ "sample_texts= {\n", " \"../data/text_files/rag_introtxt\" : \"\"\"Retrieval-Augmented Generation (RAG) is an artificial intelligence (AI) framework that improves large language models (LLMs) by giving them access to up-to-date, external data sources. This process makes LLM-generated responses more accurate, context-specific, and reliable than those produced by the model's original, static training data alone. \n", "How RAG works\n", "A RAG system follows a series of steps to generate a response for a user query: \n", "Ingestion: A knowledge base, which can contain a variety of data types like PDFs, databases, and websites, is created. An embedding model converts this data into numerical representations, called vectors, and stores them in a vector database.\n", "Retrieval: When a user submits a query, the system uses a retriever model to search the vector database for the most relevant information.\n", "Augmentation: The retrieved, contextual information is added to the user's original query to create an enhanced prompt.\n", "Generation: The augmented prompt is sent to the large language model, which uses both its initial training data and the newly retrieved information to create a final, well-grounded response. 
\n", "Key benefits of using RAG\n", "Reduces \"hallucinations\": Since RAG grounds responses in factual, external data, it significantly lowers the risk of the LLM presenting false or nonsensical information.\n", "Provides up-to-date information: RAG systems can be updated continuously with fresh information without the need for expensive and time-consuming model retraining.\n", "Adds domain-specific knowledge: Companies can connect LLMs to their own internal documents, product manuals, or policies, enabling them to produce more specialized and relevant answers.\n", "Builds user trust: RAG allows the model to cite its sources, giving users the ability to verify the information for themselves.\n", "Increases cost efficiency: Organizations can improve the performance of a foundational model for specific tasks by using RAG, which is far less expensive than fine-tuning or retraining the entire LLM. \n", "Common RAG applications\n", "Customer service chatbots: A chatbot can provide specific, up-to-date information by referencing a company's internal knowledge base and product manuals.\n", "Research assistants: RAG can help financial analysts, medical professionals, and other researchers quickly access and synthesize information from vast databases of records, journals, and reports.\n", "Internal knowledge management: Employees can query an organization's documents using conversational language to find actionable insights, streamline onboarding, or get HR support.\n", "Content generation: The technology can be used to gather information from multiple authoritative sources and generate more reliable articles or summaries. 
\n", "\"\"\"\n", "}\n", "\n", "# Write each sample text to its target path.\n", "for path, content in sample_texts.items():\n", "    with open(path, 'w', encoding=\"utf-8\") as f:\n", "        f.write(content)\n", "print(\"txt created successfully!\")\n", "\n" ] }, { "cell_type": "code", "execution_count": 8, "id": "134c3411", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[Document(metadata={'source': '../data/text_files/python_intro.txt'}, page_content='Python is a high-level, interpreted, general-purpose programming language. Created by Guido van Rossum and first released in 1991, it is known for its emphasis on code readability and its use of significant indentation. The name \"Python\" was inspired by the British comedy series Monty Python\\'s Flying Circus. \\nKey Characteristics and Features:\\nReadability: Python\\'s syntax is designed to be clear and concise, often described as English-like, making it easier to learn and understand compared to many other languages.\\nInterpreted: Python code does not need to be compiled before execution. 
An interpreter runs the code directly, allowing for rapid development and testing.\\nDynamically Typed: Variable types are determined at runtime, meaning you don\\'t need to explicitly declare the type of a variable when you create it.\\nHigh-Level Language: Python abstracts away many low-level details of computer hardware, allowing developers to focus on higher-level problem-solving.\\nMultiple Programming Paradigms: It supports various programming styles, including object-oriented, imperative, and functional programming.\\nExtensive Standard Library: Python comes with a large collection of modules and packages that provide pre-written code for a wide range of tasks, reducing the need to write everything from scratch.\\nCross-Platform Compatibility: Python can run on various operating systems, including Windows, macOS, and Linux, without requiring significant code changes.\\nFree and Open-Source: Python is freely available for use and distribution, and its source code is open for modification and improvement by a global community.\\nCommon Applications:\\nPython is widely used in diverse fields, including:\\nWeb Development: Frameworks like Django and Flask facilitate building web applications.\\nData Science and Machine Learning: Libraries such as NumPy, Pandas, Scikit-learn, TensorFlow, and PyTorch are essential for data analysis, visualization, and building machine learning models.\\nAutomation and Scripting: Its simplicity makes it ideal for automating repetitive tasks and system administration.\\nSoftware Development: Used for building desktop applications and integrating with other systems.\\nArtificial Intelligence: A popular choice for developing AI algorithms and applications.')]\n" ] } ], "source": [ "from langchain_community.document_loaders import TextLoader\n", "\n", "loader = TextLoader(\"../data/text_files/python_intro.txt\", encoding=\"utf-8\")\n", "\n", "document = loader.load()\n", "print(document)\n" ] }, { "cell_type": "code", 
"execution_count": 9, "id": "83960869", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[Document(metadata={'source': '../data/text_files/python_intro.txt'}, page_content='Python is a high-level, interpreted, general-purpose programming language. Created by Guido van Rossum and first released in 1991, it is known for its emphasis on code readability and its use of significant indentation. The name \"Python\" was inspired by the British comedy series Monty Python\\'s Flying Circus. \\nKey Characteristics and Features:\\nReadability: Python\\'s syntax is designed to be clear and concise, often described as English-like, making it easier to learn and understand compared to many other languages.\\nInterpreted: Python code does not need to be compiled before execution. An interpreter runs the code directly, allowing for rapid development and testing.\\nDynamically Typed: Variable types are determined at runtime, meaning you don\\'t need to explicitly declare the type of a variable when you create it.\\nHigh-Level Language: Python abstracts away many low-level details of computer hardware, allowing developers to focus on higher-level problem-solving.\\nMultiple Programming Paradigms: It supports various programming styles, including object-oriented, imperative, and functional programming.\\nExtensive Standard Library: Python comes with a large collection of modules and packages that provide pre-written code for a wide range of tasks, reducing the need to write everything from scratch.\\nCross-Platform Compatibility: Python can run on various operating systems, including Windows, macOS, and Linux, without requiring significant code changes.\\nFree and Open-Source: Python is freely available for use and distribution, and its source code is open for modification and improvement by a global community.\\nCommon Applications:\\nPython is widely used in diverse fields, including:\\nWeb Development: Frameworks like Django and Flask facilitate building web 
applications.\\nData Science and Machine Learning: Libraries such as NumPy, Pandas, Scikit-learn, TensorFlow, and PyTorch are essential for data analysis, visualization, and building machine learning models.\\nAutomation and Scripting: Its simplicity makes it ideal for automating repetitive tasks and system administration.\\nSoftware Development: Used for building desktop applications and integrating with other systems.\\nArtificial Intelligence: A popular choice for developing AI algorithms and applications.'),\n", " Document(metadata={'source': '../data/text_files/rag_intro.txt'}, page_content='Retrieval-Augmented Generation (RAG) is an artificial intelligence (AI) framework that improves large language models (LLMs) by giving them access to up-to-date, external data sources. This process makes LLM-generated responses more accurate, context-specific, and reliable than those produced by the model\\'s original, static training data alone. \\nHow RAG works\\nA RAG system follows a series of steps to generate a response for a user query: \\nIngestion: A knowledge base, which can contain a variety of data types like PDFs, databases, and websites, is created. An embedding model converts this data into numerical representations, called vectors, and stores them in a vector database.\\nRetrieval: When a user submits a query, the system uses a retriever model to search the vector database for the most relevant information.\\nAugmentation: The retrieved, contextual information is added to the user\\'s original query to create an enhanced prompt.\\nGeneration: The augmented prompt is sent to the large language model, which uses both its initial training data and the newly retrieved information to create a final, well-grounded response. 
\\nKey benefits of using RAG\\nReduces \"hallucinations\": Since RAG grounds responses in factual, external data, it significantly lowers the risk of the LLM presenting false or nonsensical information.\\nProvides up-to-date information: RAG systems can be updated continuously with fresh information without the need for expensive and time-consuming model retraining.\\nAdds domain-specific knowledge: Companies can connect LLMs to their own internal documents, product manuals, or policies, enabling them to produce more specialized and relevant answers.\\nBuilds user trust: RAG allows the model to cite its sources, giving users the ability to verify the information for themselves.\\nIncreases cost efficiency: Organizations can improve the performance of a foundational model for specific tasks by using RAG, which is far less expensive than fine-tuning or retraining the entire LLM. \\nCommon RAG applications\\nCustomer service chatbots: A chatbot can provide specific, up-to-date information by referencing a company\\'s internal knowledge base and product manuals.\\nResearch assistants: RAG can help financial analysts, medical professionals, and other researchers quickly access and synthesize information from vast databases of records, journals, and reports.\\nInternal knowledge management: Employees can query an organization\\'s documents using conversational language to find actionable insights, streamline onboarding, or get HR support.\\nContent generation: The technology can be used to gather information from multiple authoritative sources and generate more reliable articles or summaries. 
\\n')]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from langchain_community.document_loaders import DirectoryLoader, TextLoader\n", "\n", "# Load every .txt file under the directory, decoding as UTF-8 to match how the files were written.\n", "dir_loader = DirectoryLoader(\n", "    \"../data/text_files\",\n", "    glob=\"**/*.txt\",\n", "    loader_cls=TextLoader,\n", "    loader_kwargs={\"encoding\": \"utf-8\"},\n", "    show_progress=False,\n", ")\n", "documents = dir_loader.load()\n", "documents\n" ] }, { "cell_type": "code", "execution_count": 10, "id": "0c387cde", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[Document(metadata={'producer': 'Acrobat Distiller 7.0.5 (Windows)', 'creator': 'PScript5.dll Version 5.2', 'creationdate': '2011-02-12T10:45:11+00:00', 'source': '../data/pdf/c_programming_prefer.pdf', 'file_path': '../data/pdf/c_programming_prefer.pdf', 'total_pages': 37, 'format': 'PDF 1.6', 'title': 'Microsoft Word - L1 C for embedded systems.docx', 'author': 'Branislav', 'subject': '', 'keywords': '', 'moddate': '2018-08-08T10:43:48+05:30', 'trapped': '', 'modDate': \"D:20180808104348+05'30'\", 'creationDate': 'D:20110212104511Z', 'page': 0}, page_content='University of Portsmouth, Faculty of Technology, Department of Electronic and Computer Engineering \\nB122L – Principles of Digital Systems \\n \\n1 \\nUniversity of Portsmouth \\nFaculty of Technology \\nDepartment of Electronic and Computer Engineering \\n \\n \\n \\nModule: \\n \\nPrinciples of DigitalSystems \\nModule Code: \\nB122L \\nModule Topic: \\nMicrocontroller Applications \\nLecturer: \\n \\nBranislav Vuksanovic \\n \\n \\nLecture Notes: \\n \\nBasics of C Programming \\nfor \\nEmbedded Systems \\n \\n \\n \\nThis document reviews some general rules of C programming and \\nintroduces certain specifics of C programming for 8051 series of \\nmicrocontrollers. Simple C programs are listed and discussed in details \\nto illustrate the main points. 
This should provide reader with sufficient \\nknowledge to develop and test other, more complicated C programs for \\nsmall scale embedded systems employing this microcontrollers. \\n \\n \\n \\n \\n \\n \\n \\n \\nContent \\nIntroduction to C Programming for Embedded Systems ............... 2 \\nTemplate for Embedded C Program ............................................. 3 \\nC Directives .................................................................................. 4 \\nExample 1 ..................................................................................... 5 \\nProgramming Time Delays ........................................................... 6 \\nIndefinite Loops ............................................................................ 6 \\nVariables in Embedded C ............................................................. 7 \\nExample 2 ..................................................................................... 8 \\nC Functions .................................................................................. 9 \\nExample 3 ..................................................................................... 9 \\nOther Loops in C ......................................................................... 10 \\nExample 4 ................................................................................... 10 \\nMaking Decisions in the Program ............................................... 11 \\n? Operator .................................................................................. 11 \\nExample 5 ................................................................................... 12 \\nShort Hand Notations ................................................................. 12 \\nLogical and Bit-wise Operations .................................................. 12 \\nArrays ......................................................................................... 
13 \\nExample 6 ................................................................................... 13 \\nExample 7 ................................................................................... 14'),\n", " Document(metadata={'producer': 'Acrobat Distiller 7.0.5 (Windows)', 'creator': 'PScript5.dll Version 5.2', 'creationdate': '2011-02-12T10:45:11+00:00', 'source': '../data/pdf/c_programming_prefer.pdf', 'file_path': '../data/pdf/c_programming_prefer.pdf', 'total_pages': 37, 'format': 'PDF 1.6', 'title': 'Microsoft Word - L1 C for embedded systems.docx', 'author': 'Branislav', 'subject': '', 'keywords': '', 'moddate': '2018-08-08T10:43:48+05:30', 'trapped': '', 'modDate': \"D:20180808104348+05'30'\", 'creationDate': 'D:20110212104511Z', 'page': 1}, page_content='University of Portsmouth, Faculty of Technology, Department of Electronic and Computer Engineering \\nB122L – Principles of Digital Systems \\n \\n2 \\n \\nIntroduction to C Programming for Embedded \\nSystems \\n \\n- \\nmost common programming languages for embedded systems are \\nC, BASIC and assembly languages \\n- \\nC used for embedded systems is slightly different compared to C \\nused for general purpose (under a PC platform) \\n- \\nprograms for embedded systems are usually expected to monitor \\nand control external devices and directly manipulate and use the \\ninternal architecture of the processor such as interrupt handling, \\ntimers, serial communications and other available features. \\n- \\nthere are many factors to consider when selecting languages for \\nembedded systems \\n\\x83 \\nEfficiency - Programs must be as short as possible \\nand memory must be used efficiently. \\n\\x83 \\nSpeed - Programs must run as fast as possible. 
\\n\\x83 \\nEase of implementation \\n\\x83 \\nMaintainability \\n\\x83 \\nReadability \\n \\n- \\nC compilers for embedded systems must provide ways to examine \\nand utilise various features of the microcontroller\\'s internal and \\nexternal architecture; this includes: \\n\\x83 \\nInterrupt Service Routines \\n\\x83 \\nReading from and writing to internal and external \\nmemories \\n\\x83 \\nBit manipulation \\n\\x83 \\nImplementation of timers / counters \\n\\x83 \\nExamination of internal registers \\n \\n- \\nmost embedded C compilers (as well as ordinary C compilers) have \\nbeen developed supporting the ANSI [American National Standard \\nfor Information] but compared to ordinary C they may differ in terms \\nof the outcome of some of the statements \\n- \\nstandard C compiler, communicates with the hardware components \\nvia the operating system of the machine but the C compiler for the \\nembedded system must communicate directly with the processor \\nand its components \\n \\n- \\nFor example consider this statement: \\n \\nprintf(\" C - Programming for 8051\\\\n\"); \\n \\nIn standard C running on a PC platform, the statement causes the \\nstring inside the quotation to be displayed on the screen. The same \\nstatement in an embedded system causes the string to be transmitted \\nvia the serial port pin (i.e TXD) of the microcontroller provided the serial \\nport has been initialized and enabled. \\n \\n- \\nAnother example: \\n \\nc=getch(); \\n \\nIn standard C running on a PC platform this causes a character to be \\nread from the keyboard on a PC. In an embedded system the \\ninstruction causes a character to be read from the serial pin (i.e. 
RXD) \\nof the microcontroller.'),\n", " Document(metadata={'producer': 'Acrobat Distiller 7.0.5 (Windows)', 'creator': 'PScript5.dll Version 5.2', 'creationdate': '2011-02-12T10:45:11+00:00', 'source': '../data/pdf/c_programming_prefer.pdf', 'file_path': '../data/pdf/c_programming_prefer.pdf', 'total_pages': 37, 'format': 'PDF 1.6', 'title': 'Microsoft Word - L1 C for embedded systems.docx', 'author': 'Branislav', 'subject': '', 'keywords': '', 'moddate': '2018-08-08T10:43:48+05:30', 'trapped': '', 'modDate': \"D:20180808104348+05'30'\", 'creationDate': 'D:20110212104511Z', 'page': 2}, page_content='University of Portsmouth, Faculty of Technology, Department of Electronic and Computer Engineering \\nB122L – Principles of Digital Systems \\n \\n3 \\nTemplate for Embedded C Program \\n \\n#include \\nvoid main(void) \\n{ \\n \\n// body of the program goes here \\n \\n} \\n \\n- \\nthe first line of the template is the C directive “#include ” \\n- \\nthis tells the compiler that during compilation, it should look into this \\nfile for symbols not defined within the program \\n- \\n“reg66x.h” file simple defines the internal special function registers \\nand their addresses \\n- \\npart of “reg66x.h” file is shown below \\n \\n/*-------------------------------------------*/ \\n/* Include file for 8xC66x SFR Definitions */ \\n/* Copyright Raisonance SA, 1990-2000 */ \\n/*-------------------------------------------*/ \\n \\n \\n \\n/* BYTE Registers */ \\nat 0x80 sfr P0 ; \\nat 0x90 sfr P1 ; \\nat 0xA0 sfr P2 ; \\nat 0xB0 sfr P3 ; \\nat 0xD0 sfr PSW ; \\nat 0xE0 sfr ACC ; \\nat 0xF0 sfr B ; \\nat 0x81 sfr SP ; \\nat 0x82 sfr DPL ; \\nat 0x83 sfr DPH ; \\nat 0x87 sfr PCON ; \\nat 0x88 sfr TCON ; \\nat 0x89 sfr TMOD ; \\nat 0x8A sfr TL0 ; \\nat 0x8B sfr TL1 ; \\nat 0x8C sfr TH0 ; \\nat 0x8D sfr TH1 ; \\n… \\n \\nIn this file, the numerical addresses of different special function \\nregisters inside the processor have been defined using symbolic \\nnames, e.g. 
P0 the symbol used for port 0 of the processor is assigned \\nits corresponding numeric address 80 in hexadecimal. Note that in C \\nnumbers that are hexadecimal are represented by the 0x. \\n \\n- \\nthe next line in the template declares the beginning of the body of \\nthe main part of the program \\n- \\nthe main part of the program is treated as any other function in C \\nprogram \\n- \\nevery C program should have a main function \\n- \\nfunctions are like “procedures” and “subroutines” in other languages \\n- \\nC function may be written in one of the following formats: \\n\\x83 \\nit may require some parameters to work on \\n\\x83 \\nit may return a value that it evaluates or determines \\n\\x83 \\nit may neither require parameters nor return any value \\n- \\nif a function requires any parameters, they are placed inside the \\nbrackets following the name of the function \\n- \\nif a function should return a value, it is declared just before the name \\nof the function \\n- \\nwhen the word ‘void’ is used before the function name it indicates \\nthat the function does not return any value \\n- \\nwhen the word ‘void’ is used between the brackets it indicates that \\nthe function does not require any parameters \\n- \\nmain function declaration:'),\n", " Document(metadata={'producer': 'Acrobat Distiller 7.0.5 (Windows)', 'creator': 'PScript5.dll Version 5.2', 'creationdate': '2011-02-12T10:45:11+00:00', 'source': '../data/pdf/c_programming_prefer.pdf', 'file_path': '../data/pdf/c_programming_prefer.pdf', 'total_pages': 37, 'format': 'PDF 1.6', 'title': 'Microsoft Word - L1 C for embedded systems.docx', 'author': 'Branislav', 'subject': '', 'keywords': '', 'moddate': '2018-08-08T10:43:48+05:30', 'trapped': '', 'modDate': \"D:20180808104348+05'30'\", 'creationDate': 'D:20110212104511Z', 'page': 3}, page_content='University of Portsmouth, Faculty of Technology, Department of Electronic and Computer Engineering \\nB122L – Principles of Digital Systems \\n \\n4 
\\n \\nvoid main(void) \\n \\ntherefore, indicates that the main function requires no parameters and \\nthat it does not return any value. \\n \\n- \\nwhat the function must perform will be placed within the curly \\nbrackets following function declaration (your C code) \\n \\n \\nC Directives \\n \\n- \\n#include is one of many C directives \\n- \\nit is used to insert the contents of another file into the source code of \\nthe current file, as previously explained \\n- \\nthere are two slightly different form of using #include directive: \\n \\n#include < filename > \\n \\nor \\n#include “ filename “ \\n \\n- \\nthe first form (with the angle brackets) will search for the include file \\nin certain locations known to the compiler and it is used to include \\nstandard system header files (such are stdlib.h and stdio.h in \\nstandard C) \\n- \\nthe second form (with the double quotes) will search for the file in \\nthe same directory as the source file and this is used for header files \\nspecific to the program and usually written by the programmer \\n- \\nall directives are preceded with a “#” symbol \\n- \\nanother useful directive is a #define directive \\n- \\n#define directive associates a symbolic name with some numerical \\nvalue or text. \\n- \\nwherever that symbolic name occurs after the directive the \\npreprocessor will replace it with the specified value or text \\n- \\nthe value of “ON” in the program can be defined as 0xFF throughout \\nthe program using: \\n \\n#define ON 0xFF \\n \\n- \\nthis approach may be used to define various numerical values in the \\nprogram using more readable and understandable symbols. 
\\n- \\nthe advantage of using symbols rather than the actual numerical \\nvalues is that, if you need to change the value, all you need to do is \\nto change the number that is assigned to the symbol in the define \\nstatement rather than changing it within the program which in some \\ncases may be a large program and therefore tedious to do'),\n", " Document(metadata={'producer': 'Acrobat Distiller 7.0.5 (Windows)', 'creator': 'PScript5.dll Version 5.2', 'creationdate': '2011-02-12T10:45:11+00:00', 'source': '../data/pdf/c_programming_prefer.pdf', 'file_path': '../data/pdf/c_programming_prefer.pdf', 'total_pages': 37, 'format': 'PDF 1.6', 'title': 'Microsoft Word - L1 C for embedded systems.docx', 'author': 'Branislav', 'subject': '', 'keywords': '', 'moddate': '2018-08-08T10:43:48+05:30', 'trapped': '', 'modDate': \"D:20180808104348+05'30'\", 'creationDate': 'D:20110212104511Z', 'page': 4}, page_content='University of Portsmouth, Faculty of Technology, Department of Electronic and Computer Engineering \\nB122L – Principles of Digital Systems \\n \\n5 \\nExample 1 \\n \\nProgram to turn the LEDs on port 1 ON (see figure). \\n \\n#include \\nvoid main(void) \\n{ \\nP1 = 0xFF; \\n} \\n \\nThe body of the main function consists of just one statement and that is \\nP1=0xFF. This tells the compiler to send the number 0xFF which is the \\nhexadecimal equivalent of 255 or in binary (11111111) to port 1 which \\nin turn causes the 8 LEDs in port 1 to turn ON. \\n \\nNote that just like the line in the main body of the program, every line of \\na C program must end with a semicolon (i.e. ;) except in some special \\noccasions (to be discussed later) \\n \\nAlternative version of this program using directive #define. 
\\n \\n#include \\n#define ON 0xFF \\nvoid main(void) \\n{ \\nP1 = ON; \\n} \\n \\nIn the above example the value of “ON” is defined as 0xFF using: \\n \\n#define ON 0xFF \\n \\nand the statement in the body of the program is then written as: \\n \\nP1 = ON; \\n \\n \\n \\nLEDs and Switches Interfaced to Port 1 and Port 0 of 8051 \\nMicrocontroller'),\n", " Document(metadata={'producer': 'Acrobat Distiller 7.0.5 (Windows)', 'creator': 'PScript5.dll Version 5.2', 'creationdate': '2011-02-12T10:45:11+00:00', 'source': '../data/pdf/c_programming_prefer.pdf', 'file_path': '../data/pdf/c_programming_prefer.pdf', 'total_pages': 37, 'format': 'PDF 1.6', 'title': 'Microsoft Word - L1 C for embedded systems.docx', 'author': 'Branislav', 'subject': '', 'keywords': '', 'moddate': '2018-08-08T10:43:48+05:30', 'trapped': '', 'modDate': \"D:20180808104348+05'30'\", 'creationDate': 'D:20110212104511Z', 'page': 5}, page_content='University of Portsmouth, Faculty of Technology, Department of Electronic and Computer Engineering \\nB122L – Principles of Digital Systems \\n \\n6 \\nProgramming Time Delays \\n \\n- \\nfor various reasons it might be necessary to include some sort of \\ntime delay routine in most of the embedded system programs \\n- \\nsophisticated and very accurate techniques using timers/counters in \\nthe processor exist to achieve this \\n- \\none simple approach (not involving timers) is to let the processor \\ncount for a while before it continues \\n- \\nthis can be achieved using a loop in the program where program \\ndoes not do anything useful except incrementing the loop counter: \\n \\nfor(j=0; j<=255; j=j+1) \\n{ \\n; \\n} \\n \\nOnce the loop counter reaches the value of 255 program will exit the \\nloop and continue execution with the first statement following the loop \\nsection \\n \\n- \\nfor a longer delays we can use a nested loop structure, i.e. 
loop \\nwithin the loop: \\n \\nfor(i=0; i<=255; i=i+1) \\n{ \\nfor(j=0; j<=255; j=j+1) \\n{ \\n; \\n} \\n} \\n \\nNote that with this structure the program counts to 255 x 255. \\nIndefinite Loops \\n \\n- \\nembedded system might be required to continuously execute a \\nsection of a program indefinitely \\n- \\nto achieve this indefinite loop (loop without any exit condition) can \\nbe used \\n- \\nthe statement that performs this is: \\n \\nfor(;;) \\n{ \\n \\n} \\n \\nPart of the program to be repeated indefinitely must then be placed in \\nbetween the curly brackets after the for(;;) statement.'),\n", " Document(metadata={'producer': 'Acrobat Distiller 7.0.5 (Windows)', 'creator': 'PScript5.dll Version 5.2', 'creationdate': '2011-02-12T10:45:11+00:00', 'source': '../data/pdf/c_programming_prefer.pdf', 'file_path': '../data/pdf/c_programming_prefer.pdf', 'total_pages': 37, 'format': 'PDF 1.6', 'title': 'Microsoft Word - L1 C for embedded systems.docx', 'author': 'Branislav', 'subject': '', 'keywords': '', 'moddate': '2018-08-08T10:43:48+05:30', 'trapped': '', 'modDate': \"D:20180808104348+05'30'\", 'creationDate': 'D:20110212104511Z', 'page': 6}, page_content='University of Portsmouth, Faculty of Technology, Department of Electronic and Computer Engineering \\nB122L – Principles of Digital Systems \\n \\n7 \\nVariables in Embedded C \\n \\n- \\nvariables in C program are expected to change within a program \\n- \\nvariables in C may consist of a single letter or a combination of a \\nnumber of letters and numbers \\n- \\nspaces and punctuation are not allowed as part of a variable name \\n- \\nC is a case sensitive language (therefore I and i are treated as two \\nseparate variables) \\n- \\nin a C program variables must be declared immediately after the \\ncurly bracket marking the beginning of a function \\n- \\nto declare a variable, its type must be defined, it provides \\ninformation to the compiler of its storage requirement \\n- \\nsome of the more 
common types supported by C are listed below \\n \\n \\nType \\nSize \\nRange \\nunsigned char \\n1 byte \\n0 to 255 \\n(signed) char \\n1 byte \\n-128 - +127 \\nunsigned int \\n2 bytes \\n0 - 65535 \\n(signed) int \\n2 bytes \\n-32768 - +32767 \\nbit \\n1 bit \\n0 or 1 \\n(RAM bit-addressable part of \\nmemory only) \\nsbit \\n1 bit \\n0 or 1 \\n(SFR bit-addressable part of \\nmemory only) \\nsfr \\n8 bit \\nRAM addresses 80h-FFh only \\n \\n \\n \\n- \\ndefinition of type of a variable in a C program is an important factor \\nin the efficiency of a program \\n- \\ndepending on the type of a variable the compiler reserves memory \\nspaces for that variable \\n- \\nconsider the following guidelines when selecting the type of \\nvariables: \\n\\x83 \\nif speed is important and sign is not important, make \\nevery variable unsigned \\n\\x83 \\nunsigned char is the most common type to use in \\nprogramming 8051 microcontroller as most registers \\nin the processors are of size 8-bits (i.e one byte)'),\n", " Document(metadata={'producer': 'Acrobat Distiller 7.0.5 (Windows)', 'creator': 'PScript5.dll Version 5.2', 'creationdate': '2011-02-12T10:45:11+00:00', 'source': '../data/pdf/c_programming_prefer.pdf', 'file_path': '../data/pdf/c_programming_prefer.pdf', 'total_pages': 37, 'format': 'PDF 1.6', 'title': 'Microsoft Word - L1 C for embedded systems.docx', 'author': 'Branislav', 'subject': '', 'keywords': '', 'moddate': '2018-08-08T10:43:48+05:30', 'trapped': '', 'modDate': \"D:20180808104348+05'30'\", 'creationDate': 'D:20110212104511Z', 'page': 7}, page_content='University of Portsmouth, Faculty of Technology, Department of Electronic and Computer Engineering \\nB122L – Principles of Digital Systems \\n \\n8 \\nExample 2 \\n \\nProgram to indefinitely flash all LEDs on port 1 at a rate of 1 second. 
\\n \\n#include \\nvoid main(void) \\n{ \\nunsigned char i,j; \\nfor(;;) \\n{ \\nP1 = 0xFF; /* Turn All LEDs ON */ \\nfor(i=0; i<=255; i=i+1) \\n{ \\nfor(j=0; j<=255; j=j+1) \\n{ \\n; \\n} \\n} \\nP1 = 0x00; /* Turn All LEDs OFF */ \\nfor(i=0; i<=255; i=i+1) \\n{ \\nfor(j=0; j<=255; j=j+1) \\n{ \\n; \\n} \\n} \\n} \\n} \\n \\nNote that this program contains two identical time delay routines – first \\none keeps ON state on LEDs for app. 1 second and the second one \\nkeeps OFF state on LEDs for the same amount of time. Without those \\ntwo delays LEDs would switch ON and OFF so rapidly that the whole \\nset would constantly appear as not fully turned ON set of LEDs. \\nUsed delay routine is not very accurate so the delay should only \\napproximately be 1 s. \\n \\nNote that the first line immediately after the beginning of the main \\nfunction is used to declare two variables in the program, I and j: \\n \\nunsigned char i,j;'),\n", " Document(metadata={'producer': 'Acrobat Distiller 7.0.5 (Windows)', 'creator': 'PScript5.dll Version 5.2', 'creationdate': '2011-02-12T10:45:11+00:00', 'source': '../data/pdf/c_programming_prefer.pdf', 'file_path': '../data/pdf/c_programming_prefer.pdf', 'total_pages': 37, 'format': 'PDF 1.6', 'title': 'Microsoft Word - L1 C for embedded systems.docx', 'author': 'Branislav', 'subject': '', 'keywords': '', 'moddate': '2018-08-08T10:43:48+05:30', 'trapped': '', 'modDate': \"D:20180808104348+05'30'\", 'creationDate': 'D:20110212104511Z', 'page': 8}, page_content='University of Portsmouth, Faculty of Technology, Department of Electronic and Computer Engineering \\nB122L – Principles of Digital Systems \\n \\n9 \\nC Functions \\n \\n- \\nwhen a part of a program must be repeated more than once, a more \\nefficient way of implementing is to call the block to be repeated a \\nname and simple use the name when the block is needed. 
\\n- \\nthis leads to implementation of C function in the program \\n- \\nthe function (block) must be declared before the main program and \\ndefined, normally immediately after the main program is ended \\n- \\nrules for function declaration are same as for declaration of main \\nfunction \\n \\nvoid DELAY (void) \\n \\nAbove function declaration informs the compiler that there will be a \\nfunction called DELAY, requiring no parameters and returning no \\nparameters. \\nNote the absence of semicolon at the end of function declaration. \\n \\n- \\na function is defined in the same way as the main function (an open \\ncurly bracket marks the beginning of the function, variables used \\nwithin the function are then declared in the next line before the body \\nof the function is implemented, the function ends with a closed curly \\nbracket) \\n- \\nthe name of a function must follow the rules for the name of a \\nvariable - the function name may not have spaces, or any \\npunctuation \\n- \\na use of functions is advisable as functions make programs shorter \\nand readable, a shorter program also requires less space in the \\nmemory and therefore better efficiency \\n- \\nfunction is called in the main program using function name and \\nappropriate parameters (if any) \\n \\nExample 3 \\n \\nIn this example program from Example 2, to indefinitely flash all LEDs \\non port 1 at a rate of 1 second is rewritten using function to generate \\ntime delay. 
\\n \\n#include \\nvoid DELAY(void); \\nvoid main(void) \\n{ \\nfor(;;) \\n{ \\nP1 = 0xFF; \\n/* Turn All LEDs ON */ \\nDELAY(); \\n \\n/* Wait for 1 second */ \\nP1 = 0x00; \\n/* Turn All LEDs OFF */ \\nDELAY(); \\n} \\n} \\n \\nvoid DELAY(void) \\n{ \\nunsigned char i, j; \\nfor(i=0; i<=255; i=i+1) \\n{ \\nfor(j=0; j<=255; j=j+1) \\n{ \\n; \\n} \\n} \\n}'),\n", " Document(metadata={'producer': 'Acrobat Distiller 7.0.5 (Windows)', 'creator': 'PScript5.dll Version 5.2', 'creationdate': '2011-02-12T10:45:11+00:00', 'source': '../data/pdf/c_programming_prefer.pdf', 'file_path': '../data/pdf/c_programming_prefer.pdf', 'total_pages': 37, 'format': 'PDF 1.6', 'title': 'Microsoft Word - L1 C for embedded systems.docx', 'author': 'Branislav', 'subject': '', 'keywords': '', 'moddate': '2018-08-08T10:43:48+05:30', 'trapped': '', 'modDate': \"D:20180808104348+05'30'\", 'creationDate': 'D:20110212104511Z', 'page': 9}, page_content='University of Portsmouth, Faculty of Technology, Department of Electronic and Computer Engineering \\nB122L – Principles of Digital Systems \\n \\n10 \\nOther Loops in C \\n \\nTwo other loops exist in C language - and . \\nInstructions to generate single loop delay using those two loop \\ntechniques are given below. \\n \\n \\ni=0; \\ndo \\n{ \\ni=i+1; \\n} while(i<=255); \\n \\n \\n \\ni=0; \\nwhile(i<=255) \\n{ \\ni=i+1; \\n} \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\nExample 4 \\n \\nProgram from Examples 2 and 3 is rewritten and modified so that LEDs \\non port 1 flash with 1 s time delay only 10 times. 
\\n \\n#include \\nvoid DELAY(void); \\nvoid main(void) \\n{ \\nunsigned char N; \\nN=0; \\ndo \\n{ \\nP1 = 0xFF; /* Turn All LEDs ON */ \\nDELAY(); \\n/* Wait for 1 second */ \\nP1 = 0x00; /* Turn All LEDs OFF */ \\nDELAY(); \\nN=N+1; \\n} while(N<10); \\n} \\n \\nvoid DELAY(void) \\n{ \\nunsigned char i,j; \\nfor(i=0; i<=255; i=i+1) \\n{ \\nfor(j=0; j<=255; j=j+1) \\n{ \\n; \\n} \\n} \\n}'),\n", " Document(metadata={'producer': 'Acrobat Distiller 7.0.5 (Windows)', 'creator': 'PScript5.dll Version 5.2', 'creationdate': '2011-02-12T10:45:11+00:00', 'source': '../data/pdf/c_programming_prefer.pdf', 'file_path': '../data/pdf/c_programming_prefer.pdf', 'total_pages': 37, 'format': 'PDF 1.6', 'title': 'Microsoft Word - L1 C for embedded systems.docx', 'author': 'Branislav', 'subject': '', 'keywords': '', 'moddate': '2018-08-08T10:43:48+05:30', 'trapped': '', 'modDate': \"D:20180808104348+05'30'\", 'creationDate': 'D:20110212104511Z', 'page': 10}, page_content='University of Portsmouth, Faculty of Technology, Department of Electronic and Computer Engineering \\nB122L – Principles of Digital Systems \\n \\n11 \\nMaking Decisions in the Program \\n \\n- \\nan important feature of any embedded system is the ability to test \\nthe value of any parameter in the system and based on the outcome \\nof this test take an appropriate action \\n- \\nthe parameter tested must be a program variable and C construct \\nusually employed to perform test is the statement \\n- \\neffect of this statement is illustrated on the flowchart below \\n \\nCondition\\n?\\nAction B\\nAction A\\nTrue\\nFalse\\n \\n \\n- \\nC instructions to achieve this action are \\n \\nif(condition) \\n{ \\nPerform Action A \\n} \\nelse \\n{ \\nPerform Action B \\n} \\n \\n? 
Operator \\n \\n- \\nin the C program operator can be used as a short hand version \\nof the ‘if’ statement discussed above \\n- \\nflowchart and line of code provided below explain the effect of using \\nthis operator \\n \\nB>C\\n?\\nA=X\\nA=Y\\nNo\\nYes\\n \\n \\nA = (B > C) ? X : Y; \\n \\nIn other words “if B is greater than C then let A=X otherwise let A=Y”.'),\n", " Document(metadata={'producer': 'Acrobat Distiller 7.0.5 (Windows)', 'creator': 'PScript5.dll Version 5.2', 'creationdate': '2011-02-12T10:45:11+00:00', 'source': '../data/pdf/c_programming_prefer.pdf', 'file_path': '../data/pdf/c_programming_prefer.pdf', 'total_pages': 37, 'format': 'PDF 1.6', 'title': 'Microsoft Word - L1 C for embedded systems.docx', 'author': 'Branislav', 'subject': '', 'keywords': '', 'moddate': '2018-08-08T10:43:48+05:30', 'trapped': '', 'modDate': \"D:20180808104348+05'30'\", 'creationDate': 'D:20110212104511Z', 'page': 11}, page_content='University of Portsmouth, Faculty of Technology, Department of Electronic and Computer Engineering \\nB122L – Principles of Digital Systems \\n \\n12 \\nExample 5 \\n \\nProgram to operate the LEDs attached to port 1 as a BCD (Binary \\nCoded Decimal) up counter. A BCD up counter is one, which counts in \\nbinary from 0 to 9 and then repeats from 0 again. 
\\n \\n#include \\nvoid DELAY(void); \\nvoid main(void) \\n{ \\nunsigned char N=0; \\nfor(;;) \\n{ \\nP1 = N; \\n/* Turn All LEDs ON */ \\nDELAY(); \\n/* Wait for a while */ \\nN=N+1; \\nif(N>9) \\n{ \\nN=0; \\n} \\n} \\n} \\n \\nvoid DELAY(void) \\n{ \\nunsigned char i,j; \\nfor(i=0; i<=255; i=i+1) \\n{ \\nfor(j=0; j<=255; j=j+1) \\n{ \\n; \\n} \\n} \\n} \\n \\nShort Hand Notations \\n \\nStatement \\nShort hand \\nExplanation \\nA=A+1 \\nA++ \\nIncrement A \\nA=A-1 \\nA-- \\nDecrement A \\nA=A+B \\nA+=B \\nLet A=A+B \\nA=A-B \\nA-=B \\nLet A=A-B \\nA=A*B \\nA*=B \\nA=A*B \\nA=A/B \\nA/=B \\nA=A/B \\n \\n \\nLogical and Bit-wise Operations \\n \\nOperation \\nIn Assembly \\nIn C \\nExample in C \\nNOT \\nCPL A \\n~ \\nA=~A; \\nAND \\nANL A,#DATA \\n& \\nA= A & DATA \\nOR \\nORL A,#DATA \\n¦ \\nA= A ¦ DATA \\nEX-OR \\nXRL A,#DATA \\n^ \\nA= A ^ DATA \\nShift Right by n-bits \\nRRA \\n>> \\nA=A>>n \\nShift Left by n-bits \\nRLA \\n<< \\nA=A<" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [
"from typing import List\n", "\n", "import numpy as np\n", "from sentence_transformers import SentenceTransformer\n", "\n", "\n", "class EmbeddingManager:\n", "    \"\"\"Loads a SentenceTransformer model and generates embeddings\"\"\"\n", "\n", "    def __init__(self, model_name: str = \"all-MiniLM-L6-v2\"):\n", "        self.model_name = model_name\n", "        self.model = None\n", "        self._load_model()\n", "\n",
"    def _load_model(self):\n", "        \"\"\"Load the model named by self.model_name\"\"\"\n", "        try:\n", "            print(f\"Loading embedding model: {self.model_name}\")\n", "            self.model = SentenceTransformer(self.model_name)\n", "            print(f\"Model loaded successfully. Embedding dimension: {self.model.get_sentence_embedding_dimension()}\")\n", "        except Exception as e:\n", "            print(f\"Error loading model {self.model_name}: {e}\")\n", "            raise\n", "\n",
"    def generate_embeddings(self, texts: List[str]) -> np.ndarray:\n", "        \"\"\"Encode texts into an array of shape (len(texts), embedding_dim)\"\"\"\n", "        if not self.model:\n", "            raise ValueError(\"Model not loaded\")\n", "\n", "        print(f\"Generating embeddings for {len(texts)} texts...\")\n", "        embeddings = self.model.encode(texts, show_progress_bar=True)\n", "        print(f\"Generated embeddings with shape: {embeddings.shape}\")\n", "        return embeddings\n", "\n", "\n", "embedding_manager = EmbeddingManager()\n", "embedding_manager\n" ] }, { "cell_type": "code", "execution_count": 28, "id": "5fe490be", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Vector store initialized. Collection: pdf_documents\n", "Existing documents in collection: 0\n" ] }, { "data": { "text/plain": [ "<__main__.VectorStore at 0x75281e50e410>" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [
"import os\n", "import uuid\n", "from typing import Any, List\n", "\n", "import chromadb\n", "import numpy as np\n", "\n", "\n", "class VectorStore:\n", "    \"\"\"Manages document embeddings in a ChromaDB vector store\"\"\"\n", "\n", "    def __init__(self, collection_name: str = \"pdf_documents\", persist_directory: str = \"../data/vector_store\"):\n", "        \"\"\"\n", "        Initialize the vector store\n", "\n", "        Args:\n", "            collection_name: Name of the ChromaDB collection\n", "            persist_directory: Directory to persist the vector store\n", "        \"\"\"\n", "        self.collection_name = collection_name\n", "        self.persist_directory = persist_directory\n", "        self.client = None\n", "        self.collection = None\n", "        self._initialize_store()\n", "\n",
"    def _initialize_store(self):\n", "        \"\"\"Initialize ChromaDB client and collection\"\"\"\n", "        try:\n", "            # Create persistent ChromaDB client\n", "            os.makedirs(self.persist_directory, exist_ok=True)\n", "            self.client = chromadb.PersistentClient(path=self.persist_directory)\n", "\n", "            # Get or create collection\n", "            self.collection = self.client.get_or_create_collection(\n", "                name=self.collection_name,\n", "                metadata={\"description\": \"PDF document embeddings for RAG\"}\n", "            )\n", "            print(f\"Vector store initialized. Collection: {self.collection_name}\")\n", "            print(f\"Existing documents in collection: {self.collection.count()}\")\n", "\n", "        except Exception as e:\n", "            print(f\"Error initializing vector store: {e}\")\n", "            raise\n", "\n",
"    def add_documents(self, documents: List[Any], embeddings: np.ndarray):\n", "        \"\"\"\n", "        Add documents and their embeddings to the vector store\n", "\n", "        Args:\n", "            documents: List of LangChain documents\n", "            embeddings: Corresponding embeddings for the documents\n", "        \"\"\"\n", "        if len(documents) != len(embeddings):\n", "            raise ValueError(\"Number of documents must match number of embeddings\")\n", "\n", "        print(f\"Adding {len(documents)} documents to vector store...\")\n", "\n", "        # Prepare data for ChromaDB\n", "        ids = []\n", "        metadatas = []\n", "        documents_text = []\n", "        embeddings_list = []\n", "\n", "        for i, (doc, embedding) in enumerate(zip(documents, embeddings)):\n", "            # Generate unique ID\n", "            doc_id = f\"doc_{uuid.uuid4().hex[:8]}_{i}\"\n", "            ids.append(doc_id)\n", "\n", "            # Prepare metadata\n", "            metadata = dict(doc.metadata)\n", "            metadata['doc_index'] = i\n", "            metadata['content_length'] = len(doc.page_content)\n", "            metadatas.append(metadata)\n", "\n", "            # Document content\n", "            documents_text.append(doc.page_content)\n", "\n", "            # Embedding\n", "            embeddings_list.append(embedding.tolist())\n", "\n", "        # Add to collection\n", "        try:\n", "            self.collection.add(\n", "                ids=ids,\n", "                embeddings=embeddings_list,\n", "                metadatas=metadatas,\n", "                documents=documents_text\n", "            )\n", "            print(f\"Successfully added {len(documents)} documents to vector store\")\n", "            print(f\"Total documents in collection: {self.collection.count()}\")\n", "\n", "        except Exception as e:\n", "            print(f\"Error adding documents to vector store: {e}\")\n", "            raise\n", "\n", "\n", "vectorstore = VectorStore()\n", "vectorstore\n" ] }, { "cell_type": "code", "execution_count": null, "id": "2123e74e", "metadata": {}, "outputs": [ { "ename": "NameError", "evalue": "name 'chunks' is not defined", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", "Cell \u001b[0;32mIn[29], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43mchunks\u001b[49m\n", "\u001b[0;31mNameError\u001b[0m: name 'chunks' is not defined" ] } ], "source": [] } ], "metadata": { "kernelspec": { "display_name": "rag_demo", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 5 }