Spaces: Running
Commit · a041ac8
1 Parent(s): 6c65d33
Who knows what I did
Browse files
- data/MMR_DATA_CLEAN_LABELLED.csv +0 -0
- data/MMR_DATA_CLEAN_LABELLED_GROUND.xlsx +0 -0
- data/MMR_DATA_ST_EMBEDDINGS.csv +0 -0
- data/MMR_DATA_ST_EMBEDDINGS_ONLY.csv +0 -0
- documentation/paper_drafts/report.txt +100 -7
- notebooks/Snetence_Transformers_embedding.ipynb +0 -0
- notebooks/USE_EMB_FIX.ipynb +0 -0
- notebooks/USE_embedding.ipynb +39 -45
- notebooks/evaluation.ipynb +239 -0
- src/map_slice.html +92 -0
- src/mapslice.py +45 -0
data/MMR_DATA_CLEAN_LABELLED.csv
CHANGED
The diff for this file is too large to render.
See raw diff
data/MMR_DATA_CLEAN_LABELLED_GROUND.xlsx
ADDED
Binary file (215 kB). View file
data/MMR_DATA_ST_EMBEDDINGS.csv
ADDED
The diff for this file is too large to render.
See raw diff
data/MMR_DATA_ST_EMBEDDINGS_ONLY.csv
ADDED
The diff for this file is too large to render.
See raw diff
documentation/paper_drafts/report.txt
CHANGED
@@ -1,8 +1,13 @@
-title - Automated
 
 abstract
 
-
 
 introduction
 
@@ -10,15 +15,103 @@ literature review
 
 methodology
 
-1.
-
-
-
-
 6. embeddings - TFIDF, USE, Sentence Transformers (3 4 models)
 7. clustering - knee/elbow point
 8. 3D visual representation
 9. auto labelling and comparison with manual coded values
 
 results
 
+title - Automated High Fidelity Functional Map Generation using Text Data Clustering
 
 abstract
 
+This study presents a novel methodology for automating the generation of high-fidelity functional maps using text-based clustering of OpenStreetMap data. Functional maps, which visualize the distribution of residential, commercial, industrial, and natural zones within regions, traditionally require extensive manual effort through field surveys, data compilation, and cartographic work. While Geographic Information Systems (GIS) have streamlined this process, significant manual verification is still needed for accuracy.
+We propose an automated framework that processes OpenStreetMap text data through Natural Language Processing (NLP) techniques to classify regional land use. Our methodology divides target regions into 1km² tiles and analyzes the associated map text data (including building names, shop names, and point-of-interest descriptions) using the Universal Sentence Encoder (USE) for text embedding. These embeddings are then clustered using the K-means algorithm to identify distinct functional zones.
+To validate our approach, we applied this framework to a 2,500 km² area within the Mumbai Metropolitan Region. The region was first manually labeled to establish ground truth data, against which we compared our automated classifications. Our results demonstrate that this methodology can effectively generate functional maps while significantly reducing manual effort. The framework's scalability makes it particularly valuable for mapping large urban areas, achieving promising accuracy, precision, recall, and F1 scores in distinguishing between residential, commercial, industrial, and natural zones.
+
+Keywords: functional maps, text embedding, K-means clustering, OpenStreetMap, urban planning, automated mapping
+
 
 introduction
 
 literature review
 
 methodology
 
+1. OpenStreetMap Data Collection
+This study utilizes OpenStreetMap (OSM) as its primary data source. OSM is a collaborative mapping platform that provides comprehensive geographic data through community contributions, functioning similarly to Wikipedia for geographic information. The platform offers detailed spatial data including:
+
+Building information and classifications
+Commercial establishments and points of interest
+Road networks and transportation infrastructure
+Land use designations
+Natural features and boundaries
+
+The data is freely accessible through the OSM API, which provides structured information in a standardized format. OSM's data quality is maintained through community verification processes, making it particularly reliable in densely populated urban areas where contributor activity is high.
+
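A minimal sketch of this collection step (the public Overpass endpoint, the name-tag filter, and the helper name are assumptions for illustration; the project's exact extraction code is not part of this commit):

import requests

OVERPASS_URL = "https://overpass-api.de/api/interpreter"  # assumed public endpoint

def fetch_osm_names(south, west, north, east):
    """Fetch the names of tagged OSM nodes/ways inside a bounding box."""
    query = f"""
    [out:json][timeout:60];
    (
      node["name"]({south},{west},{north},{east});
      way["name"]({south},{west},{north},{east});
    );
    out tags;
    """
    response = requests.post(OVERPASS_URL, data={"data": query})
    response.raise_for_status()
    elements = response.json().get("elements", [])
    # keep the free-text fields that later feed each tile's text chunk
    return [el["tags"]["name"] for el in elements if "name" in el.get("tags", {})]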
+2. Spatial Grid Generation and Region Partitioning
+To systematically analyze large geographic areas, we developed a grid-based partitioning approach:
+
+Grid Definition: The target region is overlaid with a uniform grid system, where each cell represents a 1km × 1km area. This granularity was chosen to:
+
+Capture sufficient detail for meaningful functional analysis
+Maintain computational efficiency
+
+Data Association:
+OSM features falling within each tile's boundaries are extracted
+Spatial indices are created to optimize the feature-to-tile mapping process
+Each tile accumulates all relevant text data from its contained features
+
+This structured approach to data collection and spatial partitioning provides the foundation for subsequent text processing and clustering analyses. The uniform grid system ensures consistent spatial resolution across the study area while facilitating scalable processing of large geographic regions.
+
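A minimal sketch of the tiling step (equirectangular approximation, matching the small-area assumption used in src/mapslice.py; the helper name is hypothetical, and the south-west corner is the rectangle origin drawn in src/map_slice.html):

from math import cos, radians

def make_grid(sw_lat, sw_lon, size_km=50, tile_km=1):
    """Yield (row, col, lat, lon) for the south-west corner of each tile."""
    d_lat = tile_km / 110.574                            # degrees of latitude per km
    d_lon = tile_km / (111.320 * cos(radians(sw_lat)))   # degrees of longitude per km
    n = int(size_km // tile_km)
    for row in range(n):
        for col in range(n):
            yield row, col, sw_lat + row * d_lat, sw_lon + col * d_lon

tiles = list(make_grid(18.889833, 72.779844))            # 2,500 tiles for the 50 km square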
+3. Data Preprocessing and Text Analysis
+
+The raw data extracted from the OpenStreetMap API underwent a comprehensive filtering and preprocessing pipeline to ensure data quality and relevance. Initial filtering was performed using predefined OSM database filters to retain only pertinent geographic features, including buildings, commercial establishments, offices, transportation infrastructure, and recreational areas. This selective approach helped maintain focus on features that contribute meaningfully to functional zone classification while reducing noise in the dataset.
+Following the initial feature selection, text data from each 1km² tile was aggregated to create consolidated text chunks representing the geographic characteristics of each region. The consolidation process preserved essential geographic nomenclature while maintaining the spatial relationships between features within each tile. To ensure data quality and identify potential anomalies, we conducted extensive exploratory data analysis (EDA) using statistical methods. Box plots were employed to analyze the quartile distribution of text lengths across tiles, enabling the identification of outliers and unusual patterns in the data distribution.
+Regions lacking text data were subjected to additional verification processes. These void areas were systematically revalidated through cross-referencing with satellite imagery and existing land use data, leading to the identification of natural features such as mangroves and water bodies. This verification process helped distinguish between actual data gaps and legitimate natural areas, improving the overall accuracy of our classification framework.
+The text preprocessing pipeline incorporated the Natural Language Toolkit (NLTK) for comprehensive text cleaning and normalization. This process included the removal of stopwords, lemmatization of terms, and standardization of text format. The lemmatization process was particularly crucial as it reduced inflected words to their base form, ensuring consistent representation of similar features across different tiles. Additionally, we analyzed the binned distribution of text lengths to establish appropriate thresholds for outlier exclusion, ensuring that the final dataset maintained a balance between comprehensive coverage and data quality.
+Through this systematic approach to data filtering and preprocessing, we established a robust foundation for subsequent embedding and clustering analyses. The careful attention to data quality and feature relevance during this stage significantly contributed to the effectiveness of our functional zone classification methodology.
+
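A minimal sketch of the cleaning pipeline described above (NLTK stopword removal plus WordNet lemmatization; the tokenization details are assumptions):

import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)

STOPWORDS = set(stopwords.words("english"))
LEMMATIZER = WordNetLemmatizer()

def preprocess(text):
    """Lowercase, keep alphabetic tokens, drop stopwords, reduce words to base form."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(LEMMATIZER.lemmatize(t) for t in tokens if t not in STOPWORDS)

clean = preprocess("Uran Naval Base is a Landuse: Military;")  # -> "uran naval base landuse military"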
+4. Case Study Implementation
+To validate our methodological framework, we selected the Mumbai Metropolitan Region (MMR) as our primary study area. This region presents an ideal test case due to its diverse urban landscape, encompassing a rich mixture of land use patterns across a substantial geographic area of 2,500 square kilometers.
+Study Area Selection and Characteristics
+The MMR serves as an exemplary urban testing ground for our framework due to several key characteristics. The region features a complex tapestry of land use, including high-density commercial districts, extensive residential developments, established industrial zones, and significant natural features such as the Arabian Sea coastline, creeks, and mangrove forests. This diversity provides an optimal environment for testing our classification methodology across various functional zones.
+The study area was defined as a 50km × 50km square region, centered on the metropolitan core. This delineation was carefully chosen to capture the full spectrum of urban development patterns, from the dense urban core to peripheral areas experiencing rapid transformation. The selected region also includes various stages of urban development, from historical neighborhoods to emerging commercial corridors and industrial estates.
+Implementation Framework
+Following our established methodology, we partitioned the study area into 1km × 1km tiles, generating a dataset of 2,500 distinct spatial units. This resolution was selected to:
+
+Maintain sufficient granularity for meaningful functional analysis
+Capture local variations in land use patterns
+Enable efficient computational processing
+Facilitate practical validation of results
+
+During the exploratory data analysis phase, we identified and addressed several key considerations specific to the MMR context. Tiles containing no text data were subjected to additional verification, particularly in coastal areas and regions containing large natural features. This process helped distinguish between data gaps and legitimate natural areas, enhancing the accuracy of our classification system.
+The MMR case study provided an ideal opportunity to test our framework's ability to handle complex urban environments. The region's varied development patterns, mixed land uses, and distinct natural boundaries offered appropriate challenges for validating our automated classification methodology. The results from this implementation served as the foundation for our subsequent accuracy assessments and methodology validation.
+
+
+5. Data Description - EDA (elaborate on the EDA part of the use case)
+The exploratory data analysis phase revealed crucial insights about the textual characteristics and spatial distribution patterns across the Mumbai Metropolitan Region study area. Our analysis focused on understanding the distribution of text data across tiles and identifying patterns that could influence the classification process.
+Text Length Distribution Analysis
+Initial analysis of text length distribution across the 2,500 tiles revealed significant variations in data density. A box plot analysis (Figure X) demonstrated that the majority of tiles (over 75%) contained between 50 and 1100 characters of preprocessed text, with a median length of approximately 192 characters and a mean of 592 characters. The distribution exhibited strong positive skewness, indicating the presence of tiles with exceptionally high text content, typically corresponding to densely developed urban areas.
+The quartile analysis identified several outliers, particularly in the upper range, where some tiles contained more than 1200 characters. These outliers primarily represented central business districts and major commercial zones, characterized by high concentrations of labeled buildings and points of interest. Conversely, tiles with minimal text content (below the lower quartile of 50 characters) often corresponded to natural areas or regions with limited development.
+
+Void Analysis and Verification
+Approximately 15% of tiles contained no text data; upon further investigation, it was found that these regions mainly belonged to:
+
+Water bodies (the Arabian Sea and the Thane/Vasai creeks)
+Protected mangrove areas
+Undeveloped land parcels
+
+Text Content Analysis
+A frequency analysis of key terms and phrases across tiles provided insights into the characteristic vocabulary associated with different functional zones:
+
+Commercial zones showed high frequencies of terms related to retail, offices, and services
+Residential areas were characterized by apartment complexes, housing societies, and community facilities
+Industrial zones displayed consistent patterns of manufacturing, warehouse, and logistics-related terminology
+Natural areas were identified through references to parks, forests, and water bodies
+
+Preprocessing Impact Assessment
+The effect of text preprocessing steps was quantified through comparative analysis. Lemmatization reduced the unique token count by approximately 5%, while stopword removal decreased the total token count by 28%. These reductions improved the signal-to-noise ratio in the data while preserving essential semantic information for classification.
+This exploratory analysis provided essential insights that informed subsequent choices in our embedding and clustering methodology, particularly in handling outliers and setting appropriate thresholds for classification.
+
+
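A minimal sketch of the text-length EDA (assuming the per-tile text lives in the 'Map Data' column of the labelled CSV added in this commit, as it does in notebooks/evaluation.ipynb):

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("data/MMR_DATA_CLEAN_LABELLED.csv")
lengths = df["Map Data"].fillna("").str.len()   # characters of preprocessed text per tile

print(lengths.describe())                       # median ~192 and mean ~592 chars, as reported above
lengths.plot.box()                              # quartile view used to spot outlier tiles
plt.ylabel("characters per tile")
plt.show()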
 6. embeddings - TFIDF, USE, Sentence Transformers (3 4 models)
+Text embedding is a technique that converts words, sentences, or documents into numerical vectors (sequences of numbers) that capture their semantic meaning. These vectors allow machines to understand and compare text mathematically - similar texts will have similar vector representations. This is fundamental for many NLP applications like search, recommendation systems, and text classification.
+Traditional methods like TF-IDF (Term Frequency-Inverse Document Frequency) create sparse vectors based on word frequency, but they fail to understand context. In contrast, transformer-based embeddings generate dense, context-aware representations using deep learning. Notable models include Universal Sentence Encoder (USE) by Google, which provides efficient sentence embeddings for NLP tasks, and Sentence-Transformers (all-MiniLM-L6-v2).
+For the purpose of generating text embeddings, three methods were chosen: TF-IDF, USE, and Sentence Transformers.
+
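A minimal sketch of the three embedding routes (the USE hub handle is the one loaded in notebooks/USE_embedding.ipynb; all-MiniLM-L6-v2 is the Sentence-Transformers model named above):

from sklearn.feature_extraction.text import TfidfVectorizer
import tensorflow_hub as hub
from sentence_transformers import SentenceTransformer

texts = ["Prongs Reef is a Natural;", "Uran Naval Base is a Landuse: Military;"]

tfidf = TfidfVectorizer().fit_transform(texts)    # sparse frequency-based vectors

use_model = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")
use_emb = use_model(texts).numpy()                # dense 512-d context-aware vectors

st_model = SentenceTransformer("all-MiniLM-L6-v2")
st_emb = st_model.encode(texts)                   # dense 384-d vectors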
 7. clustering - knee/elbow point
+K-Means – A popular centroid-based method that partitions embeddings into K clusters by minimizing intra-cluster variance. Efficient for large datasets but assumes clusters are spherical.
+DBSCAN (Density-Based Spatial Clustering of Applications with Noise) – Groups embeddings based on density, identifying arbitrary-shaped clusters while marking outliers. Works well for non-uniform distributions but requires fine-tuning of hyperparameters.
+HDBSCAN (Hierarchical DBSCAN) – An improvement over DBSCAN that adapts to varying density levels, making it effective for embeddings with different cluster densities.
+
+The elbow point detection method was used to identify the optimal K value in K-Means clustering. The Elbow Method is a technique for determining the optimal number of clusters (K) in K-Means clustering by analyzing the within-cluster sum of squares (WCSS), which measures how tightly data points are grouped within each cluster. As K increases, WCSS decreases because clusters become smaller and more refined. However, beyond a certain point, adding more clusters results in minimal improvement while increasing model complexity. By plotting WCSS against different values of K, an "elbow" shape typically appears, where the curve sharply bends before flattening out. The optimal K is chosen at this elbow point, as it represents the best balance between compact clusters and computational efficiency.
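A minimal sketch of the elbow-point selection over WCSS (kneed's KneeLocator is one way to detect the bend programmatically; an assumption, not necessarily the project's choice):

from sklearn.cluster import KMeans
from kneed import KneeLocator

embeddings = st_emb                 # any of the embedding matrices from the previous step
ks = list(range(2, 15))
wcss = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(embeddings).inertia_ for k in ks]

best_k = KneeLocator(ks, wcss, curve="convex", direction="decreasing").knee
labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(embeddings)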
+
 8. 3D visual representation
+t-SNE (t-Distributed Stochastic Neighbor Embedding) is a dimensionality reduction technique commonly used for visualizing high-dimensional data in a lower-dimensional space (typically 2D or 3D). It preserves the local structure of data by converting pairwise similarities into probabilities, ensuring that similar points in high-dimensional space remain close in the lower-dimensional representation.
+UMAP (Uniform Manifold Approximation and Projection) is a non-linear dimensionality reduction technique designed for preserving both the local and global structure of high-dimensional data while being computationally efficient. Unlike t-SNE, which focuses primarily on local neighborhood preservation, UMAP constructs a graph-based representation of the data and optimizes a low-dimensional embedding using a probabilistic framework.
+These two methods were used to visualise the embeddings and the clusters generated, to evaluate the quality of the different embedding approaches.
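A minimal sketch of the 3D projections used for inspection (umap-learn assumed installed; points coloured by the K-Means labels from the previous step):

from sklearn.manifold import TSNE
import umap
import matplotlib.pyplot as plt

tsne_3d = TSNE(n_components=3, random_state=0).fit_transform(embeddings)
umap_3d = umap.UMAP(n_components=3, random_state=0).fit_transform(embeddings)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(tsne_3d[:, 0], tsne_3d[:, 1], tsne_3d[:, 2], c=labels, s=5)
plt.show()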
+
 9. auto labelling and comparison with manual coded values
+The clusters generated were used to label the data points into groups. 10% of the samples from each cluster were manually evaluated based on their text content, to map each cluster to either comm, res, indus or nat. This method effectively generates a classification framework for unlabelled data based on clustering.
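A minimal sketch of the cluster-to-class mapping (the 'cluster' and 'manual_class' column names are hypothetical stand-ins for the manually reviewed sample):

import pandas as pd

def map_clusters(df, frac=0.10, seed=0):
    """Vote each cluster's class from a manually reviewed frac-sized sample."""
    mapping = {}
    for cluster_id, group in df.groupby("cluster"):
        sample = group.sample(frac=frac, random_state=seed)
        mapping[cluster_id] = sample["manual_class"].mode().iloc[0]  # comm/res/indus/nat
    return df["cluster"].map(mapping)

df["label"] = map_clusters(df)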
+These auto-generated labels were then compared with the manually coded ground truth values, to identify the accuracy and other metrics as a classification framework.
 
 results
 
notebooks/Snetence_Transformers_embedding.ipynb
ADDED
The diff for this file is too large to render.
See raw diff
notebooks/USE_EMB_FIX.ipynb
ADDED
The diff for this file is too large to render.
See raw diff
notebooks/USE_embedding.ipynb
CHANGED
@@ -2,9 +2,18 @@
  "cells": [
   {
    "cell_type": "code",
-    "execution_count":
    "metadata": {},
-    "outputs": [
    "source": [
     "import numpy as numpy\n",
     "import pandas as pd\n",
@@ -20,7 +29,7 @@
   },
   {
    "cell_type": "code",
-    "execution_count":
    "metadata": {},
    "outputs": [],
    "source": [
@@ -32,7 +41,7 @@
   },
   {
    "cell_type": "code",
-    "execution_count":
    "metadata": {},
    "outputs": [
     {
@@ -124,7 +133,7 @@
     "14    Uran Naval Base is a Landuse: Military; "
    ]
   },
-   "execution_count":
   "metadata": {},
   "output_type": "execute_result"
  }
@@ -137,14 +146,14 @@
   },
   {
    "cell_type": "code",
-    "execution_count":
    "metadata": {},
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-       "Dataset size:
     ]
    }
   ],
@@ -154,16 +163,16 @@
   },
   {
    "cell_type": "code",
-    "execution_count":
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
-        "(
      ]
     },
-     "execution_count":
     "metadata": {},
     "output_type": "execute_result"
    }
@@ -175,39 +184,24 @@
   },
   {
    "cell_type": "code",
-    "execution_count":
    "metadata": {},
    "outputs": [
     {
-     "
-     "
-     "
-
-     "\
-
-
-
-     "
-
-
-     "
-     "\n"
-
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "WARNING:tensorflow:From c:\\Users\\Akhil PC\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\tensorflow_hub\\module_v2.py:126: The name tf.saved_model.load_v2 is deprecated. Please use tf.compat.v2.saved_model.load instead.\n",
-      "\n"
-     ]
-    },
-    {
-     "name": "stderr",
-     "output_type": "stream",
-     "text": [
-      "WARNING:tensorflow:From c:\\Users\\Akhil PC\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\tensorflow_hub\\module_v2.py:126: The name tf.saved_model.load_v2 is deprecated. Please use tf.compat.v2.saved_model.load instead.\n",
-      "\n"
    ]
   }
  ],
@@ -218,7 +212,7 @@
   },
   {
    "cell_type": "code",
-    "execution_count":
    "metadata": {},
    "outputs": [],
    "source": [
@@ -233,7 +227,7 @@
   },
   {
    "cell_type": "code",
-    "execution_count":
    "metadata": {},
    "outputs": [],
    "source": [
@@ -243,7 +237,7 @@
   },
   {
    "cell_type": "code",
-    "execution_count":
    "metadata": {},
    "outputs": [
    {
@@ -263,7 +257,7 @@
   },
   {
    "cell_type": "code",
-    "execution_count":
    "metadata": {},
    "outputs": [
    {
@@ -472,7 +466,7 @@
   },
   {
    "cell_type": "code",
-    "execution_count":
    "metadata": {},
    "outputs": [
    {
  "cells": [
   {
    "cell_type": "code",
+    "execution_count": 1,
    "metadata": {},
+    "outputs": [
+     {
+      "name": "stdout",
+      "output_type": "stream",
+      "text": [
+       "WARNING:tensorflow:From c:\\Users\\Akhil PC\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\tf_keras\\src\\losses.py:2976: The name tf.losses.sparse_softmax_cross_entropy is deprecated. Please use tf.compat.v1.losses.sparse_softmax_cross_entropy instead.\n",
+       "\n"
+      ]
+     }
+    ],
    "source": [
     "import numpy as numpy\n",
     "import pandas as pd\n",

   },
   {
    "cell_type": "code",
+    "execution_count": 2,
    "metadata": {},
    "outputs": [],
    "source": [

   },
   {
    "cell_type": "code",
+    "execution_count": 3,
    "metadata": {},
    "outputs": [
     {

     "14    Uran Naval Base is a Landuse: Military; "
    ]
   },
+   "execution_count": 3,
   "metadata": {},
   "output_type": "execute_result"
  }

   },
   {
    "cell_type": "code",
+    "execution_count": 4,
    "metadata": {},
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
+       "Dataset size: 814\n"
      ]
     }
    ],

   },
   {
    "cell_type": "code",
+    "execution_count": 5,
    "metadata": {},
    "outputs": [
     {
      "data": {
       "text/plain": [
+        "(814,)"
       ]
      },
+     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }

   },
   {
    "cell_type": "code",
+    "execution_count": 7,
    "metadata": {},
    "outputs": [
     {
+     "ename": "KeyboardInterrupt",
+     "evalue": "",
+     "output_type": "error",
+     "traceback": [
+      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
+      "\u001b[1;31mKeyboardInterrupt\u001b[0m                         Traceback (most recent call last)",
+      "Cell \u001b[1;32mIn[7], line 2\u001b[0m\n\u001b[0;32m      1\u001b[0m module_url \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mhttps://tfhub.dev/google/universal-sentence-encoder/4\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m----> 2\u001b[0m model \u001b[38;5;241m=\u001b[39m \u001b[43mhub\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mload\u001b[49m\u001b[43m(\u001b[49m\u001b[43mmodule_url\u001b[49m\u001b[43m)\u001b[49m\n",
+      "File \u001b[1;32mc:\\Users\\Akhil PC\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\tensorflow_hub\\module_v2.py:100\u001b[0m, in \u001b[0;36mload\u001b[1;34m(handle, tags, options)\u001b[0m\n\u001b[0;32m     98\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(handle, \u001b[38;5;28mstr\u001b[39m):\n\u001b[0;32m     99\u001b[0m   \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mValueError\u001b[39;00m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mExpected a string, got \u001b[39m\u001b[38;5;132;01m%s\u001b[39;00m\u001b[38;5;124m\"\u001b[39m \u001b[38;5;241m%\u001b[39m handle)\n\u001b[1;32m--> 100\u001b[0m module_path \u001b[38;5;241m=\u001b[39m \u001b[43mresolve\u001b[49m\u001b[43m(\u001b[49m\u001b[43mhandle\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m    101\u001b[0m is_hub_module_v1 \u001b[38;5;241m=\u001b[39m tf\u001b[38;5;241m.\u001b[39mio\u001b[38;5;241m.\u001b[39mgfile\u001b[38;5;241m.\u001b[39mexists(_get_module_proto_path(module_path))\n\u001b[0;32m    102\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m tags \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;129;01mand\u001b[39;00m is_hub_module_v1:\n",
+      "File \u001b[1;32mc:\\Users\\Akhil PC\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\tensorflow_hub\\module_v2.py:55\u001b[0m, in \u001b[0;36mresolve\u001b[1;34m(handle)\u001b[0m\n\u001b[0;32m     31\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mresolve\u001b[39m(handle):\n\u001b[0;32m     32\u001b[0m \u001b[38;5;250m  \u001b[39m\u001b[38;5;124;03m\"\"\"Resolves a module handle into a path.\u001b[39;00m\n\u001b[0;32m     33\u001b[0m \n\u001b[0;32m     34\u001b[0m \u001b[38;5;124;03m  This function works both for plain TF2 SavedModels and the legacy TF1 Hub\u001b[39;00m\n\u001b[1;32m   (...)\u001b[0m\n\u001b[0;32m     53\u001b[0m \u001b[38;5;124;03m    A string representing the Module path.\u001b[39;00m\n\u001b[0;32m     54\u001b[0m \u001b[38;5;124;03m  \"\"\"\u001b[39;00m\n\u001b[1;32m---> 55\u001b[0m   \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mregistry\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mresolver\u001b[49m\u001b[43m(\u001b[49m\u001b[43mhandle\u001b[49m\u001b[43m)\u001b[49m\n",
+      "File \u001b[1;32mc:\\Users\\Akhil PC\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\tensorflow_hub\\registry.py:49\u001b[0m, in \u001b[0;36mMultiImplRegister.__call__\u001b[1;34m(self, *args, **kwargs)\u001b[0m\n\u001b[0;32m     47\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m impl \u001b[38;5;129;01min\u001b[39;00m \u001b[38;5;28mreversed\u001b[39m(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_impls):\n\u001b[0;32m     48\u001b[0m   \u001b[38;5;28;01mif\u001b[39;00m impl\u001b[38;5;241m.\u001b[39mis_supported(\u001b[38;5;241m*\u001b[39margs, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs):\n\u001b[1;32m---> 49\u001b[0m     \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mimpl\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m     50\u001b[0m   \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[0;32m     51\u001b[0m     fails\u001b[38;5;241m.\u001b[39mappend(\u001b[38;5;28mtype\u001b[39m(impl)\u001b[38;5;241m.\u001b[39m\u001b[38;5;18m__name__\u001b[39m)\n",
+      "File \u001b[1;32mc:\\Users\\Akhil PC\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\tensorflow_hub\\compressed_module_resolver.py:81\u001b[0m, in \u001b[0;36mHttpCompressedFileResolver.__call__\u001b[1;34m(self, handle)\u001b[0m\n\u001b[0;32m     77\u001b[0m   response \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_call_urlopen(request)\n\u001b[0;32m     78\u001b[0m   \u001b[38;5;28;01mreturn\u001b[39;00m resolver\u001b[38;5;241m.\u001b[39mDownloadManager(handle)\u001b[38;5;241m.\u001b[39mdownload_and_uncompress(\n\u001b[0;32m     79\u001b[0m       response, tmp_dir)\n\u001b[1;32m---> 81\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mresolver\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43matomic_download\u001b[49m\u001b[43m(\u001b[49m\u001b[43mhandle\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdownload\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mmodule_dir\u001b[49m\u001b[43m,\u001b[49m\n\u001b[0;32m     82\u001b[0m \u001b[43m                                \u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_lock_file_timeout_sec\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[43m)\u001b[49m\n",
+      "File \u001b[1;32mc:\\Users\\Akhil PC\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\tensorflow_hub\\resolver.py:411\u001b[0m, in \u001b[0;36matomic_download\u001b[1;34m(handle, download_fn, module_dir, lock_file_timeout_sec)\u001b[0m\n\u001b[0;32m    408\u001b[0m     \u001b[38;5;28;01mpass\u001b[39;00m\n\u001b[0;32m    410\u001b[0m \u001b[38;5;66;03m# Wait for lock file to disappear.\u001b[39;00m\n\u001b[1;32m--> 411\u001b[0m \u001b[43m_wait_for_lock_to_disappear\u001b[49m\u001b[43m(\u001b[49m\u001b[43mhandle\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mlock_file\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mlock_file_timeout_sec\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m    412\u001b[0m \u001b[38;5;66;03m# At this point we either deleted a lock or a lock got removed by the\u001b[39;00m\n\u001b[0;32m    413\u001b[0m \u001b[38;5;66;03m# owner or another process. Perform one more iteration of the while-loop,\u001b[39;00m\n\u001b[0;32m    414\u001b[0m \u001b[38;5;66;03m# we would either terminate due tf.compat.v1.gfile.Exists(module_dir) or\u001b[39;00m\n\u001b[1;32m   (...)\u001b[0m\n\u001b[0;32m    417\u001b[0m \n\u001b[0;32m    418\u001b[0m \u001b[38;5;66;03m# Lock file acquired.\u001b[39;00m\n\u001b[0;32m    419\u001b[0m logging\u001b[38;5;241m.\u001b[39minfo(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mDownloading TF-Hub Module \u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;132;01m%s\u001b[39;00m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124m.\u001b[39m\u001b[38;5;124m\"\u001b[39m, handle)\n",
+      "File \u001b[1;32mc:\\Users\\Akhil PC\\AppData\\Local\\Programs\\Python\\Python312\\Lib\\site-packages\\tensorflow_hub\\resolver.py:336\u001b[0m, in \u001b[0;36m_wait_for_lock_to_disappear\u001b[1;34m(handle, lock_file, lock_file_timeout_sec)\u001b[0m\n\u001b[0;32m    334\u001b[0m       \u001b[38;5;28;01mpass\u001b[39;00m\n\u001b[0;32m    335\u001b[0m     \u001b[38;5;28;01mfinally\u001b[39;00m:\n\u001b[1;32m--> 336\u001b[0m       \u001b[43mtime\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43msleep\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;241;43m5\u001b[39;49m\u001b[43m)\u001b[49m\n",
+      "\u001b[1;31mKeyboardInterrupt\u001b[0m: "
    ]
   }
  ],

   },
   {
    "cell_type": "code",
+    "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [

   },
   {
    "cell_type": "code",
+    "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [

   },
   {
    "cell_type": "code",
+    "execution_count": null,
    "metadata": {},
    "outputs": [
    {

   },
   {
    "cell_type": "code",
+    "execution_count": null,
    "metadata": {},
    "outputs": [
    {

   },
   {
    "cell_type": "code",
+    "execution_count": null,
    "metadata": {},
    "outputs": [
    {
notebooks/evaluation.ipynb
ADDED
@@ -0,0 +1,239 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "import numpy as np\n",
+    "import matplotlib.pyplot as plt\n",
+    "\n",
+    "from sklearn.metrics import classification_report, accuracy_score, confusion_matrix, precision_score, recall_score, f1_score"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>row</th>\n",
+       "      <th>col</th>\n",
+       "      <th>latitude</th>\n",
+       "      <th>longitude</th>\n",
+       "      <th>Map Data</th>\n",
+       "      <th>label</th>\n",
+       "      <th>label_ground</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>0</td>\n",
+       "      <td>1</td>\n",
+       "      <td>18.89433</td>\n",
+       "      <td>72.794102</td>\n",
+       "      <td>Prongs Reef is a Natural;</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>0</td>\n",
+       "      <td>2</td>\n",
+       "      <td>18.89433</td>\n",
+       "      <td>72.803607</td>\n",
+       "      <td>United Services Club Golf Course is a Leisure ...</td>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>0</td>\n",
+       "      <td>3</td>\n",
+       "      <td>18.89433</td>\n",
+       "      <td>72.813112</td>\n",
+       "      <td>Indian Meterological Department is a Commercia...</td>\n",
+       "      <td>4</td>\n",
+       "      <td>4</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>0</td>\n",
+       "      <td>13</td>\n",
+       "      <td>18.89433</td>\n",
+       "      <td>72.908163</td>\n",
+       "      <td>Uran Naval Base is a Landuse: Military;</td>\n",
+       "      <td>3</td>\n",
+       "      <td>3</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>0</td>\n",
+       "      <td>14</td>\n",
+       "      <td>18.89433</td>\n",
+       "      <td>72.917668</td>\n",
+       "      <td>Uran Naval Base is a Landuse: Military;</td>\n",
+       "      <td>3</td>\n",
+       "      <td>3</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "   row  col  latitude  longitude \\\n",
+       "0    0    1  18.89433  72.794102 \n",
+       "1    0    2  18.89433  72.803607 \n",
+       "2    0    3  18.89433  72.813112 \n",
+       "3    0   13  18.89433  72.908163 \n",
+       "4    0   14  18.89433  72.917668 \n",
+       "\n",
+       "   Map Data  label  label_ground \n",
+       "0  Prongs Reef is a Natural;  2  2 \n",
+       "1  United Services Club Golf Course is a Leisure ...  2  2 \n",
+       "2  Indian Meterological Department is a Commercia...  4  4 \n",
+       "3  Uran Naval Base is a Landuse: Military;  3  3 \n",
+       "4  Uran Naval Base is a Landuse: Military;  3  3 "
+      ]
+     },
+     "execution_count": 11,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "df = pd.read_excel(r'C:\\Users\\Akhil PC\\Documents\\projects\\research\\Marauders-Map\\data\\MMR_DATA_CLEAN_LABELLED_GROUND.xlsx')\n",
+    "df.head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "y_pred = df['label']\n",
+    "y_true = df['label_ground']\n",
+    "\n",
+    "if y_pred.shape != y_true.shape:\n",
+    "    print(\"Error: The shape of the prediction and true labels do not match.\")\n",
+    "    exit(1)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Accuracy:  0.9705159705159705\n",
+      "Precision:  0.9706854194661688\n",
+      "Recall:  0.9705159705159705\n",
+      "F1 Score:  0.9704914288808899\n"
+     ]
+    }
+   ],
+   "source": [
+    "## accuracy, precision, recall, f1 score\n",
+    "\n",
+    "accuracy = accuracy_score(y_true, y_pred)\n",
+    "precision = precision_score(y_true, y_pred, average='weighted')\n",
+    "recall = recall_score(y_true, y_pred, average='weighted')\n",
+    "f1 = f1_score(y_true, y_pred, average='weighted')\n",
+    "\n",
+    "print(\"Accuracy: \", accuracy)\n",
+    "print(\"Precision: \", precision)\n",
+    "print(\"Recall: \", recall)\n",
+    "print(\"F1 Score: \", f1)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 17,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "<matplotlib.image.AxesImage at 0x1c9fb0476b0>"
+      ]
+     },
+     "execution_count": 17,
+     "metadata": {},
+     "output_type": "execute_result"
+    },
+    {
+     "data": {
+      "image/png": "<base64-encoded PNG (confusion-matrix heatmap) omitted>",
+      "text/plain": [
+       "<Figure size 480x480 with 1 Axes>"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "## confusion matrix\n",
+    "\n",
+    "conf_matrix = confusion_matrix(y_true, y_pred)\n",
+    "\n",
+    "plt.matshow(conf_matrix, cmap='Blues')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.12.0"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
src/map_slice.html
ADDED
@@ -0,0 +1,92 @@
+<!DOCTYPE html>
+<html>
+<head>
+
+    <meta http-equiv="content-type" content="text/html; charset=UTF-8" />
+
+        <script>
+            L_NO_TOUCH = false;
+            L_DISABLE_3D = false;
+        </script>
+
+    <style>html, body {width: 100%;height: 100%;margin: 0;padding: 0;}</style>
+    <style>#map {position:absolute;top:0;bottom:0;right:0;left:0;}</style>
+    <script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/leaflet.js"></script>
+    <script src="https://code.jquery.com/jquery-3.7.1.min.js"></script>
+    <script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/js/bootstrap.bundle.min.js"></script>
+    <script src="https://cdnjs.cloudflare.com/ajax/libs/Leaflet.awesome-markers/2.0.2/leaflet.awesome-markers.js"></script>
+    <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/dist/leaflet.css"/>
+    <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/dist/css/bootstrap.min.css"/>
+    <link rel="stylesheet" href="https://netdna.bootstrapcdn.com/bootstrap/3.0.0/css/bootstrap-glyphicons.css"/>
+    <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/@fortawesome/[email protected]/css/all.min.css"/>
+    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/Leaflet.awesome-markers/2.0.2/leaflet.awesome-markers.css"/>
+    <link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/python-visualization/folium/folium/templates/leaflet.awesome.rotate.min.css"/>
+
+    <meta name="viewport" content="width=device-width,
+        initial-scale=1.0, maximum-scale=1.0, user-scalable=no" />
+    <style>
+        #map_3fb1c91a19eaea7a0cc92fdc6aa05510 {
+            position: relative;
+            width: 100.0%;
+            height: 100.0%;
+            left: 0.0%;
+            top: 0.0%;
+        }
+        .leaflet-container { font-size: 1rem; }
+    </style>
+
+</head>
+<body>
+
+
+    <div class="folium-map" id="map_3fb1c91a19eaea7a0cc92fdc6aa05510" ></div>
+
+</body>
+<script>
+
+
+    var map_3fb1c91a19eaea7a0cc92fdc6aa05510 = L.map(
+        "map_3fb1c91a19eaea7a0cc92fdc6aa05510",
+        {
+            center: [19.04721441397017, 72.94618401786974],
+            crs: L.CRS.EPSG3857,
+            ...{
+  "zoom": 12,
+  "zoomControl": true,
+  "preferCanvas": false,
+}
+
+        }
+    );
+
+
+
+
+
+    var tile_layer_ca6a7e1da1b1e99f61c2e94424fed6cf = L.tileLayer(
+        "https://tile.openstreetmap.org/{z}/{x}/{y}.png",
+        {
+  "minZoom": 0,
+  "maxZoom": 19,
+  "maxNativeZoom": 19,
+  "noWrap": false,
+  "attribution": "\u0026copy; \u003ca href=\"https://www.openstreetmap.org/copyright\"\u003eOpenStreetMap\u003c/a\u003e contributors",
+  "subdomains": "abc",
+  "detectRetina": false,
+  "tms": false,
+  "opacity": 1,
+}
+
+    );
+
+
+    tile_layer_ca6a7e1da1b1e99f61c2e94424fed6cf.addTo(map_3fb1c91a19eaea7a0cc92fdc6aa05510);
+
+
+    var rectangle_3f015643e8201922d4eaf484aa326e44 = L.rectangle(
+        [[18.889833, 72.779844], [19.20459582794034, 73.11252403573947]],
+        {"bubblingMouseEvents": true, "color": "blue", "dashArray": null, "dashOffset": null, "fill": true, "fillColor": "blue", "fillOpacity": 0.2, "fillRule": "evenodd", "lineCap": "round", "lineJoin": "round", "noClip": false, "opacity": 1.0, "smoothFactor": 1.0, "stroke": true, "weight": 3}
+    ).addTo(map_3fb1c91a19eaea7a0cc92fdc6aa05510);
+
+</script>
+</html>
src/mapslice.py
ADDED
@@ -0,0 +1,45 @@
+import folium
+from math import radians, degrees, cos
+
+def haversine(lat, lon, distance_km):
+    # Earth's radius in km
+    R = 6371.0
+
+    # Approximate bounding-box offset (assumes a small area, ignoring curvature);
+    # angular offsets in radians for distance_km north and east
+    d_lat = distance_km / R
+    d_lon = distance_km / (R * cos(radians(lat)))
+
+    # Top-right coordinates (convert the radian offsets to degrees)
+    new_lat = lat + degrees(d_lat)
+    new_lon = lon + degrees(d_lon)
+
+    return new_lat, new_lon
+
+def generate_map(bottom_left_lat, bottom_left_lon, distance_km):
+    top_right_lat, top_right_lon = haversine(bottom_left_lat, bottom_left_lon, distance_km)
+
+    # Center of map (approx midpoint)
+    center_lat = (bottom_left_lat + top_right_lat) / 2
+    center_lon = (bottom_left_lon + top_right_lon) / 2
+
+    # Create a folium map
+    map_slice = folium.Map(location=[center_lat, center_lon], zoom_start=12)
+
+    # Draw a bounding box
+    folium.Rectangle(
+        bounds=[(bottom_left_lat, bottom_left_lon), (top_right_lat, top_right_lon)],
+        color="blue",
+        fill=True,
+        fill_opacity=0.2
+    ).add_to(map_slice)
+
+    # Save map to file
+    map_slice.save("map_slice.html")
+    print("Map saved as 'map_slice.html'. Open it in a browser.")
+
+if __name__ == "__main__":
+    lat = float(input("Enter bottom-left latitude: "))
+    lon = float(input("Enter bottom-left longitude: "))
+    distance_km = float(input("Enter distance (km): "))
+
+    generate_map(lat, lon, distance_km)