VesperAI committed
Commit 8348919 · 1 Parent(s): 15deac4

Added a production branch

.gitignore ADDED
@@ -0,0 +1,7 @@
1
+ data
2
+ qdrant_data
3
+ config
4
+ __pycache__
5
+ *.db
6
+ .venv
7
+ .env
Dockerfile ADDED
@@ -0,0 +1,13 @@
1
+ FROM python:3.9-slim
2
+
3
+ WORKDIR /code
4
+
5
+ COPY ./requirements.txt /code/requirements.txt
6
+
7
+ RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt
8
+
9
+ COPY . /code
10
+
11
+ EXPOSE 7860
12
+
13
+ CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
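The Dockerfile above hard-codes port 7860 in its `CMD`, while the README added in this same commit lists a `PORT` environment variable. One way to reconcile the two is to resolve the port at startup. The following is a minimal sketch under that assumption; `resolve_port` and `DEFAULT_PORT` are hypothetical names and do not appear in the commit.

```python
import os
from typing import Mapping, Optional

DEFAULT_PORT = 7860  # assumption: mirror the EXPOSE/CMD port from the Dockerfile


def resolve_port(env: Optional[Mapping[str, str]] = None,
                 default: int = DEFAULT_PORT) -> int:
    """Return the port named by the PORT variable, or `default` if unset."""
    env = os.environ if env is None else env
    raw = env.get("PORT", "").strip()
    if not raw:
        return default
    port = int(raw)  # raises ValueError for non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"PORT out of range: {port}")
    return port


if __name__ == "__main__":
    print(resolve_port({"PORT": "8000"}))  # port taken from the mapping
    print(resolve_port({}))                # falls back to DEFAULT_PORT
```

An entry point could then pass the result to `uvicorn.run("app:app", host="0.0.0.0", port=resolve_port())`. Whether `app.py` in this commit starts the server that way is not visible in this part of the diff.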
LICENSE ADDED
@@ -0,0 +1,373 @@
1
+ Mozilla Public License Version 2.0
2
+ ==================================
3
+
4
+ 1. Definitions
5
+ --------------
6
+
7
+ 1.1. "Contributor"
8
+ means each individual or legal entity that creates, contributes to
9
+ the creation of, or owns Covered Software.
10
+
11
+ 1.2. "Contributor Version"
12
+ means the combination of the Contributions of others (if any) used
13
+ by a Contributor and that particular Contributor's Contribution.
14
+
15
+ 1.3. "Contribution"
16
+ means Covered Software of a particular Contributor.
17
+
18
+ 1.4. "Covered Software"
19
+ means Source Code Form to which the initial Contributor has attached
20
+ the notice in Exhibit A, the Executable Form of such Source Code
21
+ Form, and Modifications of such Source Code Form, in each case
22
+ including portions thereof.
23
+
24
+ 1.5. "Incompatible With Secondary Licenses"
25
+ means
26
+
27
+ (a) that the initial Contributor has attached the notice described
28
+ in Exhibit B to the Covered Software; or
29
+
30
+ (b) that the Covered Software was made available under the terms of
31
+ version 1.1 or earlier of the License, but not also under the
32
+ terms of a Secondary License.
33
+
34
+ 1.6. "Executable Form"
35
+ means any form of the work other than Source Code Form.
36
+
37
+ 1.7. "Larger Work"
38
+ means a work that combines Covered Software with other material, in
39
+ a separate file or files, that is not Covered Software.
40
+
41
+ 1.8. "License"
42
+ means this document.
43
+
44
+ 1.9. "Licensable"
45
+ means having the right to grant, to the maximum extent possible,
46
+ whether at the time of the initial grant or subsequently, any and
47
+ all of the rights conveyed by this License.
48
+
49
+ 1.10. "Modifications"
50
+ means any of the following:
51
+
52
+ (a) any file in Source Code Form that results from an addition to,
53
+ deletion from, or modification of the contents of Covered
54
+ Software; or
55
+
56
+ (b) any new file in Source Code Form that contains any Covered
57
+ Software.
58
+
59
+ 1.11. "Patent Claims" of a Contributor
60
+ means any patent claim(s), including without limitation, method,
61
+ process, and apparatus claims, in any patent Licensable by such
62
+ Contributor that would be infringed, but for the grant of the
63
+ License, by the making, using, selling, offering for sale, having
64
+ made, import, or transfer of either its Contributions or its
65
+ Contributor Version.
66
+
67
+ 1.12. "Secondary License"
68
+ means either the GNU General Public License, Version 2.0, the GNU
69
+ Lesser General Public License, Version 2.1, the GNU Affero General
70
+ Public License, Version 3.0, or any later versions of those
71
+ licenses.
72
+
73
+ 1.13. "Source Code Form"
74
+ means the form of the work preferred for making modifications.
75
+
76
+ 1.14. "You" (or "Your")
77
+ means an individual or a legal entity exercising rights under this
78
+ License. For legal entities, "You" includes any entity that
79
+ controls, is controlled by, or is under common control with You. For
80
+ purposes of this definition, "control" means (a) the power, direct
81
+ or indirect, to cause the direction or management of such entity,
82
+ whether by contract or otherwise, or (b) ownership of more than
83
+ fifty percent (50%) of the outstanding shares or beneficial
84
+ ownership of such entity.
85
+
86
+ 2. License Grants and Conditions
87
+ --------------------------------
88
+
89
+ 2.1. Grants
90
+
91
+ Each Contributor hereby grants You a world-wide, royalty-free,
92
+ non-exclusive license:
93
+
94
+ (a) under intellectual property rights (other than patent or trademark)
95
+ Licensable by such Contributor to use, reproduce, make available,
96
+ modify, display, perform, distribute, and otherwise exploit its
97
+ Contributions, either on an unmodified basis, with Modifications, or
98
+ as part of a Larger Work; and
99
+
100
+ (b) under Patent Claims of such Contributor to make, use, sell, offer
101
+ for sale, have made, import, and otherwise transfer either its
102
+ Contributions or its Contributor Version.
103
+
104
+ 2.2. Effective Date
105
+
106
+ The licenses granted in Section 2.1 with respect to any Contribution
107
+ become effective for each Contribution on the date the Contributor first
108
+ distributes such Contribution.
109
+
110
+ 2.3. Limitations on Grant Scope
111
+
112
+ The licenses granted in this Section 2 are the only rights granted under
113
+ this License. No additional rights or licenses will be implied from the
114
+ distribution or licensing of Covered Software under this License.
115
+ Notwithstanding Section 2.1(b) above, no patent license is granted by a
116
+ Contributor:
117
+
118
+ (a) for any code that a Contributor has removed from Covered Software;
119
+ or
120
+
121
+ (b) for infringements caused by: (i) Your and any other third party's
122
+ modifications of Covered Software, or (ii) the combination of its
123
+ Contributions with other software (except as part of its Contributor
124
+ Version); or
125
+
126
+ (c) under Patent Claims infringed by Covered Software in the absence of
127
+ its Contributions.
128
+
129
+ This License does not grant any rights in the trademarks, service marks,
130
+ or logos of any Contributor (except as may be necessary to comply with
131
+ the notice requirements in Section 3.4).
132
+
133
+ 2.4. Subsequent Licenses
134
+
135
+ No Contributor makes additional grants as a result of Your choice to
136
+ distribute the Covered Software under a subsequent version of this
137
+ License (see Section 10.2) or under the terms of a Secondary License (if
138
+ permitted under the terms of Section 3.3).
139
+
140
+ 2.5. Representation
141
+
142
+ Each Contributor represents that the Contributor believes its
143
+ Contributions are its original creation(s) or it has sufficient rights
144
+ to grant the rights to its Contributions conveyed by this License.
145
+
146
+ 2.6. Fair Use
147
+
148
+ This License is not intended to limit any rights You have under
149
+ applicable copyright doctrines of fair use, fair dealing, or other
150
+ equivalents.
151
+
152
+ 2.7. Conditions
153
+
154
+ Sections 3.1, 3.2, 3.3, and 3.4 are conditions of the licenses granted
155
+ in Section 2.1.
156
+
157
+ 3. Responsibilities
158
+ -------------------
159
+
160
+ 3.1. Distribution of Source Form
161
+
162
+ All distribution of Covered Software in Source Code Form, including any
163
+ Modifications that You create or to which You contribute, must be under
164
+ the terms of this License. You must inform recipients that the Source
165
+ Code Form of the Covered Software is governed by the terms of this
166
+ License, and how they can obtain a copy of this License. You may not
167
+ attempt to alter or restrict the recipients' rights in the Source Code
168
+ Form.
169
+
170
+ 3.2. Distribution of Executable Form
171
+
172
+ If You distribute Covered Software in Executable Form then:
173
+
174
+ (a) such Covered Software must also be made available in Source Code
175
+ Form, as described in Section 3.1, and You must inform recipients of
176
+ the Executable Form how they can obtain a copy of such Source Code
177
+ Form by reasonable means in a timely manner, at a charge no more
178
+ than the cost of distribution to the recipient; and
179
+
180
+ (b) You may distribute such Executable Form under the terms of this
181
+ License, or sublicense it under different terms, provided that the
182
+ license for the Executable Form does not attempt to limit or alter
183
+ the recipients' rights in the Source Code Form under this License.
184
+
185
+ 3.3. Distribution of a Larger Work
186
+
187
+ You may create and distribute a Larger Work under terms of Your choice,
188
+ provided that You also comply with the requirements of this License for
189
+ the Covered Software. If the Larger Work is a combination of Covered
190
+ Software with a work governed by one or more Secondary Licenses, and the
191
+ Covered Software is not Incompatible With Secondary Licenses, this
192
+ License permits You to additionally distribute such Covered Software
193
+ under the terms of such Secondary License(s), so that the recipient of
194
+ the Larger Work may, at their option, further distribute the Covered
195
+ Software under the terms of either this License or such Secondary
196
+ License(s).
197
+
198
+ 3.4. Notices
199
+
200
+ You may not remove or alter the substance of any license notices
201
+ (including copyright notices, patent notices, disclaimers of warranty,
202
+ or limitations of liability) contained within the Source Code Form of
203
+ the Covered Software, except that You may alter any license notices to
204
+ the extent required to remedy known factual inaccuracies.
205
+
206
+ 3.5. Application of Additional Terms
207
+
208
+ You may choose to offer, and to charge a fee for, warranty, support,
209
+ indemnity or liability obligations to one or more recipients of Covered
210
+ Software. However, You may do so only on Your own behalf, and not on
211
+ behalf of any Contributor. You must make it absolutely clear that any
212
+ such warranty, support, indemnity, or liability obligation is offered by
213
+ You alone, and You hereby agree to indemnify every Contributor for any
214
+ liability incurred by such Contributor as a result of warranty, support,
215
+ indemnity or liability terms You offer. You may include additional
216
+ disclaimers of warranty and limitations of liability specific to any
217
+ jurisdiction.
218
+
219
+ 4. Inability to Comply Due to Statute or Regulation
220
+ ---------------------------------------------------
221
+
222
+ If it is impossible for You to comply with any of the terms of this
223
+ License with respect to some or all of the Covered Software due to
224
+ statute, judicial order, or regulation then You must: (a) comply with
225
+ the terms of this License to the maximum extent possible; and (b)
226
+ describe the limitations and the code they affect. Such description must
227
+ be placed in a text file included with all distributions of the Covered
228
+ Software under this License. Except to the extent prohibited by statute
229
+ or regulation, such description must be sufficiently detailed for a
230
+ recipient of ordinary skill to be able to understand it.
231
+
232
+ 5. Termination
233
+ --------------
234
+
235
+ 5.1. The rights granted under this License will terminate automatically
236
+ if You fail to comply with any of its terms. However, if You become
237
+ compliant, then the rights granted under this License from a particular
238
+ Contributor are reinstated (a) provisionally, unless and until such
239
+ Contributor explicitly and finally terminates Your grants, and (b) on an
240
+ ongoing basis, if such Contributor fails to notify You of the
241
+ non-compliance by some reasonable means prior to 60 days after You have
242
+ come back into compliance. Moreover, Your grants from a particular
243
+ Contributor are reinstated on an ongoing basis if such Contributor
244
+ notifies You of the non-compliance by some reasonable means, this is the
245
+ first time You have received notice of non-compliance with this License
246
+ from such Contributor, and You become compliant prior to 30 days after
247
+ Your receipt of the notice.
248
+
249
+ 5.2. If You initiate litigation against any entity by asserting a patent
250
+ infringement claim (excluding declaratory judgment actions,
251
+ counter-claims, and cross-claims) alleging that a Contributor Version
252
+ directly or indirectly infringes any patent, then the rights granted to
253
+ You by any and all Contributors for the Covered Software under Section
254
+ 2.1 of this License shall terminate.
255
+
256
+ 5.3. In the event of termination under Sections 5.1 or 5.2 above, all
257
+ end user license agreements (excluding distributors and resellers) which
258
+ have been validly granted by You or Your distributors under this License
259
+ prior to termination shall survive termination.
260
+
261
+ ************************************************************************
262
+ * *
263
+ * 6. Disclaimer of Warranty *
264
+ * ------------------------- *
265
+ * *
266
+ * Covered Software is provided under this License on an "as is" *
267
+ * basis, without warranty of any kind, either expressed, implied, or *
268
+ * statutory, including, without limitation, warranties that the *
269
+ * Covered Software is free of defects, merchantable, fit for a *
270
+ * particular purpose or non-infringing. The entire risk as to the *
271
+ * quality and performance of the Covered Software is with You. *
272
+ * Should any Covered Software prove defective in any respect, You *
273
+ * (not any Contributor) assume the cost of any necessary servicing, *
274
+ * repair, or correction. This disclaimer of warranty constitutes an *
275
+ * essential part of this License. No use of any Covered Software is *
276
+ * authorized under this License except under this disclaimer. *
277
+ * *
278
+ ************************************************************************
279
+
280
+ ************************************************************************
281
+ * *
282
+ * 7. Limitation of Liability *
283
+ * -------------------------- *
284
+ * *
285
+ * Under no circumstances and under no legal theory, whether tort *
286
+ * (including negligence), contract, or otherwise, shall any *
287
+ * Contributor, or anyone who distributes Covered Software as *
288
+ * permitted above, be liable to You for any direct, indirect, *
289
+ * special, incidental, or consequential damages of any character *
290
+ * including, without limitation, damages for lost profits, loss of *
291
+ * goodwill, work stoppage, computer failure or malfunction, or any *
292
+ * and all other commercial damages or losses, even if such party *
293
+ * shall have been informed of the possibility of such damages. This *
294
+ * limitation of liability shall not apply to liability for death or *
295
+ * personal injury resulting from such party's negligence to the *
296
+ * extent applicable law prohibits such limitation. Some *
297
+ * jurisdictions do not allow the exclusion or limitation of *
298
+ * incidental or consequential damages, so this exclusion and *
299
+ * limitation may not apply to You. *
300
+ * *
301
+ ************************************************************************
302
+
303
+ 8. Litigation
304
+ -------------
305
+
306
+ Any litigation relating to this License may be brought only in the
307
+ courts of a jurisdiction where the defendant maintains its principal
308
+ place of business and such litigation shall be governed by laws of that
309
+ jurisdiction, without reference to its conflict-of-law provisions.
310
+ Nothing in this Section shall prevent a party's ability to bring
311
+ cross-claims or counter-claims.
312
+
313
+ 9. Miscellaneous
314
+ ----------------
315
+
316
+ This License represents the complete agreement concerning the subject
317
+ matter hereof. If any provision of this License is held to be
318
+ unenforceable, such provision shall be reformed only to the extent
319
+ necessary to make it enforceable. Any law or regulation which provides
320
+ that the language of a contract shall be construed against the drafter
321
+ shall not be used to construe this License against a Contributor.
322
+
323
+ 10. Versions of the License
324
+ ---------------------------
325
+
326
+ 10.1. New Versions
327
+
328
+ Mozilla Foundation is the license steward. Except as provided in Section
329
+ 10.3, no one other than the license steward has the right to modify or
330
+ publish new versions of this License. Each version will be given a
331
+ distinguishing version number.
332
+
333
+ 10.2. Effect of New Versions
334
+
335
+ You may distribute the Covered Software under the terms of the version
336
+ of the License under which You originally received the Covered Software,
337
+ or under the terms of any subsequent version published by the license
338
+ steward.
339
+
340
+ 10.3. Modified Versions
341
+
342
+ If you create software not governed by this License, and you want to
343
+ create a new license for such software, you may create and use a
344
+ modified version of this License if you rename the license and remove
345
+ any references to the name of the license steward (except to note that
346
+ such modified license differs from this License).
347
+
348
+ 10.4. Distributing Source Code Form that is Incompatible With Secondary
349
+ Licenses
350
+
351
+ If You choose to distribute Source Code Form that is Incompatible With
352
+ Secondary Licenses under the terms of this version of the License, the
353
+ notice described in Exhibit B of this License must be attached.
354
+
355
+ Exhibit A - Source Code Form License Notice
356
+ -------------------------------------------
357
+
358
+ This Source Code Form is subject to the terms of the Mozilla Public
359
+ License, v. 2.0. If a copy of the MPL was not distributed with this
360
+ file, You can obtain one at http://mozilla.org/MPL/2.0/.
361
+
362
+ If it is not possible or desirable to put the notice in a particular
363
+ file, then You may include the notice in a location (such as a LICENSE
364
+ file in a relevant directory) where a recipient would be likely to look
365
+ for such a notice.
366
+
367
+ You may add additional accurate notices of copyright ownership.
368
+
369
+ Exhibit B - "Incompatible With Secondary Licenses" Notice
370
+ ---------------------------------------------------------
371
+
372
+ This Source Code Form is "Incompatible With Secondary Licenses", as
373
+ defined by the Mozilla Public License, v. 2.0.
README.md CHANGED
@@ -1,10 +1,341 @@
1
  ---
2
- title: Visual Product Matcher
3
- emoji: 🐠
4
- colorFrom: pink
5
- colorTo: purple
6
- sdk: docker
7
- pinned: false
8
- ---
9
 
10
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
+ # Visual Product Search 🔍
2
+
3
+ A visual search engine for product discovery. It combines CLIP (Contrastive Language-Image Pre-Training) embeddings with the Qdrant vector database to enable semantic search across image collections, which suits e-commerce, inventory management, and content discovery.
4
+
5
+ ## 🌟 Key Features
6
+
7
+ - 🎯 **Multi-Modal Search**: Search using text descriptions, uploaded images, or image URLs
8
+ - 🖼️ **Smart Indexing**: Automatically indexes and monitors image folders with real-time updates
9
+ - 🔍 **Semantic Understanding**: Uses OpenAI's CLIP model for deep image-text comprehension
10
+ - **Similarity Scoring**: Provides percentage-based similarity scores for accurate results
11
+ - ⚡ **Real-time Processing**: WebSocket-powered live progress updates during indexing
12
+ - 🎨 **Modern UI**: Clean, responsive interface with advanced search capabilities
13
+ - 🌐 **URL Support**: Direct image search from web URLs
14
+ - 📱 **Mobile Responsive**: Works seamlessly across all devices
15
+
16
+ ## 🧠 Technical Approach & Solution
17
+
18
+ ### Problem Statement
19
+ Traditional image search relies on metadata and filenames, which often fail to capture the actual visual content. Users struggle to find specific products or images without knowing exact file names or having perfect tagging systems.
20
+
21
+ ### Our Solution Architecture
22
+
23
+ #### 1. **Multi-Modal Embedding Generation**
24
+ ```
25
+ Text Query → CLIP Text Encoder → 512D Vector
26
+ Image Input → CLIP Vision Encoder → 512D Vector
27
+ URL Image → Download → CLIP Vision Encoder → 512D Vector
28
+ ```
29
+
30
+ #### 2. **Vector Similarity Search**
31
+ - **Database**: Qdrant cloud vector database for scalable similarity search
32
+ - **Indexing**: Real-time folder monitoring with automatic embedding generation
33
+ - **Storage**: Hybrid approach - embeddings in Qdrant, metadata in SQLite
34
+
35
+ #### 3. **Semantic Matching Pipeline**
36
+ ```
37
+ User Input → Feature Extraction → Vector Search → Similarity Ranking → Results
38
+ ```
39
+
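The ranking step of the pipeline above can be illustrated without any model or database. The sketch below assumes cosine similarity as the metric and a simple clamp-and-scale mapping to percentages; the README does not state which metric or percentage formula the project uses, and in the real application Qdrant performs the search. The function names and the toy 3-D vectors (standing in for 512-D CLIP embeddings) are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(query: Sequence[float],
                       index: Dict[str, Sequence[float]],
                       top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k (name, percent) pairs, best match first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    # Assumed percentage mapping: clamp negatives to 0, then scale to 0..100.
    return [(name, round(max(0.0, score) * 100.0, 1)) for name, score in scored[:top_k]]


if __name__ == "__main__":
    toy_index = {
        "red_car.jpg": [1.0, 0.0, 0.0],
        "blue_dress.jpg": [0.0, 1.0, 0.0],
        "red_truck.jpg": [0.8, 0.0, 0.6],
    }
    print(rank_by_similarity([1.0, 0.0, 0.0], toy_index, top_k=2))
```

Because text and image queries are embedded into the same vector space, the same ranking function serves all three search modes; only the encoder in front of it changes.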
40
+ ### Architecture Components
41
+
42
+ #### Backend (FastAPI)
43
+ - **Image Processing**: PIL + CLIP for feature extraction
44
+ - **Vector Operations**: Qdrant client for similarity search
45
+ - **File Management**: Automatic folder monitoring and indexing
46
+ - **API Endpoints**: RESTful APIs for all search operations
47
+
48
+ #### Frontend (Modern Web UI)
49
+ - **Framework**: Vanilla JavaScript with Bootstrap 5
50
+ - **Styling**: Custom CSS with modern design principles
51
+ - **Real-time Updates**: WebSocket connections for live progress
52
+ - **Responsive Design**: Mobile-first approach
53
+
54
+ #### Database Layer
55
+ - **Vector Storage**: Qdrant cloud for embeddings and similarity search
56
+ - **Metadata Storage**: SQLite for image metadata and file information
57
+ - **Caching**: Thumbnail generation and caching for performance
58
+
59
+ ## 🚀 Quick Start
60
+
61
+ ### Prerequisites
62
+ - Python 3.8+
63
+ - CUDA-compatible GPU (optional, recommended for performance)
64
+ - Qdrant Cloud account (free tier available)
65
+
66
+ ### Installation
67
+
68
+ 1. **Clone the repository**:
69
+ ```bash
70
+ git clone https://github.com/itsfuad/SnapSeek
71
+ cd SnapSeek
72
+ ```
73
+
74
+ 2. **Create virtual environment**:
75
+ ```bash
76
+ python -m venv venv
77
+ source venv/bin/activate # Windows: venv\Scripts\activate
78
+ ```
79
+
80
+ 3. **Install dependencies**:
81
+ ```bash
82
+ pip install -r requirements.txt
83
+ ```
84
+
85
+ 4. **Configure environment**:
86
+ Create a `.env` file:
87
+ ```env
88
+ QDRANT_API_KEY=your_qdrant_api_key
89
+ QDRANT_URL=your_qdrant_cluster_url
90
+ ```
91
+
92
+ 5. **Launch the application**:
93
+ ```bash
94
+ python app.py
95
+ ```
96
+
97
+ 6. **Access the interface**:
98
+ Open http://localhost:8000 in your browser
99
+
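Step 4 above relies on two variables, `QDRANT_API_KEY` and `QDRANT_URL`. A minimal sketch of validating them at startup follows. It assumes the values are already present in the process environment (for example exported by the shell or by a `.env` loader; this part of the diff does not show how the project loads the file). `QdrantSettings` and `load_qdrant_settings` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class QdrantSettings:
    url: str
    api_key: str


def load_qdrant_settings(env: Optional[Mapping[str, str]] = None) -> QdrantSettings:
    """Read the Qdrant settings and fail early with a clear message if any is missing."""
    env = os.environ if env is None else env
    required = ("QDRANT_URL", "QDRANT_API_KEY")
    missing = [name for name in required if not env.get(name, "").strip()]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return QdrantSettings(url=env["QDRANT_URL"].strip(),
                          api_key=env["QDRANT_API_KEY"].strip())
```

Failing at startup with the names of the missing variables tends to be easier to diagnose than a connection error raised later by the first search request.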
100
+ ## 🎯 Usage Guide
101
+
102
+ ### 1. **Index Your Images**
103
+ - Click "Add Folder" to select image directories
104
+ - Watch real-time indexing progress
105
+ - Images are automatically monitored for changes
106
+
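The "automatically monitored for changes" behaviour could be approximated by comparing directory snapshots. The sketch below uses plain polling of modification times, which is a swapped-in technique for illustration; the mechanism used by the project's `folder_manager.py` is not shown in this part of the diff. `snapshot` and `diff_snapshots` are hypothetical names, and the extension set copies the `image_extensions` list from `app.py` in this commit.

```python
from pathlib import Path
from typing import Dict, List, Tuple

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}  # same values as app.py


def snapshot(folder: Path) -> Dict[str, float]:
    """Map every image file under `folder` to its last-modified timestamp."""
    return {
        str(path): path.stat().st_mtime
        for path in folder.rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    }


def diff_snapshots(old: Dict[str, float],
                   new: Dict[str, float]) -> Tuple[List[str], List[str], List[str]]:
    """Return (added, removed, changed) paths between two snapshots."""
    added = sorted(p for p in new if p not in old)
    removed = sorted(p for p in old if p not in new)
    changed = sorted(p for p in new if p in old and new[p] != old[p])
    return added, removed, changed
```

An indexer could call `snapshot` on a timer, embed the `added` and `changed` files, and delete the vectors for `removed` ones. Event-based watchers avoid the rescan cost on large folders but add a dependency.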
107
+ ### 2. **Search Methods**
108
+
109
+ #### Text Search
110
+ ```
111
+ "red sports car"
112
+ "woman wearing blue dress"
113
+ "modern kitchen design"
114
+ ```
115
+
116
+ #### Image Upload Search
117
+ - Click the image icon
118
+ - Upload a reference image
119
+ - Get visually similar results
120
+
121
+ #### URL Search
122
+ - Click the link icon
123
+ - Paste any image URL
124
+ - Find similar images in your collection
125
+
126
+ ### 3. **Results & Insights**
127
+ - Similarity percentages for each match
128
+ - High-resolution image previews
129
+ - Metadata and file information
130
+
131
+ ## 🏭 Production Deployment
132
+
133
+ ### Recommended Platforms
134
+
135
+ #### 1. **Railway (Recommended)**
136
+ - **Why**: Best for AI/ML applications with generous free tier
137
+ - **Resources**: 512MB RAM, 1GB storage
138
+ - **Benefits**: No sleep mode, automatic GitHub deployments
139
+
140
+ ```dockerfile
141
+ # Dockerfile
142
+ FROM python:3.9-slim
143
+
144
+ WORKDIR /app
145
+ COPY requirements.txt .
146
+ RUN pip install --no-cache-dir -r requirements.txt
147
+ COPY . .
148
+ EXPOSE 8000
149
+ CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
150
+ ```
151
+
152
+ #### 2. **Render**
153
+ - **Resources**: 512MB RAM, 1GB storage
154
+ - **Benefits**: Free SSL, auto-deploy, no cold starts
155
+
156
+ #### 3. **Fly.io**
157
+ - **Resources**: 256MB RAM, 3GB storage volume
158
+ - **Benefits**: Global edge deployment, persistent volumes
159
+
160
+ ### Environment Variables for Production
161
+ ```env
162
+ QDRANT_API_KEY=your_production_key
163
+ QDRANT_URL=your_production_cluster
164
+ PORT=8000
165
+ DATA_DIR=/app/data
166
+ ```
167
+
168
+ ## 🛠️ Development & Testing
169
+
170
+ ### Project Structure
171
+ ```
172
+ SnapSeek/
173
+ ├── app.py                  # FastAPI application
174
+ ├── image_indexer.py        # Image processing and indexing
175
+ ├── image_search.py         # Search logic and CLIP integration
176
+ ├── image_database.py       # Database operations
177
+ ├── folder_manager.py       # Folder monitoring and management
178
+ ├── qdrant_singleton.py     # Qdrant client management
179
+ ├── requirements.txt        # Dependencies
180
+ ├── .env                    # Environment configuration
181
+ ├── templates/
182
+ │   └── index.html          # Main UI template
183
+ ├── static/
184
+ │   ├── js/
185
+ │   │   └── script.js       # Frontend JavaScript
186
+ │   └── image.png           # Application icon
187
+ ├── config/
188
+ │   └── folders.json        # Folder configuration
189
+ └── tests/
190
+     └── test_*.py           # Test files
191
+ ```
192
+
193
+ ### Running Tests
194
+ ```bash
195
+ pip install -r requirements-test.txt
196
+ pytest tests/ -v
197
+ ```
198
+
199
+ ### Development Setup
200
+ ```bash
201
+ # Install development dependencies
202
+ pip install -r requirements-test.txt
203
+
204
+ # Run with auto-reload
205
+ uvicorn app:app --reload --host 0.0.0.0 --port 8000
206
+ ```
207
+
208
+ ## 🔧 Performance Optimization
209
+
210
+ ### Model Selection
211
+ ```python
212
+ # Finer 16-pixel patches: usually more accurate, but slower per image
213
+ MODEL_NAME = "openai/clip-vit-base-patch16"
214
+
215
+ # Coarser 32-pixel patches: faster and lighter on compute
216
+ MODEL_NAME = "openai/clip-vit-base-patch32"
217
+ ```
218
+
219
+ ### Hardware Recommendations
220
+ - **CPU**: 4+ cores for concurrent processing
221
+ - **RAM**: 8GB+ for model loading and image processing
222
+ - **Storage**: SSD recommended for faster I/O
223
+ - **GPU**: Optional, CUDA-compatible for faster inference
224
+
225
+ ### Scaling Considerations
226
+ - **Batch Processing**: Process multiple images simultaneously
227
+ - **Caching**: Implement Redis for frequent queries
228
+ - **Load Balancing**: Use multiple instances for high traffic
229
+ - **Database Sharding**: Split collections by categories
230
+
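Two of the scaling ideas above, batch processing and query caching, can be sketched with the standard library alone. This sketch uses an in-process `functools.lru_cache` rather than Redis, so it only illustrates the caching pattern and does not share results across instances. `batched`, `embed_query`, and `_embed_uncached` are hypothetical names, and the embedding function is a stand-in, not a CLIP call.

```python
from functools import lru_cache
from typing import Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


def _embed_uncached(query: str) -> Tuple[float, ...]:
    """Hypothetical stand-in for an expensive text-encoder call."""
    return (float(len(query)), float(sum(map(ord, query)) % 97))


@lru_cache(maxsize=1024)
def embed_query(query: str) -> Tuple[float, ...]:
    """Cache embeddings so repeated queries skip the encoder."""
    return _embed_uncached(query)
```

Returning a tuple keeps the cached value immutable, so one caller cannot accidentally modify the embedding that later callers receive.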
231
+ ## 🐛 Troubleshooting
232
+
233
+ ### Common Issues
234
+
235
+ #### 1. **Model Loading Errors**
236
+ ```bash
237
+ # Clear cache and reinstall
238
+ pip uninstall torch torchvision transformers
239
+ pip install torch torchvision transformers --no-cache-dir
240
+ ```
241
+
242
+ #### 2. **Qdrant Connection Issues**
243
+ - Verify API key and URL in `.env`
244
+ - Check network connectivity
245
+ - Ensure Qdrant cluster is active
246
+
247
+ #### 3. **Memory Issues**
248
+ - Reduce batch size in processing
249
+ - Use CPU-only mode: `device="cpu"`
250
+ - Close unused applications
251
+
252
+ #### 4. **Slow Performance**
253
+ - Enable GPU acceleration
254
+ - Optimize image sizes
255
+ - Implement result caching
256
+
257
+ ### Performance Monitoring
258
+ ```python
259
+ # Add logging for performance tracking
260
+ import time
261
+ import logging
262
+
263
+ logging.basicConfig(level=logging.INFO)
264
+ logger = logging.getLogger(__name__)
265
+
266
+ # Time search operations (run this inside an async function, e.g. a route handler)
267
+ start_time = time.perf_counter()
268
+ results = await searcher.search_by_text(query)
269
+ logger.info(f"Search completed in {time.perf_counter() - start_time:.2f}s")
270
+ ```
271
+
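The timing pattern above needs to run inside a coroutine. A self-contained sketch of the same idea follows, wrapped in a reusable context manager. `log_duration`, `timed_search`, and `fake_search_by_text` are hypothetical names; the last one stands in for `searcher.search_by_text`, whose real signature is not shown in this part of the diff.

```python
import asyncio
import logging
import time
from contextlib import contextmanager
from typing import Iterator, List

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


@contextmanager
def log_duration(label: str) -> Iterator[None]:
    """Log how long the wrapped block took, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        logger.info("%s completed in %.3fs", label, time.perf_counter() - start)


async def fake_search_by_text(query: str) -> List[str]:
    """Hypothetical stand-in for the real search call."""
    await asyncio.sleep(0.01)  # simulate I/O latency
    return [f"result for {query!r}"]


async def timed_search(query: str) -> List[str]:
    with log_duration("Search"):
        return await fake_search_by_text(query)


if __name__ == "__main__":
    print(asyncio.run(timed_search("red sports car")))
```

`time.perf_counter()` is used instead of `time.time()` because it is monotonic and intended for measuring short intervals.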
272
+ ## 🤝 Contributing
273
+
274
+ 1. Fork the repository
275
+ 2. Create a feature branch: `git checkout -b feature-name`
276
+ 3. Make your changes and add tests
277
+ 4. Run tests: `pytest tests/`
278
+ 5. Commit changes: `git commit -m "Add feature"`
279
+ 6. Push to branch: `git push origin feature-name`
280
+ 7. Create a Pull Request
281
+
282
+ ### Code Standards
283
+ - Follow PEP 8 style guidelines
284
+ - Add docstrings to all functions
285
+ - Include type hints where appropriate
286
+ - Write tests for new features
287
+
288
+ ## 📊 Use Cases & Applications
289
+
290
+ ### E-commerce
291
+ - Product recommendation systems
292
+ - Visual search for online stores
293
+ - Inventory management
294
+ - Duplicate product detection
295
+
296
+ ### Content Management
297
+ - Digital asset organization
298
+ - Stock photo searching
299
+ - Brand consistency checking
300
+ - Content moderation
301
+
302
+ ### Research & Education
303
+ - Academic image databases
304
+ - Scientific data analysis
305
+ - Historical archive searches
306
+ - Educational content discovery
307
+
308
+ ## 🔮 Future Enhancements
309
+
310
+ - [ ] **Multi-language Support**: Extend text search to multiple languages
311
+ - [ ] **Advanced Filters**: Add size, color, and metadata filters
312
+ - [ ] **Batch Operations**: Upload and search multiple images at once
313
+ - [ ] **API Integration**: RESTful API for external applications
314
+ - [ ] **Machine Learning**: Custom fine-tuned models for specific domains
315
+ - [ ] **Analytics Dashboard**: Search metrics and usage statistics
316
+ - [ ] **Mobile App**: Native mobile applications
317
+ - [ ] **Cloud Storage**: Integration with AWS S3, Google Drive, etc.
318
+
319
+ ## 📄 License
320
+
321
+ This project is licensed under the Mozilla Public License 2.0 - see the [LICENSE](LICENSE) file for details.
322
+
323
+ ## 🙏 Acknowledgments
324
+
325
+ - **OpenAI**: For the CLIP model and research
326
+ - **Qdrant**: For the excellent vector database
327
+ - **FastAPI**: For the modern web framework
328
+ - **Transformers**: For the model implementation
329
+ - **Bootstrap**: For the UI components
330
+
331
+ ## 📞 Support & Contact
332
+
333
+ - **Issues**: [GitHub Issues](https://github.com/itsfuad/SnapSeek/issues)
334
+ - **Discussions**: [GitHub Discussions](https://github.com/itsfuad/SnapSeek/discussions)
335
+ - **Documentation**: [Wiki](https://github.com/itsfuad/SnapSeek/wiki)
336
+
337
  ---
338
 
339
+ **Made with ❤️ by [itsfuad](https://github.com/itsfuad)**
340
+
341
+ *Revolutionizing visual search with AI technology*
app.py ADDED
@@ -0,0 +1,328 @@
1
+ import os
2
+ from pathlib import Path
3
+ from typing import List, Optional
4
+ import io
5
+ from contextlib import asynccontextmanager
6
+ from fastapi import FastAPI, File, UploadFile, Request, WebSocket, WebSocketDisconnect, HTTPException, BackgroundTasks
7
+ from fastapi.responses import HTMLResponse, FileResponse, StreamingResponse
8
+ from fastapi.staticfiles import StaticFiles
9
+ from fastapi.templating import Jinja2Templates
10
+ from PIL import Image
11
+
12
+ from image_indexer import ImageIndexer
13
+ from image_search import ImageSearch
14
+ from image_database import ImageDatabase
15
+
16
+ # Initialize image indexer, searcher, and database
17
+ indexer = ImageIndexer()
18
+ searcher = ImageSearch()
19
+ image_db = ImageDatabase()
20
+
21
+ image_extensions = [".jpg", ".jpeg", ".png", ".gif"]
22
+
23
+ @asynccontextmanager
24
+ async def lifespan(_: FastAPI):
25
+ """Initialize the image indexer"""
26
+ yield
27
+
28
+ app = FastAPI(title="Visual Product Search", lifespan=lifespan)
29
+
30
+ # Setup templates and static files
31
+ templates = Jinja2Templates(directory="templates")
32
+ app.mount("/static", StaticFiles(directory="static"), name="static")
33
+
34
+ @app.get("/", response_class=HTMLResponse)
35
+ async def home(request: Request):
36
+ """Render the home page"""
37
+ folders = indexer.folder_manager.get_all_folders()
38
+ return templates.TemplateResponse(
39
+ "index.html",
40
+ {
41
+ "request": request,
42
+ "initial_status": {
43
+ "status": indexer.status.value,
44
+ "current_file": indexer.current_file,
45
+ "total_files": indexer.total_files,
46
+ "processed_files": indexer.processed_files,
47
+ "progress_percentage": round((indexer.processed_files / indexer.total_files * 100) if indexer.total_files > 0 else 0, 2)
48
+ },
49
+ "folders": folders
50
+ }
51
+ )
52
+
53
+ @app.post("/folders")
54
+ async def add_folder(folder_path: str, background_tasks: BackgroundTasks):
55
+ """Add a new folder to index"""
56
+ try:
57
+ # Add folder to manager first (this creates the collection)
58
+ folder_info = indexer.folder_manager.add_folder(folder_path)
59
+
60
+ # Start indexing in the background
61
+ background_tasks.add_task(indexer.index_folder, folder_path)
62
+
63
+ return folder_info
64
+ except Exception as e:
65
+ raise HTTPException(status_code=400, detail=str(e)) from e
66
+
67
+ @app.delete("/folders/{folder_path:path}")
68
+ async def remove_folder(folder_path: str):
69
+ """Remove a folder from indexing"""
70
+ try:
71
+ await indexer.remove_folder(folder_path)
72
+ return {"status": "success"}
73
+ except Exception as e:
74
+ raise HTTPException(status_code=400, detail=str(e)) from e
75
+
76
+ @app.get("/folders")
77
+ async def list_folders():
78
+ """List all indexed folders"""
79
+ return indexer.folder_manager.get_all_folders()
80
+
81
+ @app.get("/search/text")
82
+ async def search_by_text(query: str, folder: Optional[str] = None) -> List[dict]:
83
+ """Search images by text query, optionally filtered by folder"""
84
+ results = await searcher.search_by_text(query, folder)
85
+ return results
86
+
87
+ @app.post("/search/image")
88
+ async def search_by_image(
89
+ file: UploadFile = File(...),
90
+ folder: Optional[str] = None
91
+ ) -> List[dict]:
92
+ """Search images by uploading a similar image, optionally filtered by folder"""
93
+ contents = await file.read()
94
+ image = Image.open(io.BytesIO(contents))
95
+ results = await searcher.search_by_image(image, folder)
96
+ return results
97
+
98
+ @app.get("/search/url")
99
+ async def search_by_url(
100
+ url: str,
101
+ folder: Optional[str] = None
102
+ ) -> List[dict]:
103
+ """Search images by providing a URL to a similar image, optionally filtered by folder"""
104
+ results = await searcher.search_by_url(url, folder)
105
+ return results
106
+
107
+ @app.get("/images")
108
+ async def list_images(folder: Optional[str] = None) -> List[dict]:
109
+ """List all indexed images, optionally filtered by folder"""
110
+ return await indexer.get_all_images(folder)
111
+
112
+ @app.websocket("/ws")
113
+ async def websocket_endpoint(websocket: WebSocket):
114
+ """WebSocket endpoint for real-time indexing status updates"""
115
+ await indexer.add_websocket_connection(websocket)
116
+ try:
117
+ while True:
118
+ await websocket.receive_text()
119
+ except WebSocketDisconnect:
120
+ await indexer.remove_websocket_connection(websocket)
121
+
+ @app.get("/image/{image_id}")
+ async def serve_image(image_id: str):
+     """Serve an image from the database by ID"""
+     try:
+         image_data = image_db.get_image(image_id)
+         if not image_data:
+             raise HTTPException(status_code=404, detail="Image not found")
+
+         return StreamingResponse(
+             io.BytesIO(image_data["image_data"]),
+             # ImageDatabase.store_image re-encodes every stored image as JPEG,
+             # so the stored bytes are JPEG regardless of the original extension
+             media_type="image/jpeg",
+             headers={
+                 "Cache-Control": "max-age=86400",  # Cache for 24 hours
+                 "Content-Disposition": f"inline; filename=\"{image_data['filename']}\""
+             }
+         )
+     except HTTPException:
+         # Let deliberate HTTP errors (such as the 404 above) pass through unchanged
+         raise
+     except Exception as e:
+         raise HTTPException(status_code=500, detail=str(e)) from e
+
+ @app.get("/thumbnail/{image_id}")
+ async def serve_thumbnail_by_id(image_id: str):
+     """Serve a thumbnail from the database by ID"""
+     try:
+         thumbnail_data = image_db.get_thumbnail(image_id)
+         if not thumbnail_data:
+             raise HTTPException(status_code=404, detail="Thumbnail not found")
+
+         return StreamingResponse(
+             io.BytesIO(thumbnail_data),
+             media_type="image/jpeg",
+             headers={"Cache-Control": "max-age=86400"}  # Cache for 24 hours
+         )
+     except HTTPException:
+         # Keep the 404 from being rewritten as a 500 by the handler below
+         raise
+     except Exception as e:
+         raise HTTPException(status_code=500, detail=str(e)) from e
156
+
157
+ @app.get("/stats")
158
+ async def get_database_stats():
159
+ """Get database statistics"""
160
+ try:
161
+ return image_db.get_database_stats()
162
+ except Exception as e:
163
+ raise HTTPException(status_code=500, detail=str(e))
164
+
165
+ @app.get("/debug/collections")
166
+ async def debug_collections():
167
+ """Debug endpoint to check collections and folders"""
168
+ try:
169
+ # Get Qdrant client and collections
170
+ qdrant_client = indexer.qdrant
171
+ collections = qdrant_client.get_collections().collections
172
+
173
+ # Get folder manager status
174
+ folders = indexer.folder_manager.get_all_folders()
175
+
176
+ return {
177
+ "qdrant_collections": [col.name for col in collections],
178
+ "folder_manager_folders": folders,
179
+ "collections_count": len(collections),
180
+ "folders_count": len(folders)
181
+ }
182
+ except Exception as e:
183
+ return {"error": str(e)}
184
+
+ # Keep the old endpoints for backward compatibility but mark as deprecated
+ def _resolve_indexed_image(folder_path: str, file_path: str) -> Path:
+     """Validate a legacy folder/file pair and return the resolved image path"""
+     # Get folder info to verify it's an indexed folder
+     folder_info = indexer.folder_manager.get_folder_info(folder_path)
+     if not folder_info:
+         raise HTTPException(status_code=404, detail="Folder not found")
+
+     # Construct full file path and reject anything that escapes the folder (e.g. via "..")
+     root = Path(folder_path).resolve()
+     full_path = (root / file_path).resolve()
+     if not full_path.is_relative_to(root):
+         raise HTTPException(status_code=400, detail="Invalid file path")
+     if not full_path.exists():
+         raise HTTPException(status_code=404, detail="File not found")
+
+     # Only serve image files
+     if full_path.suffix.lower() not in image_extensions:
+         raise HTTPException(status_code=400, detail="Invalid file type")
+     return full_path
+
+ @app.get("/thumbnail/{folder_path:path}/{file_path:path}", deprecated=True)
+ async def serve_thumbnail(folder_path: str, file_path: str):
+     """Serve resized image thumbnails (DEPRECATED - use /thumbnail/{image_id} instead)"""
+     try:
+         full_path = _resolve_indexed_image(folder_path, file_path)
+
+         # Open image, resize, and convert to JPEG
+         with Image.open(full_path) as img:
+             img.thumbnail((200, 200))  # Resize maintaining aspect ratio
+             if img.mode in ("RGBA", "LA", "P"):
+                 img = img.convert("RGB")  # JPEG cannot store alpha or palette data
+
+             # Save to a byte stream
+             img_byte_arr = io.BytesIO()
+             img.save(img_byte_arr, format="JPEG")
+         img_byte_arr.seek(0)
+
+         return StreamingResponse(img_byte_arr, media_type="image/jpeg", headers={"Cache-Control": "max-age=3600"})  # Cache for 1 hour
+     except HTTPException:
+         # Preserve the 404/400 responses raised during validation
+         raise
+     except Exception as e:
+         raise HTTPException(status_code=500, detail=str(e)) from e
+
+ @app.get("/files/{folder_path:path}/{file_path:path}", deprecated=True)
+ async def serve_file(folder_path: str, file_path: str):
+     """Serve files from indexed folders (DEPRECATED - use /image/{image_id} instead)"""
+     try:
+         return FileResponse(_resolve_indexed_image(folder_path, file_path))
+     except HTTPException:
+         # Preserve the 404/400 responses raised during validation
+         raise
+     except Exception as e:
+         raise HTTPException(status_code=500, detail=str(e)) from e
238
+
239
+ def get_windows_drives():
240
+ """Get available drives on Windows"""
241
+ from ctypes import windll
242
+ drives = []
243
+ bitmask = windll.kernel32.GetLogicalDrives()
244
+ for letter in range(65, 91): # A-Z
245
+ if bitmask & (1 << (letter - 65)):
246
+ drives.append(chr(letter) + ":\\")
247
+ return drives
248
+
249
+ def get_directory_item(item):
250
+ """Get directory item info"""
251
+ try:
252
+ is_dir = item.is_dir()
253
+ if is_dir or item.suffix.lower() in image_extensions:
254
+ return {
255
+ "name": item.name,
256
+ "path": str(item.absolute()),
257
+ "type": "directory" if is_dir else "file",
258
+ "size": item.stat().st_size if not is_dir else None
259
+ }
260
+ except Exception:
261
+ pass
262
+ return None
263
+
264
+ def get_directory_contents(path: str):
265
+ """Get contents of a directory"""
266
+ try:
267
+ path_obj = Path(path)
268
+ if not path_obj.exists():
269
+ return {"error": "Path does not exist"}
270
+
271
+ parent = str(path_obj.parent) if path_obj.parent != path_obj else None
272
+ contents = [
273
+ item for item in (get_directory_item(i) for i in path_obj.iterdir())
274
+ if item is not None
275
+ ]
276
+
277
+ return {
278
+ "current_path": str(path_obj.absolute()),
279
+ "parent_path": parent,
280
+ "contents": sorted(contents, key=lambda x: (x["type"] != "directory", x["name"].lower()))
281
+ }
282
+ except Exception as e:
283
+ return {"error": str(e)}
284
+
285
+ @app.get("/browse")
286
+ async def browse_folders():
287
+ """Browse system folders"""
288
+ if os.name == "nt": # Windows
289
+ return {"drives": get_windows_drives()}
290
+ return get_directory_contents("/") # Unix-like
291
+
+ @app.get("/browse/{path:path}")
+ async def browse_path(path: str):
+     """Browse a specific path"""
+     try:
+         path_obj = Path(path)
+         if not path_obj.exists():
+             raise HTTPException(status_code=404, detail="Path not found")
+
+         # Get parent directory for navigation
+         parent = str(path_obj.parent) if path_obj.parent != path_obj else None
+
+         # List directories and image files; get_directory_item returns None for
+         # anything else, including items we can't access
+         contents = [
+             entry for entry in (get_directory_item(item) for item in path_obj.iterdir())
+             if entry is not None
+         ]
+
+         return {
+             "current_path": str(path_obj.absolute()),
+             "parent_path": parent,
+             "contents": sorted(contents, key=lambda x: (x["type"] != "directory", x["name"].lower()))
+         }
+     except HTTPException:
+         # Keep the 404 above from being rewritten as a 500
+         raise
+     except Exception as e:
+         raise HTTPException(status_code=500, detail=str(e)) from e
325
+
326
+ if __name__ == "__main__":
327
+ import uvicorn
328
+ uvicorn.run("app:app", host="0.0.0.0", port=8000, reload=False)
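The legacy `/files/...` and `/thumbnail/...` handlers in `app.py` build a filesystem path from two URL segments. A containment check along the following lines is one way to keep a segment such as `../` from reaching files outside an indexed folder. This is a minimal sketch using only the standard library; `resolve_inside` is a hypothetical helper name, not something defined in this commit.

```python
from pathlib import Path
from typing import Optional


def resolve_inside(root: str, relative: str) -> Optional[Path]:
    """Return the resolved path if it stays inside root, otherwise None.

    Hypothetical helper for illustration; not part of the committed code.
    """
    root_path = Path(root).resolve()
    # Joining with an absolute `relative` discards root_path entirely,
    # which the check below also rejects.
    candidate = (root_path / relative).resolve()
    try:
        # relative_to raises ValueError when candidate is not under root_path
        candidate.relative_to(root_path)
    except ValueError:
        return None
    return candidate
```

A handler could call this before touching the disk and answer with a 400 when it returns `None`. Resolving both sides first means `..` components and symlinks are compared in their final form.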
folder_manager.py ADDED
@@ -0,0 +1,145 @@
1
+ from pathlib import Path
2
+ from typing import List, Dict, Optional
3
+ import json
4
+ import time
5
+ from qdrant_singleton import QdrantClientSingleton
6
+
7
+ class FolderManager:
8
+ def __init__(self):
9
+ # Ensure config directory exists
10
+ self.config_dir = Path("config")
11
+ self.config_dir.mkdir(exist_ok=True)
12
+
13
+ # Ensure folders.json exists
14
+ self.config_file = self.config_dir / "folders.json"
15
+ if not self.config_file.exists():
16
+ self._create_default_config()
17
+
18
+ self.folders: Dict[str, Dict] = self._load_folders()
19
+
20
+ def _create_default_config(self):
21
+ """Create default configuration file if it doesn't exist"""
22
+ default_config = {}
23
+ with open(self.config_file, 'w') as f:
24
+ json.dump(default_config, f, indent=2)
25
+ print(f"Created default configuration file at {self.config_file}")
26
+
27
+ def _load_folders(self) -> Dict[str, Dict]:
28
+ """Load folder configurations from JSON file"""
29
+ if self.config_file.exists():
30
+ with open(self.config_file, 'r') as f:
31
+ return json.load(f)
32
+ return {}
33
+
34
+ def _save_folders(self):
35
+ """Save folder configurations to JSON file"""
36
+ # Ensure config directory exists before saving
37
+ self.config_dir.mkdir(exist_ok=True)
38
+
39
+ # Write config
40
+ with open(self.config_file, 'w') as f:
41
+ json.dump(self.folders, f, indent=2)
42
+
43
+ def add_folder(self, folder_path: str) -> Dict:
44
+ """Add a new folder to index"""
45
+ folder_path = str(Path(folder_path).absolute())
46
+ print(f"Adding folder: {folder_path}")
47
+
48
+ # Check if this folder or any parent/child is already being indexed
49
+ for existing_path in self.folders:
50
+ existing = Path(existing_path)
51
+ new_path = Path(folder_path)
52
+
53
+ # If the new path is already indexed
54
+ if existing == new_path:
55
+ print(f"Folder already indexed: {folder_path}")
56
+ return self.folders[existing_path]
57
+
58
+ # If the new path is a parent of an existing path, use the same collection
59
+ if existing.is_relative_to(new_path):
60
+ print(f"Using existing collection for parent path: {folder_path}")
61
+ return self.folders[existing_path]
62
+
63
+ # If the new path is a child of an existing path, use the parent's collection
64
+ if new_path.is_relative_to(existing):
65
+ print(f"Using parent's collection for: {folder_path}")
66
+ return self.folders[existing_path]
67
+
68
+ # If it's a completely new path, create a new entry
69
+ collection_name = f"images_{len(self.folders)}"
70
+ print(f"Creating new collection {collection_name} for folder: {folder_path}")
71
+
72
+ folder_info = {
73
+ "path": folder_path,
74
+ "collection_name": collection_name,
75
+ "added_at": int(time.time()),
76
+ "last_indexed": None
77
+ }
78
+
79
+ # Initialize new collection in Qdrant
80
+ try:
81
+ QdrantClientSingleton.initialize_collection(collection_name)
82
+ print(f"Successfully initialized collection: {collection_name}")
83
+ except Exception as e:
84
+ print(f"Error initializing collection {collection_name}: {e}")
85
+ raise e
86
+
87
+ # Save to config
88
+ self.folders[folder_path] = folder_info
89
+ self._save_folders()
90
+
91
+ print(f"Successfully added folder {folder_path} with collection {collection_name}")
92
+ return folder_info
93
+
94
+ def remove_folder(self, folder_path: str):
95
+ """Remove a folder from indexing"""
96
+ folder_path = str(Path(folder_path).absolute())
97
+ if folder_path in self.folders:
98
+ # Delete the collection
99
+ collection_name = self.folders[folder_path]["collection_name"]
100
+ client = QdrantClientSingleton.get_instance()
101
+ try:
102
+ client.delete_collection(collection_name=collection_name)
103
+ except Exception as e:
104
+ print(f"Error deleting collection: {e}")
105
+
106
+ # Remove from config
107
+ del self.folders[folder_path]
108
+ self._save_folders()
109
+
110
+ def get_folder_info(self, folder_path: str) -> Optional[Dict]:
111
+ """Get information about an indexed folder"""
112
+ folder_path = str(Path(folder_path).absolute())
113
+ return self.folders.get(folder_path)
114
+
115
+ def get_all_folders(self) -> List[Dict]:
116
+ """Get all indexed folders"""
117
+ return [
118
+ {
119
+ "path": path,
120
+ **info,
121
+ "is_valid": Path(path).exists() # Check if folder still exists
122
+ }
123
+ for path, info in self.folders.items()
124
+ ]
125
+
126
+ def update_last_indexed(self, folder_path: str):
127
+ """Update the last indexed timestamp for a folder"""
128
+ folder_path = str(Path(folder_path).absolute())
129
+ if folder_path in self.folders:
130
+ self.folders[folder_path]["last_indexed"] = int(time.time())
131
+ self._save_folders()
132
+
133
+ def get_collection_for_path(self, folder_path: str) -> Optional[str]:
134
+ """Get the collection name for a given path"""
135
+ folder_path = Path(folder_path).absolute()
136
+ print(f"Looking for collection for path: {folder_path}")
137
+
138
+ # Check each indexed folder to find the appropriate collection
139
+ for path, info in self.folders.items():
140
+ if folder_path == Path(path) or folder_path.is_relative_to(Path(path)):
141
+ print(f"Found collection {info['collection_name']} for path {folder_path}")
142
+ return info["collection_name"]
143
+
144
+ print(f"No collection found for path {folder_path}")
145
+ return None
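One thing to watch in `FolderManager.add_folder`: the collection name is built from `len(self.folders)`, so names can repeat once a folder has been removed. With folders A and B registered as `images_0` and `images_1`, removing A and then adding C gives C the name `images_1`, which B still uses. A possible alternative, sketched below under the assumption that a name only needs to be stable and unique per path, derives it from a hash of the absolute path. `collection_name_for` is a hypothetical function, not part of this commit.

```python
import hashlib
from pathlib import Path


def collection_name_for(folder_path: str) -> str:
    """Derive a stable collection name from the folder's absolute path.

    Hypothetical helper for illustration; the committed code uses a counter.
    """
    absolute = str(Path(folder_path).absolute())
    digest = hashlib.sha256(absolute.encode("utf-8")).hexdigest()
    # 12 hex characters keep the name short while making clashes unlikely
    return f"images_{digest[:12]}"
```

Because the name depends only on the path, removing and re-adding folders in any order cannot hand out a name that another folder already holds, short of a hash prefix collision.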
image_database.py ADDED
@@ -0,0 +1,303 @@
1
+ import sqlite3
2
+ import base64
3
+ import uuid
4
+ from pathlib import Path
5
+ from typing import Optional, List, Dict, Tuple
6
+ from PIL import Image
7
+ import io
8
+ import hashlib
9
+
10
+
11
+ class ImageDatabase:
12
+ """SQLite database for storing images and metadata"""
13
+
14
+ def __init__(self, db_path: str = "images.db"):
15
+ self.db_path = db_path
16
+ self.init_database()
17
+
18
+ def init_database(self):
19
+ """Initialize the database with required tables"""
20
+ conn = sqlite3.connect(self.db_path)
21
+ cursor = conn.cursor()
22
+
23
+ # Create images table
24
+ cursor.execute('''
25
+ CREATE TABLE IF NOT EXISTS images (
26
+ id TEXT PRIMARY KEY,
27
+ file_hash TEXT UNIQUE NOT NULL,
28
+ original_path TEXT NOT NULL,
29
+ filename TEXT NOT NULL,
30
+ file_extension TEXT NOT NULL,
31
+ file_size INTEGER NOT NULL,
32
+ width INTEGER NOT NULL,
33
+ height INTEGER NOT NULL,
34
+ image_data BLOB NOT NULL,
35
+ thumbnail_data BLOB,
36
+ root_folder TEXT NOT NULL,
37
+ relative_path TEXT NOT NULL,
38
+ created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
39
+ updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
40
+ )
41
+ ''')
42
+
43
+ # Create indexes for better performance
44
+ cursor.execute('CREATE INDEX IF NOT EXISTS idx_file_hash ON images(file_hash)')
45
+ cursor.execute('CREATE INDEX IF NOT EXISTS idx_root_folder ON images(root_folder)')
46
+ cursor.execute('CREATE INDEX IF NOT EXISTS idx_relative_path ON images(relative_path)')
47
+ cursor.execute('CREATE INDEX IF NOT EXISTS idx_filename ON images(filename)')
48
+
49
+ conn.commit()
50
+ conn.close()
51
+
52
+ def _calculate_file_hash(self, image_data: bytes) -> str:
53
+ """Calculate SHA-256 hash of image data"""
54
+ return hashlib.sha256(image_data).hexdigest()
55
+
56
+ def _create_thumbnail(self, image: Image.Image, size: Tuple[int, int] = (200, 200)) -> bytes:
57
+ """Create a thumbnail of the image"""
58
+ # Create a copy to avoid modifying original
59
+ thumbnail = image.copy()
60
+ thumbnail.thumbnail(size, Image.Resampling.LANCZOS)
61
+
62
+ # Convert to bytes
63
+ img_byte_arr = io.BytesIO()
64
+ # Save as JPEG for thumbnails to reduce size
65
+ if thumbnail.mode in ('RGBA', 'LA', 'P'):
66
+ thumbnail = thumbnail.convert('RGB')
67
+ thumbnail.save(img_byte_arr, format='JPEG', quality=85, optimize=True)
68
+ return img_byte_arr.getvalue()
69
+
70
+ def store_image(self, image_path: Path, root_folder: Path) -> Optional[str]:
71
+ """
72
+ Store an image in the database
73
+ Returns the image ID if successful, None if failed
74
+ """
75
+ try:
76
+ # Load the image
77
+ with Image.open(image_path) as image:
78
+ # Convert to RGB if needed
79
+ if image.mode in ('RGBA', 'LA', 'P'):
80
+ image = image.convert('RGB')
81
+
82
+ # Get image data as bytes
83
+ img_byte_arr = io.BytesIO()
84
+ image.save(img_byte_arr, format='JPEG', quality=95, optimize=True)
85
+ image_data = img_byte_arr.getvalue()
86
+
87
+ # Calculate file hash
88
+ file_hash = self._calculate_file_hash(image_data)
89
+
90
+ # Create thumbnail
91
+ thumbnail_data = self._create_thumbnail(image)
92
+
93
+ # Calculate relative path
94
+ relative_path = str(image_path.relative_to(root_folder))
95
+
96
+ # Prepare metadata
97
+ image_id = str(uuid.uuid4())
98
+ filename = image_path.name
99
+ file_extension = image_path.suffix.lower()
100
+ file_size = len(image_data)
101
+ width, height = image.size
102
+
103
+ conn = sqlite3.connect(self.db_path)
104
+ cursor = conn.cursor()
105
+
106
+ # Check if image already exists (by hash)
107
+ cursor.execute('SELECT id FROM images WHERE file_hash = ?', (file_hash,))
108
+ existing = cursor.fetchone()
109
+
110
+ if existing:
111
+ print(f"Image already exists in database: {filename}")
112
+ conn.close()
113
+ return existing[0]
114
+
115
+ # Insert new image
116
+ cursor.execute('''
117
+ INSERT INTO images (
118
+ id, file_hash, original_path, filename, file_extension,
119
+ file_size, width, height, image_data, thumbnail_data,
120
+ root_folder, relative_path
121
+ ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
122
+ ''', (
123
+ image_id, file_hash, str(image_path.absolute()), filename,
124
+ file_extension, file_size, width, height, image_data,
125
+ thumbnail_data, str(root_folder.absolute()), relative_path
126
+ ))
127
+
128
+ conn.commit()
129
+ conn.close()
130
+
131
+ print(f"Stored image in database: {filename} (ID: {image_id})")
132
+ return image_id
133
+
134
+ except Exception as e:
135
+ print(f"Error storing image {image_path}: {e}")
136
+ return None
137
+
138
+ def get_image(self, image_id: str) -> Optional[Dict]:
139
+ """Get an image by ID"""
140
+ conn = sqlite3.connect(self.db_path)
141
+ cursor = conn.cursor()
142
+
143
+ cursor.execute('''
144
+ SELECT id, filename, file_extension, file_size, width, height,
145
+ image_data, root_folder, relative_path, created_at
146
+ FROM images WHERE id = ?
147
+ ''', (image_id,))
148
+
149
+ result = cursor.fetchone()
150
+ conn.close()
151
+
152
+ if result:
153
+ return {
154
+ 'id': result[0],
155
+ 'filename': result[1],
156
+ 'file_extension': result[2],
157
+ 'file_size': result[3],
158
+ 'width': result[4],
159
+ 'height': result[5],
160
+ 'image_data': result[6],
161
+ 'root_folder': result[7],
162
+ 'relative_path': result[8],
163
+ 'created_at': result[9]
164
+ }
165
+ return None
166
+
167
+ def get_thumbnail(self, image_id: str) -> Optional[bytes]:
168
+ """Get thumbnail data for an image"""
169
+ conn = sqlite3.connect(self.db_path)
170
+ cursor = conn.cursor()
171
+
172
+ cursor.execute('SELECT thumbnail_data FROM images WHERE id = ?', (image_id,))
173
+ result = cursor.fetchone()
174
+ conn.close()
175
+
176
+ return result[0] if result else None
177
+
178
+ def get_images_by_folder(self, root_folder: str) -> List[Dict]:
179
+ """Get all images from a specific folder"""
180
+ conn = sqlite3.connect(self.db_path)
181
+ cursor = conn.cursor()
182
+
183
+ cursor.execute('''
184
+ SELECT id, filename, file_extension, file_size, width, height,
185
+ root_folder, relative_path, created_at
186
+ FROM images WHERE root_folder = ?
187
+ ORDER BY created_at DESC
188
+ ''', (root_folder,))
189
+
190
+ results = cursor.fetchall()
191
+ conn.close()
192
+
193
+ return [
194
+ {
195
+ 'id': row[0],
196
+ 'filename': row[1],
197
+ 'file_extension': row[2],
198
+ 'file_size': row[3],
199
+ 'width': row[4],
200
+ 'height': row[5],
201
+ 'root_folder': row[6],
202
+ 'relative_path': row[7],
203
+ 'created_at': row[8]
204
+ }
205
+ for row in results
206
+ ]
207
+
208
+ def get_all_images(self) -> List[Dict]:
209
+ """Get all images from the database"""
210
+ conn = sqlite3.connect(self.db_path)
211
+ cursor = conn.cursor()
212
+
213
+ cursor.execute('''
214
+ SELECT id, filename, file_extension, file_size, width, height,
215
+ root_folder, relative_path, created_at
216
+ FROM images
217
+ ORDER BY created_at DESC
218
+ ''')
219
+
220
+ results = cursor.fetchall()
221
+ conn.close()
222
+
223
+ return [
224
+ {
225
+ 'id': row[0],
226
+ 'filename': row[1],
227
+ 'file_extension': row[2],
228
+ 'file_size': row[3],
229
+ 'width': row[4],
230
+ 'height': row[5],
231
+ 'root_folder': row[6],
232
+ 'relative_path': row[7],
233
+ 'created_at': row[8]
234
+ }
235
+ for row in results
236
+ ]
237
+
238
+ def delete_image(self, image_id: str) -> bool:
239
+ """Delete an image from the database"""
240
+ conn = sqlite3.connect(self.db_path)
241
+ cursor = conn.cursor()
242
+
243
+ cursor.execute('DELETE FROM images WHERE id = ?', (image_id,))
244
+ deleted = cursor.rowcount > 0
245
+
246
+ conn.commit()
247
+ conn.close()
248
+
249
+ return deleted
250
+
251
+ def delete_images_by_folder(self, root_folder: str) -> int:
252
+ """Delete all images from a specific folder"""
253
+ conn = sqlite3.connect(self.db_path)
254
+ cursor = conn.cursor()
255
+
256
+ cursor.execute('DELETE FROM images WHERE root_folder = ?', (root_folder,))
257
+ deleted_count = cursor.rowcount
258
+
259
+ conn.commit()
260
+ conn.close()
261
+
262
+ return deleted_count
263
+
264
+ def image_exists_by_path(self, relative_path: str, root_folder: str) -> Optional[str]:
265
+ """Check if an image exists by its path, return image ID if exists"""
266
+ conn = sqlite3.connect(self.db_path)
267
+ cursor = conn.cursor()
268
+
269
+ cursor.execute('''
270
+ SELECT id FROM images
271
+ WHERE relative_path = ? AND root_folder = ?
272
+ ''', (relative_path, root_folder))
273
+
274
+ result = cursor.fetchone()
275
+ conn.close()
276
+
277
+ return result[0] if result else None
278
+
279
+ def get_database_stats(self) -> Dict:
280
+ """Get database statistics"""
281
+ conn = sqlite3.connect(self.db_path)
282
+ cursor = conn.cursor()
283
+
284
+ # Total images
285
+ cursor.execute('SELECT COUNT(*) FROM images')
286
+ total_images = cursor.fetchone()[0]
287
+
288
+ # Total size
289
+ cursor.execute('SELECT SUM(file_size) FROM images')
290
+ total_size = cursor.fetchone()[0] or 0
291
+
292
+ # Images by folder
293
+ cursor.execute('SELECT root_folder, COUNT(*) FROM images GROUP BY root_folder')
294
+ folders = cursor.fetchall()
295
+
296
+ conn.close()
297
+
298
+ return {
299
+ 'total_images': total_images,
300
+ 'total_size_bytes': total_size,
301
+ 'total_size_mb': round(total_size / (1024 * 1024), 2),
302
+ 'folders': {folder: count for folder, count in folders}
303
+ }
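Every `ImageDatabase` method opens a connection and closes it by hand, so an exception raised between `connect()` and `close()` skips the close. The sketch below shows one way to tighten that with `contextlib.closing`, plus the connection's own context manager for commit or rollback. It is illustrative only: the two-column `images` table and the helper names `insert_image_row` and `find_id_by_hash` are simplified assumptions, not the schema or API from this commit.

```python
import sqlite3
from contextlib import closing
from typing import Optional


def insert_image_row(db_path: str, image_id: str, file_hash: str) -> None:
    """Insert a row; the connection is closed even if a statement fails."""
    with closing(sqlite3.connect(db_path)) as conn:
        # `with conn` commits on success and rolls back on an exception;
        # it does not close the connection, which is why closing() wraps it.
        with conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS images ("
                "id TEXT PRIMARY KEY, file_hash TEXT UNIQUE NOT NULL)"
            )
            conn.execute(
                "INSERT INTO images (id, file_hash) VALUES (?, ?)",
                (image_id, file_hash),
            )


def find_id_by_hash(db_path: str, file_hash: str) -> Optional[str]:
    """Return the image ID stored for a hash, or None if there is none."""
    with closing(sqlite3.connect(db_path)) as conn:
        row = conn.execute(
            "SELECT id FROM images WHERE file_hash = ?", (file_hash,)
        ).fetchone()
    return row[0] if row else None
```

The same two-layer pattern would fit `store_image`, where the duplicate-hash lookup and the insert currently share one manually closed connection.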
image_indexer.py ADDED
@@ -0,0 +1,384 @@
1
+ from pathlib import Path
2
+ from typing import List, Dict, Set, Optional
3
+ import torch
4
+ from PIL import Image
5
+ import numpy as np
6
+ from transformers import CLIPProcessor, CLIPModel
7
+ from watchdog.observers import Observer
8
+ from watchdog.events import FileSystemEventHandler
9
+ import asyncio
10
+ from concurrent.futures import ThreadPoolExecutor
11
+ import threading
12
+ from qdrant_client.http.models import PointStruct
13
+ import uuid
14
+ from qdrant_singleton import QdrantClientSingleton, CURRENT_SCHEMA_VERSION
15
+ from fastapi import WebSocket
16
+ from enum import Enum
17
+ import qdrant_client
18
+ import time
19
+ from folder_manager import FolderManager
20
+ from image_database import ImageDatabase
21
+
22
+ class IndexingStatus(Enum):
23
+ IDLE = "idle"
24
+ INDEXING = "indexing"
25
+ MONITORING = "monitoring"
26
+
27
+ class ImageIndexer:
28
+ def __init__(self):
29
+ # Initialize folder manager and image database
30
+ self.folder_manager = FolderManager()
31
+ self.image_db = ImageDatabase()
32
+
33
+ # Initialize status tracking
34
+ self.status = IndexingStatus.IDLE
35
+ self.current_file: Optional[str] = None
36
+ self.total_files = 0
37
+ self.processed_files = 0
38
+ self.websocket_connections: Set[WebSocket] = set()
39
+
40
+ # Thread synchronization
41
+ self.collection_initialized = threading.Event()
42
+ self.model_initialized = threading.Event()
43
+
44
+ # Initialize Qdrant client
45
+ self.qdrant = QdrantClientSingleton.get_instance()
46
+
47
+ # Thread pool for background processing
48
+ self.executor = ThreadPoolExecutor(max_workers=4)
49
+
50
+ # Cache of indexed paths per collection
51
+ self.indexed_paths: Dict[str, Set[str]] = {}
52
+
53
+ # Model initialization flags
54
+ self.model = None
55
+ self.processor = None
56
+ self.device = None
57
+
58
+ # Start model initialization in a separate thread
59
+ threading.Thread(target=self._initialize_model_thread, daemon=True).start()
60
+
+     def _load_indexed_paths(self, collection_name: str):
+         """Load the set of already indexed paths from a collection"""
+         try:
+             paths: Set[str] = set()
+             offset = None
+             # scroll() returns one page of points plus the offset of the next page
+             # (None once the collection is exhausted), so keep going until then
+             while True:
+                 points, offset = self.qdrant.scroll(
+                     collection_name=collection_name,
+                     limit=10000,
+                     offset=offset,
+                     with_payload=True,
+                     with_vectors=False
+                 )
+                 paths.update(point.payload["path"] for point in points)
+                 if offset is None:
+                     break
+             self.indexed_paths[collection_name] = paths
+         except Exception as e:
+             print(f"Error loading indexed paths for collection {collection_name}: {e}")
+             self.indexed_paths[collection_name] = set()
74
+
75
+ async def broadcast_status(self):
76
+ """Broadcast current status to all connected WebSocket clients"""
77
+ status_data = {
78
+ "status": self.status.value,
79
+ "current_file": self.current_file,
80
+ "total_files": self.total_files,
81
+ "processed_files": self.processed_files,
82
+ "progress_percentage": round((self.processed_files / self.total_files * 100) if self.total_files > 0 else 0, 2)
83
+ }
84
+
85
+ for connection in self.websocket_connections:
86
+ try:
87
+ await connection.send_json(status_data)
88
+ except Exception as e:
89
+ print(f"Error broadcasting to WebSocket: {e}")
90
+ self.websocket_connections.remove(connection)
91
+
92
+ async def add_websocket_connection(self, websocket: WebSocket):
93
+ """Add a new WebSocket connection"""
94
+ await websocket.accept()
95
+ self.websocket_connections.add(websocket)
96
+ await self.broadcast_status()
97
+
98
+ async def remove_websocket_connection(self, websocket: WebSocket):
99
+ """Remove a WebSocket connection"""
100
+ self.websocket_connections.discard(websocket)
101
+
102
+ async def add_folder(self, folder_path: str) -> Dict:
103
+ """Add a new folder to index"""
104
+ folder_info = self.folder_manager.add_folder(folder_path)
105
+ # Start indexing the new folder
106
+ await self.index_folder(folder_path)
107
+ return folder_info
108
+
109
+ async def remove_folder(self, folder_path: str):
110
+ """Remove a folder from indexing"""
111
+ # First remove from the folder manager
112
+ self.folder_manager.remove_folder(folder_path)
113
+
114
+ # Clean up SQLite database
115
+ folder_abs_path = str(Path(folder_path).absolute())
116
+ deleted_count = self.image_db.delete_images_by_folder(folder_abs_path)
117
+ print(f"Deleted {deleted_count} images from database for folder: {folder_path}")
118
+
119
+ async def index_folder(self, folder_path: str):
120
+ """Index all images in a specific folder"""
121
+ if not self.model_initialized.is_set() or not self.model or not self.processor:
122
+ print("Model not initialized. Skipping indexing.")
123
+ self.status = IndexingStatus.IDLE
124
+ await self.broadcast_status()
125
+ return
126
+
127
+ folder_path = Path(folder_path)
128
+ if not folder_path.exists():
129
+ print(f"Folder not found: {folder_path}")
130
+ return
131
+
132
+ collection_name = self.folder_manager.get_collection_for_path(folder_path)
133
+ if not collection_name:
134
+ print(f"No collection found for folder: {folder_path}")
135
+ return
136
+
137
+ # Wait for model initialization before starting indexing
138
+ while not self.model_initialized.is_set():
139
+ print("Waiting for model initialization...")
140
+ await asyncio.sleep(0.1)
141
+
142
+ print(f"Starting to index folder: {folder_path}")
143
+ self.status = IndexingStatus.INDEXING
144
+ self.processed_files = 0
145
+ self.current_file = None
146
+ await self.broadcast_status() # Broadcast initial status
147
+
148
+ # Load indexed paths for this collection if not already loaded
149
+ if collection_name not in self.indexed_paths:
150
+ self._load_indexed_paths(collection_name)
151
+
152
+ # Use rglob for recursive directory scanning
153
+ image_files = [f for f in folder_path.rglob("*") if f.suffix.lower() in {".jpg", ".jpeg", ".png", ".gif"}]
154
+ self.total_files = len(image_files)
155
+ print(f"Found {self.total_files} images to index")
156
+ await self.broadcast_status() # Broadcast after finding total files
157
+
158
+ try:
159
+ for i, image_file in enumerate(image_files, 1):
160
+ relative_path = str(image_file.relative_to(folder_path))
161
+ self.current_file = str(image_file)
162
+ self.processed_files = i - 1 # Update before processing
163
+ await self.broadcast_status() # Broadcast before processing each file
164
+
165
+ if relative_path not in self.indexed_paths[collection_name]:
166
+ print(f"Indexing image {i}/{self.total_files}: {image_file.name}")
167
+ await self.index_image(image_file, folder_path)
168
+ else:
169
+ print(f"Skipping already indexed image {i}/{self.total_files}: {image_file.name}")
170
+
171
+ self.processed_files = i # Update after processing
172
+ await self.broadcast_status() # Broadcast after processing each file
173
+
174
+ # Yield control to the event loop so other tasks can run
175
+ await asyncio.sleep(0)
176
+
177
+ except Exception as e:
178
+ print(f"Error during indexing: {e}")
179
+ import traceback
180
+ traceback.print_exc()
181
+ finally:
182
+ # Update last indexed timestamp
183
+ self.folder_manager.update_last_indexed(str(folder_path))
184
+
185
+ # Reset status
186
+ self.status = IndexingStatus.MONITORING
187
+ self.current_file = None
188
+ await self.broadcast_status() # Final status broadcast
189
+ print("Finished indexing folder")
190
+
191
+ async def index_image(self, image_path: Path, root_folder: Path):
192
+ """Index a single image"""
193
+ if not self.model_initialized.is_set() or not self.model or not self.processor:
194
+ print("Model not initialized. Skipping indexing image.")
195
+ return
196
+
197
+ try:
198
+ # Wait for model initialization
199
+ while not self.model_initialized.is_set():
200
+ await asyncio.sleep(0.1)
201
+
202
+ # Get the collection for this path
203
+ collection_name = self.folder_manager.get_collection_for_path(str(root_folder))
204
+ if not collection_name:
205
+ print(f"No collection found for image: {image_path}")
206
+ return
207
+
208
+ # Convert to relative path from root folder
209
+ try:
210
+ relative_path = str(image_path.relative_to(root_folder))
211
+ except ValueError:
212
+ print(f"Image {image_path} is not under root folder {root_folder}")
213
+ return
214
+
215
+ print(f"Indexing image: {relative_path}")
216
+ self.current_file = str(image_path)
217
+ await self.broadcast_status()
218
+
219
+ # Check if image already exists in database
220
+ existing_image_id = self.image_db.image_exists_by_path(relative_path, str(root_folder.absolute()))
221
+ if existing_image_id:
222
+ # Check if it exists in Qdrant with current schema version
223
+ existing_points = self.qdrant.scroll(
224
+ collection_name=collection_name,
225
+ scroll_filter=qdrant_client.http.models.Filter(
226
+ must=[
227
+ qdrant_client.http.models.FieldCondition(
228
+ key="image_id",
229
+ match={"value": existing_image_id}
230
+ ),
231
+ qdrant_client.http.models.FieldCondition(
232
+ key="schema_version",
233
+ match={"value": CURRENT_SCHEMA_VERSION}
234
+ )
235
+ ]
236
+ ),
237
+ limit=1
238
+ )[0]
239
+
240
+ if existing_points:
241
+ print(f"Skipping {relative_path} - already indexed with current schema version")
242
+ return
243
+
244
+ # Store image in SQLite database first
245
+ image_id = self.image_db.store_image(image_path, root_folder)
246
+ if not image_id:
247
+ print(f"Failed to store image in database: {relative_path}")
248
+ return
249
+
250
+ # Load and preprocess image for embedding
251
+ image = Image.open(image_path).convert("RGB")
252
+ inputs = self.processor(images=image, return_tensors="pt").to(self.device)
253
+
254
+ # Generate image embedding
255
+ with torch.no_grad():
256
+ image_features = self.model.get_image_features(**inputs)
257
+ # Normalize the features
258
+ image_features = image_features / image_features.norm(dim=-1, keepdim=True)
259
+
260
+ embedding = image_features.cpu().numpy().flatten()
261
+
262
+ # Verify embedding is valid
263
+ if np.isnan(embedding).any() or np.isinf(embedding).any():
264
+ print(f"Warning: Invalid embedding generated for {relative_path}")
265
+ return
266
+
267
+ # Delete any old versions from Qdrant if they exist
268
+ self.qdrant.delete(
269
+ collection_name=collection_name,
270
+ points_selector=qdrant_client.http.models.FilterSelector(
271
+ filter=qdrant_client.http.models.Filter(
272
+ must=[
273
+ qdrant_client.http.models.FieldCondition(
274
+ key="path",
275
+ match={"value": relative_path}
276
+ )
277
+ ]
278
+ )
279
+ )
280
+ )
281
+
282
+ # Store in Qdrant with image ID reference and minimal metadata
283
+ point_id = str(uuid.uuid4())
284
+ self.qdrant.upsert(
285
+ collection_name=collection_name,
286
+ points=[
287
+ PointStruct(
288
+ id=point_id,
289
+ vector=embedding.tolist(),
290
+ payload={
291
+ "image_id": image_id, # Reference to SQLite database
292
+ "path": relative_path, # Relative path from root folder
293
+ "root_folder": str(root_folder.absolute()), # Store root folder path
294
+ "schema_version": CURRENT_SCHEMA_VERSION,
295
+ "indexed_at": int(time.time())
296
+ }
297
+ )
298
+ ]
299
+ )
300
+
301
+ # Update indexed paths cache
302
+ if collection_name not in self.indexed_paths:
303
+ self.indexed_paths[collection_name] = set()
304
+ self.indexed_paths[collection_name].add(relative_path)
305
+
306
+ print(f"Stored embedding in Qdrant for {relative_path} (Image ID: {image_id})")
307
+
308
+ except Exception as e:
309
+ print(f"Error indexing image {image_path}: {e}")
310
+ import traceback
311
+ traceback.print_exc()
312
+ finally:
313
+ # Don't reset current_file here as it's managed by index_folder
314
+ await self.broadcast_status()
315
+
316
+ def _initialize_model_thread(self):
317
+ """Initialize model in a separate thread"""
318
+ try:
319
+ self.device = "cuda" if torch.cuda.is_available() else "cpu"
320
+ print(f"Using device: {self.device}")
321
+
322
+ # Load model and processor with proper device handling
323
+ self.processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch16")
324
+
325
+ # Load model directly to the target device to avoid meta tensor issues
326
+ if self.device == "cuda":
327
+ self.model = CLIPModel.from_pretrained(
328
+ "openai/clip-vit-base-patch16",
329
+ torch_dtype=torch.float16,
330
+ device_map="auto"
331
+ )
332
+ else:
333
+ # For CPU, use device_map to avoid meta tensor issues
334
+ self.model = CLIPModel.from_pretrained(
335
+ "openai/clip-vit-base-patch16",
336
+ device_map="cpu"
337
+ )
338
+
339
+ self.model_initialized.set()
340
+ print("Model initialization complete")
341
+ except Exception as e:
342
+ print(f"Error initializing model: {e}")
343
+ self.status = IndexingStatus.IDLE
344
+ asyncio.run(self.broadcast_status())
345
+
346
+ async def get_all_images(self, folder_path: Optional[str] = None) -> List[Dict]:
347
+ """Get all indexed images, optionally filtered by folder"""
348
+ try:
349
+ if folder_path:
350
+ # Get images from specific folder
351
+ results = self.image_db.get_images_by_folder(str(Path(folder_path).absolute()))
352
+ else:
353
+ # Get images from all folders
354
+ results = self.image_db.get_all_images()
355
+
356
+ # Convert to API format
357
+ api_results = []
358
+ for image_data in results:
359
+ api_results.append({
360
+ "id": image_data["id"],
361
+ "path": image_data["relative_path"],
362
+ "filename": image_data["filename"],
363
+ "root_folder": image_data["root_folder"],
364
+ "file_size": image_data["file_size"],
365
+ "width": image_data["width"],
366
+ "height": image_data["height"],
367
+ "created_at": image_data["created_at"]
368
+ })
369
+
370
+ return api_results
371
+
372
+ except Exception as e:
373
+ print(f"Error getting images: {e}")
374
+ import traceback
375
+ traceback.print_exc()
376
+ return []
377
+ class ImageEventHandler(FileSystemEventHandler):
378
+ def __init__(self, indexer: ImageIndexer, root_folder: Path):
379
+ self.indexer = indexer
380
+ self.root_folder = root_folder
381
+
382
+ def on_created(self, event):
383
+ if not event.is_directory:
384
+ asyncio.create_task(self.indexer.index_image(Path(event.src_path), self.root_folder))
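Review note: watchdog invokes handler callbacks on its observer thread, where no event loop is running, so `asyncio.create_task` in `on_created` would be expected to raise `RuntimeError`. One common approach is to capture the application's loop once and hand coroutines to it with `asyncio.run_coroutine_threadsafe`. The sketch below uses a plain thread in place of the watchdog observer. `LoopBoundHandler`, its stub `index_image`, and `demo` are hypothetical names used only for illustration.

```python
import asyncio
import threading
from pathlib import Path
from typing import List


class LoopBoundHandler:
    """Schedules coroutine work on a given event loop from any thread."""

    def __init__(self, loop: asyncio.AbstractEventLoop, indexed: List[Path]):
        self.loop = loop
        self.indexed = indexed

    async def index_image(self, path: Path) -> None:
        # Placeholder for the real indexing coroutine.
        await asyncio.sleep(0)
        self.indexed.append(path)

    def on_created(self, src_path: str):
        # Safe to call from a non-loop thread; returns a concurrent.futures.Future.
        return asyncio.run_coroutine_threadsafe(
            self.index_image(Path(src_path)), self.loop
        )


async def demo() -> List[Path]:
    indexed: List[Path] = []
    loop = asyncio.get_running_loop()
    handler = LoopBoundHandler(loop, indexed)
    futures = []

    # A plain thread stands in for the watchdog observer thread.
    worker = threading.Thread(
        target=lambda: futures.append(handler.on_created("photos/cat.jpg"))
    )
    worker.start()
    # Join in the default executor so the event loop itself stays free.
    await loop.run_in_executor(None, worker.join)
    # Wait for the scheduled coroutine to finish.
    await asyncio.wrap_future(futures[0])
    return indexed
```

In the indexer this would mean passing the running loop to the handler when it is constructed, which is a design choice the commit does not show.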
image_search.py ADDED
@@ -0,0 +1,272 @@
1
+ import torch
2
+ from PIL import Image
3
+ from typing import List, Dict, Optional
4
+ from transformers import CLIPProcessor, CLIPModel
5
+ from qdrant_singleton import QdrantClientSingleton
6
+ from folder_manager import FolderManager
7
+ from image_database import ImageDatabase
8
+ import httpx
9
+ import io
10
+
11
+ class ImageSearch:
12
+ def __init__(self):
13
+ self.device = "cuda" if torch.cuda.is_available() else "cpu"
14
+ print(f"Using device: {self.device}")
15
+
16
+ # Load model and processor with proper device handling
17
+ self.processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch16")
18
+
19
+ # Load model directly to the target device to avoid meta tensor issues
20
+ if self.device == "cuda":
21
+ self.model = CLIPModel.from_pretrained(
22
+ "openai/clip-vit-base-patch16",
23
+ torch_dtype=torch.float16,
24
+ device_map="auto"
25
+ )
26
+ else:
27
+ # For CPU, use device_map to avoid meta tensor issues
28
+ self.model = CLIPModel.from_pretrained(
29
+ "openai/clip-vit-base-patch16",
30
+ device_map="cpu"
31
+ )
32
+
33
+ # Initialize Qdrant client, folder manager and image database
34
+ self.qdrant = QdrantClientSingleton.get_instance()
35
+ self.folder_manager = FolderManager()
36
+ self.image_db = ImageDatabase()
37
+
38
+ def calculate_similarity_percentage(self, score: float) -> float:
39
+ """Convert cosine similarity score to percentage"""
40
+ # Qdrant returns cosine similarity scores between -1 and 1
41
+ # We want to convert this to a percentage between 0 and 100
42
+ # First normalize to 0-1 range, then convert to percentage
43
+ normalized = (score + 1) / 2
44
+ return normalized * 100
45
+
46
+ def filter_results(self, search_results: list, threshold: float = 60) -> List[Dict]:
47
+ """Filter and format search results"""
48
+ results = []
49
+ for scored_point in search_results:
50
+ # Convert cosine similarity to percentage
51
+ similarity = self.calculate_similarity_percentage(scored_point.score)
52
+
53
+ # Only include results at or above the threshold (default: 60% similarity)
54
+ if similarity >= threshold:
55
+ # Get image data from SQLite database
56
+ image_id = scored_point.payload.get("image_id")
57
+ if image_id:
58
+ image_data = self.image_db.get_image(image_id)
59
+ if image_data:
60
+ results.append({
61
+ "id": image_id,
62
+ "path": scored_point.payload["path"],
63
+ "filename": image_data["filename"],
64
+ "root_folder": scored_point.payload["root_folder"],
65
+ "similarity": round(similarity, 1),
66
+ "file_size": image_data["file_size"],
67
+ "width": image_data["width"],
68
+ "height": image_data["height"]
69
+ })
70
+
71
+ return results
72
+
73
+ async def search_by_text(self, query: str, folder_path: Optional[str] = None, k: int = 10) -> List[Dict]:
74
+ """Search images by text query"""
75
+ try:
76
+ print(f"\nSearching for text: '{query}'")
77
+
78
+ # Get collections to search
79
+ collections_to_search = []
80
+ if folder_path:
81
+ # Search in specific folder's collection
82
+ collection_name = self.folder_manager.get_collection_for_path(folder_path)
83
+ if collection_name:
84
+ collections_to_search.append(collection_name)
85
+ print(f"Searching in specific folder collection: {collection_name}")
86
+ else:
87
+ # Search in all collections
88
+ folders = self.folder_manager.get_all_folders()
89
+ print(f"Found {len(folders)} folders")
90
+ for folder in folders:
91
+ print(f"Folder: {folder['path']}, Valid: {folder['is_valid']}, Collection: {folder.get('collection_name', 'None')}")
92
+ # Include all collections regardless of folder validity since images are in SQLite
93
+ collections_to_search.extend(folder["collection_name"] for folder in folders if folder.get("collection_name"))
94
+
95
+ print(f"Collections to search: {collections_to_search}")
96
+
97
+ if not collections_to_search:
98
+ print("No collections available to search")
99
+ return []
100
+
101
+ # Generate text embedding
102
+ inputs = self.processor(text=[query], return_tensors="pt", padding=True).to(self.device)
103
+ with torch.no_grad():
104
+ text_features = self.model.get_text_features(**inputs)
105
+ text_features = text_features / text_features.norm(dim=-1, keepdim=True)
106
+ text_embedding = text_features.cpu().numpy().flatten()
107
+
108
+ # Search in all relevant collections
109
+ all_results = []
110
+ for collection_name in collections_to_search:
111
+ try:
112
+ # Get more results from each collection when searching multiple collections
113
+ collection_limit = k * 3 if len(collections_to_search) > 1 else k
114
+
115
+ search_result = self.qdrant.search(
116
+ collection_name=collection_name,
117
+ query_vector=text_embedding.tolist(),
118
+ limit=collection_limit, # Get more results from each collection
119
+ offset=0, # Explicitly set offset
120
+ score_threshold=0.2 # Corresponds to 60% similarity after normalization
121
+ )
122
+
123
+ # Filter and format results
124
+ results = self.filter_results(search_result) # Threshold is now default 60 in filter_results
125
+ all_results.extend(results)
126
+ print(f"Found {len(results)} matches in collection {collection_name}")
127
+ except Exception as e:
128
+ print(f"Error searching collection {collection_name}: {e}")
129
+ continue
130
+
131
+ # Sort all results by similarity
132
+ all_results.sort(key=lambda x: x["similarity"], reverse=True)
133
+
134
+ # Take top k results
135
+ final_results = all_results[:k]
136
+ print(f"Found {len(final_results)} total relevant matches across {len(collections_to_search)} collections")
137
+
138
+ return final_results
139
+
140
+ except Exception as e:
141
+ print(f"Error in text search: {e}")
142
+ import traceback
143
+ traceback.print_exc()
144
+ return []
145
+
146
+ async def search_by_image(self, image: Image.Image, folder_path: Optional[str] = None, k: int = 10) -> List[Dict]:
147
+ """Search images by similarity to uploaded image"""
148
+ try:
149
+ print("\nSearching by image...")
150
+
151
+ # Get collections to search
152
+ collections_to_search = []
153
+ if folder_path:
154
+ # Search in specific folder's collection
155
+ collection_name = self.folder_manager.get_collection_for_path(folder_path)
156
+ if collection_name:
157
+ collections_to_search.append(collection_name)
158
+ print(f"Searching in specific folder collection: {collection_name}")
159
+ else:
160
+ # Search in all collections
161
+ folders = self.folder_manager.get_all_folders()
162
+ print(f"Found {len(folders)} folders")
163
+ for folder in folders:
164
+ print(f"Folder: {folder['path']}, Valid: {folder['is_valid']}, Collection: {folder.get('collection_name', 'None')}")
165
+ # Include all collections regardless of folder validity since images are in SQLite
166
+ collections_to_search.extend(folder["collection_name"] for folder in folders if folder.get("collection_name"))
167
+
168
+ print(f"Collections to search: {collections_to_search}")
169
+
170
+ if not collections_to_search:
171
+ print("No collections available to search")
172
+ return []
173
+
174
+ # Generate image embedding
175
+ inputs = self.processor(images=image, return_tensors="pt").to(self.device)
176
+ with torch.no_grad():
177
+ image_features = self.model.get_image_features(**inputs)
178
+ image_features = image_features / image_features.norm(dim=-1, keepdim=True)
179
+ image_embedding = image_features.cpu().numpy().flatten()
180
+
181
+ # Search in all relevant collections
182
+ all_results = []
183
+ for collection_name in collections_to_search:
184
+ try:
185
+ # Get more results from each collection when searching multiple collections
186
+ collection_limit = k * 3 if len(collections_to_search) > 1 else k
187
+
188
+ search_result = self.qdrant.search(
189
+ collection_name=collection_name,
190
+ query_vector=image_embedding.tolist(),
191
+ limit=collection_limit, # Get more results from each collection
192
+ offset=0, # Explicitly set offset
193
+ score_threshold=0.2 # Corresponds to 60% similarity after normalization
194
+ )
195
+
196
+ # Filter and format results
197
+ results = self.filter_results(search_result) # Threshold is now default 60 in filter_results
198
+ all_results.extend(results)
199
+ print(f"Found {len(results)} matches in collection {collection_name}")
200
+ except Exception as e:
201
+ print(f"Error searching collection {collection_name}: {e}")
202
+ continue
203
+
204
+ # Sort all results by similarity
205
+ all_results.sort(key=lambda x: x["similarity"], reverse=True)
206
+
207
+ # Take top k results
208
+ final_results = all_results[:k]
209
+ print(f"Found {len(final_results)} total relevant matches across {len(collections_to_search)} collections")
210
+
211
+ return final_results
212
+
213
+ except Exception as e:
214
+ print(f"Error in image search: {e}")
215
+ import traceback
216
+ traceback.print_exc()
217
+ return []
218
+
219
+ async def download_image_from_url(self, url: str) -> Optional[Image.Image]:
220
+ """Download and return an image from a URL"""
221
+ try:
222
+ print(f"Downloading image from URL: {url}")
223
+
224
+ # Use httpx for async HTTP requests
225
+ async with httpx.AsyncClient(timeout=30.0) as client:
226
+ response = await client.get(url)
227
+ response.raise_for_status()
228
+
229
+ # Check if the response is an image
230
+ content_type = response.headers.get('content-type', '')
231
+ if not content_type.startswith('image/'):
232
+ raise ValueError(f"URL does not point to an image. Content-Type: {content_type}")
233
+
234
+ # Load image from response content
235
+ image_bytes = io.BytesIO(response.content)
236
+ image = Image.open(image_bytes)
237
+
238
+ # Convert to RGB if necessary (for consistency with CLIP)
239
+ if image.mode != 'RGB':
240
+ image = image.convert('RGB')
241
+
242
+ print(f"Successfully downloaded image: {image.size}")
243
+ return image
244
+
245
+ except httpx.TimeoutException:
246
+ print(f"Timeout while downloading image from URL: {url}")
247
+ return None
248
+ except httpx.HTTPStatusError as e:
249
+ print(f"HTTP error {e.response.status_code} while downloading image from URL: {url}")
250
+ return None
251
+ except Exception as e:
252
+ print(f"Error downloading image from URL {url}: {e}")
253
+ return None
254
+
255
+ async def search_by_url(self, url: str, folder_path: Optional[str] = None, k: int = 10) -> List[Dict]:
256
+ """Search images by downloading and comparing an image from a URL"""
257
+ try:
258
+ print(f"\nSearching by image URL: {url}")
259
+
260
+ # Download the image from URL
261
+ image = await self.download_image_from_url(url)
262
+ if image is None:
263
+ return []
264
+
265
+ # Use the existing search_by_image method
266
+ return await self.search_by_image(image, folder_path, k)
267
+
268
+ except Exception as e:
269
+ print(f"Error in URL search: {e}")
270
+ import traceback
271
+ traceback.print_exc()
272
+ return []
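Review note: the search methods apply two cutoffs, `score_threshold=0.2` on the raw cosine score inside Qdrant and `threshold=60` on the percentage in `filter_results`. Under the mapping used in `calculate_similarity_percentage`, percentage = (score + 1) / 2 * 100, these two values coincide. The sketch below derives one from the other and merges per-collection hits into a global top k. The helper names `score_to_percentage`, `percentage_to_score`, and `merge_top_k` are hypothetical.

```python
import heapq
from typing import Dict, Iterable, List


def score_to_percentage(score: float) -> float:
    """Map a cosine score in [-1, 1] to a percentage in [0, 100]."""
    return (score + 1) / 2 * 100


def percentage_to_score(percentage: float) -> float:
    """Inverse mapping: a percentage in [0, 100] back to a cosine score."""
    return percentage / 100 * 2 - 1


def merge_top_k(per_collection: Iterable[List[Dict]], k: int) -> List[Dict]:
    """Merge result lists from several collections and keep the k most similar."""
    merged = (hit for hits in per_collection for hit in hits)
    return heapq.nlargest(k, merged, key=lambda hit: hit["similarity"])
```

Deriving the Qdrant cutoff with `percentage_to_score(threshold)` would keep the two thresholds from drifting apart if one of them is changed later.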
pyproject.toml ADDED
@@ -0,0 +1,6 @@
1
+ [tool.pytest.ini_options]
2
+ pythonpath = "."
3
+ testpaths = ["tests"]
4
+ python_files = ["test_*.py"]
5
+ asyncio_mode = "strict"
6
+ asyncio_default_fixture_loop_scope = "function"
qdrant_singleton.py ADDED
@@ -0,0 +1,148 @@
1
+ from qdrant_client import QdrantClient
2
+ from qdrant_client.http import models
3
+ from pathlib import Path
4
+ import os
5
+ from dotenv import load_dotenv
6
+
7
+ # Load environment variables
8
+ load_dotenv()
9
+
10
+ CURRENT_SCHEMA_VERSION = "1.2" # Increment this when schema changes
11
+ VECTOR_SIZE = 512 # CLIP embedding size
12
+
13
+ class QdrantClientSingleton:
14
+ _instance = None
15
+
16
+ @classmethod
17
+ def get_instance(cls):
18
+ if cls._instance is None:
19
+ # Check if we have cloud credentials
20
+ qdrant_url = os.getenv('QDRANT_URL')
21
+ qdrant_api_key = os.getenv('QDRANT_API_KEY')
22
+
23
+ print(f"QDRANT_URL: {qdrant_url}")
24
+ print(f"QDRANT_API_KEY: {'set' if qdrant_api_key else 'None'}")
25
+
26
+ if qdrant_url and qdrant_api_key:
27
+ print(f"Initializing Qdrant Cloud client: {qdrant_url}")
28
+ try:
29
+ cls._instance = QdrantClient(
30
+ url=qdrant_url,
31
+ api_key=qdrant_api_key,
32
+ )
33
+ print("Successfully connected to Qdrant Cloud")
34
+ except Exception as e:
35
+ print(f"Failed to connect to Qdrant Cloud: {e}")
36
+ print("Falling back to local storage")
37
+ storage_path = Path("qdrant_data").absolute()
38
+ storage_path.mkdir(exist_ok=True)
39
+ cls._instance = QdrantClient(path=str(storage_path))
40
+ else:
41
+ # Fallback to local storage
42
+ print("Cloud credentials not found, using local Qdrant storage")
43
+ storage_path = Path("qdrant_data").absolute()
44
+ storage_path.mkdir(exist_ok=True)
45
+ cls._instance = QdrantClient(path=str(storage_path))
46
+
47
+ # Print collections for debugging
48
+ try:
49
+ collections = cls._instance.get_collections().collections
50
+ print(f"Available collections: {[col.name for col in collections]}")
51
+ except Exception as e:
52
+ print(f"Error getting collections: {e}")
53
+
54
+ return cls._instance
55
+
56
+ @classmethod
57
+ def initialize_collection(cls, collection_name: str):
58
+ client = cls.get_instance()
59
+
60
+ # Check if collection exists
61
+ collections = client.get_collections().collections
62
+ exists = any(collection.name == collection_name for collection in collections)
63
+
64
+ if not exists:
65
+ # Create new collection with current schema version
66
+ cls._create_collection(client, collection_name)
67
+ else:
68
+ # Check schema version and update if necessary
69
+ cls._check_and_update_schema(client, collection_name)
70
+
71
+ @classmethod
72
+ def _create_collection(cls, client: QdrantClient, collection_name: str):
73
+ """Create a new collection with the current schema version"""
74
+ # First create the collection with basic config
75
+ client.create_collection(
76
+ collection_name=collection_name,
77
+ vectors_config=models.VectorParams(
78
+ size=VECTOR_SIZE,
79
+ distance=models.Distance.COSINE
80
+ ),
81
+ on_disk_payload=True, # Store payload on disk
82
+ optimizers_config=models.OptimizersConfigDiff(
83
+ indexing_threshold=0 # Index immediately
84
+ )
85
+ )
86
+
87
+ # Then create payload indexes for efficient searching
88
+ client.create_payload_index(
89
+ collection_name=collection_name,
90
+ field_name="image_id",
91
+ field_schema=models.PayloadSchemaType.KEYWORD
92
+ )
93
+
94
+ client.create_payload_index(
95
+ collection_name=collection_name,
96
+ field_name="path",
97
+ field_schema=models.PayloadSchemaType.KEYWORD
98
+ )
99
+
100
+ client.create_payload_index(
101
+ collection_name=collection_name,
102
+ field_name="root_folder",
103
+ field_schema=models.PayloadSchemaType.KEYWORD
104
+ )
105
+
106
+ client.create_payload_index(
107
+ collection_name=collection_name,
108
+ field_name="schema_version",
109
+ field_schema=models.PayloadSchemaType.KEYWORD
110
+ )
111
+
112
+ client.create_payload_index(
113
+ collection_name=collection_name,
114
+ field_name="indexed_at",
115
+ field_schema=models.PayloadSchemaType.INTEGER
116
+ )
117
+
118
+ print(f"Created collection {collection_name} with schema version {CURRENT_SCHEMA_VERSION}")
119
+
120
+ @classmethod
121
+ def _check_and_update_schema(cls, client: QdrantClient, collection_name: str):
122
+ """Check collection schema version and update if necessary"""
123
+ try:
124
+ # Get a sample point to check schema version
125
+ sample = client.scroll(
126
+ collection_name=collection_name,
127
+ limit=1,
128
+ with_payload=True
129
+ )[0]
130
+
131
+ if not sample:
132
+ print(f"Collection {collection_name} is empty")
133
+ return
134
+
135
+ # Check schema version of existing data
136
+ point_version = sample[0].payload.get("schema_version", "0.0")
137
+ if point_version != CURRENT_SCHEMA_VERSION:
138
+ print(f"Schema version mismatch: {point_version} != {CURRENT_SCHEMA_VERSION}")
139
+ print(f"Collection {collection_name} needs to be recreated")
140
+
141
+ # Recreate collection with new schema
142
+ client.delete_collection(collection_name=collection_name)
143
+ cls._create_collection(client, collection_name)
144
+ else:
145
+ print(f"Collection {collection_name} schema is up to date (version {CURRENT_SCHEMA_VERSION})")
146
+ except Exception as e:
147
+ print(f"Error checking schema: {e}")
148
+ cls._create_collection(client, collection_name)
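Review note: `get_instance` checks `cls._instance is None` without any locking. If it can be reached from more than one thread (the indexer does start a background thread and a thread pool), two callers could each construct a client. A minimal sketch of a lock-guarded lazy singleton follows. `LazySingleton`, `make_client`, and the `created` list are hypothetical and stand in for the real Qdrant client construction.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional


class LazySingleton:
    """Creates one shared instance on first use, guarded by a lock."""

    _instance: Optional[object] = None
    _lock = threading.Lock()

    @classmethod
    def get_instance(cls, factory: Callable[[], object]) -> object:
        if cls._instance is None:          # fast path: no lock once created
            with cls._lock:
                if cls._instance is None:  # re-check while holding the lock
                    cls._instance = factory()
        return cls._instance


created = []


def make_client() -> object:
    # Hypothetical factory; records each construction so it can be counted.
    client = object()
    created.append(client)
    return client


# Request the instance from several threads at once.
with ThreadPoolExecutor(max_workers=8) as pool:
    instances = list(pool.map(lambda _: LazySingleton.get_instance(make_client), range(32)))
```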
requirements-test.txt ADDED
@@ -0,0 +1,3 @@
1
+ pytest==7.4.4
2
+ pytest-asyncio==0.23.5
3
+ requests==2.31.0
requirements.txt ADDED
@@ -0,0 +1,17 @@
1
+ fastapi
2
+ uvicorn
3
+ torch
4
+ torchvision
5
+ transformers
6
+ Pillow
7
+ python-multipart
8
+ watchdog
9
+ numpy
10
+ qdrant-client
11
+ aiofiles
12
+ jinja2
13
+ uvicorn[standard]
14
+ websockets
15
+ python-dotenv
16
+ httpx
17
+ accelerate
static/image.png ADDED
static/js/script.js ADDED
@@ -0,0 +1,546 @@
1
+ console.log('script.js loaded');
2
+
3
+ let currentPath = null;
4
+ let folderModal = null;
5
+ let selectedFolder = null;
6
+ let ws = null;
7
+
8
+ // Initialize WebSocket connection
9
+ function connectWebSocket() {
10
+ ws = new WebSocket(`${window.location.protocol === 'https:' ? 'wss' : 'ws'}://${window.location.host}/ws`);
11
+
12
+ ws.onopen = function () {
13
+ console.log('WebSocket connected');
14
+ };
15
+
16
+ ws.onmessage = function (event) {
17
+ const status = JSON.parse(event.data);
18
+ updateIndexingStatus(status);
19
+ };
20
+
21
+ ws.onclose = function () {
22
+ console.log('WebSocket disconnected, attempting to reconnect...');
23
+ setTimeout(connectWebSocket, 1000);
24
+ };
25
+
26
+ ws.onerror = function (error) {
27
+ console.error('WebSocket error:', error);
28
+ };
29
+ }
30
+
31
+ // Update indexing progress
32
+ function updateIndexingStatus(status) {
33
+ const statusDiv = document.getElementById('indexingStatus');
34
+ const progressBar = statusDiv.querySelector('.progress-bar');
35
+ const details = document.getElementById('indexingDetails');
36
+
37
+ if (status.status === 'idle') {
38
+ // Fade out the status div
39
+ statusDiv.style.opacity = '0';
40
+ setTimeout(() => {
41
+ statusDiv.style.display = 'none';
42
+ statusDiv.style.opacity = '1';
43
+ }, 500);
44
+ return;
45
+ }
46
+
47
+ // Show and update the status
48
+ statusDiv.style.display = 'block';
49
+ statusDiv.style.opacity = '1';
50
+
51
+ // Calculate progress percentage
52
+ const percentage = status.total_files > 0
53
+ ? Math.round((status.processed_files / status.total_files) * 100)
54
+ : 0;
55
+
56
+ progressBar.style.width = `${percentage}%`;
57
+ progressBar.setAttribute('aria-valuenow', percentage);
58
+
59
+ // Update status text
60
+ let statusText = `Status: ${status.status}`;
61
+ if (status.current_file) {
62
+ statusText += ` | Current file: ${status.current_file}`;
63
+ }
64
+ if (status.total_files > 0) {
65
+ statusText += ` | Progress: ${status.processed_files}/${status.total_files} (${percentage}%)`;
66
+ }
67
+ details.textContent = statusText;
68
+ }
69
+
70
+ // IntersectionObserver for lazy loading images
71
+ let imageObserver = null;
72
+
73
+ function observeLazyLoadImages() {
74
+ const lazyLoadImages = document.querySelectorAll('img.lazy-load');
75
+
76
+ if (imageObserver) {
77
+ // Disconnect previous observer if any
78
+ imageObserver.disconnect();
79
+ }
80
+
81
+ imageObserver = new IntersectionObserver((entries, observer) => {
82
+ entries.forEach(entry => {
83
+ if (entry.isIntersecting) {
84
+ const img = entry.target;
85
+ const fullSrc = img.dataset.src;
86
+
87
+ if (fullSrc) {
88
+ img.src = fullSrc;
89
+ img.removeAttribute('data-src'); // Remove data-src to prevent re-processing
90
+ img.classList.remove('lazy-load'); // Remove class to prevent re-observing
91
+ }
92
+ observer.unobserve(img); // Stop observing the image once loaded
93
+ }
94
+ });
95
+ }, {
96
+ rootMargin: '0px 0px 200px 0px' // Load images 200px before they enter viewport
97
+ });
98
+
99
+ lazyLoadImages.forEach(img => {
100
+ imageObserver.observe(img);
101
+ });
102
+ }
103
+
104
+ // Initialize folder browser
105
+ async function initFolderBrowser() {
106
+ folderModal = new bootstrap.Modal(document.getElementById('folderBrowserModal'));
107
+ await loadFolderContents();
108
+ await loadIndexedFolders();
109
+ }
110
+
111
+ // Open folder browser modal
112
+ function openFolderBrowser() {
113
+ selectedFolder = null;
114
+ folderModal.show();
115
+ loadFolderContents();
116
+ }
117
+
118
+ function showDrives(breadcrumb, browser, data) {
119
+ // Windows drives
120
+ breadcrumb.innerHTML = '<li class="breadcrumb-item active">Drives</li>';
121
+ data.drives.forEach(drive => {
122
+ const escapedDrive = drive.replace(/\\/g, '\\\\').replace(/'/g, "\\'");
123
+ browser.innerHTML += `
124
+ <div class="folder-item" onclick="loadFolderContents('${escapedDrive}')">
125
+ <i class="bi bi-hdd"></i>${drive}
126
+ </div>
127
+ `;
128
+ });
129
+ }
130
+
131
+ function showFolderContents(breadcrumb, browser, data) {
132
+ // Folder contents
133
+ currentPath = data.current_path;
134
+
135
+ // Update breadcrumb
136
+ const pathParts = currentPath.split(/[\\/]/);
137
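+ let currentBreadcrumb = currentPath.startsWith('/') ? '/' : ''; // keep the leading slash so Unix breadcrumb links stay absolute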
138
+ pathParts.forEach((part, index) => {
139
+ if (part) {
140
+ // Check if the path contains backslashes to detect Windows
141
+ const isWindows = currentPath.includes('\\');
142
+ currentBreadcrumb += part + (isWindows ? '\\' : '/');
143
+ const isLast = index === pathParts.length - 1;
144
+ const escapedPath = currentBreadcrumb.replace(/\\/g, '\\\\').replace(/'/g, "\\'");
145
+ breadcrumb.innerHTML += `
146
+ <li class="breadcrumb-item ${isLast ? 'active' : ''}">
147
+ ${isLast ? part : `<a href="#" onclick="loadFolderContents('${escapedPath}')">${part}</a>`}
148
+ </li>
149
+ `;
150
+ }
151
+ });
152
+
153
+ // Add parent directory
154
+ if (data.parent_path) {
155
+ addParentDirectory(browser, data);
156
+ }
157
+
158
+ // Add folders and files
159
+ addFolderContents(browser, data);
160
+ }
161
+
162
+ function addParentDirectory(browser, data) {
163
+ const escapedParentPath = data.parent_path.replace(/\\/g, '\\\\').replace(/'/g, "\\'");
164
+ browser.innerHTML += `
165
+ <div class="folder-item" onclick="loadFolderContents('${escapedParentPath}')">
166
+ <i class="bi bi-arrow-up"></i>..
167
+ </div>
168
+ `;
169
+ }
170
+
171
+ function addFolderContents(browser, data) {
172
+ data.contents.forEach(item => {
173
+ const icon = item.type === 'directory' ? 'bi-folder' : 'bi-image';
174
+ const escapedPath = item.path.replace(/\\/g, '\\\\').replace(/'/g, "\\'");
175
+ browser.innerHTML += `
176
+ <div class="folder-item" onclick="${item.type === 'directory' ? `loadFolderContents('${escapedPath}')` : ''}" ondblclick="${item.type === 'directory' ? `selectFolder('${escapedPath}')` : ''}">
177
+ <i class="bi ${icon}"></i>${item.name}
178
+ </div>
179
+ `;
180
+ });
181
+ }
182
+
183
+ // Load folder contents
184
+ async function loadFolderContents(path = null) {
185
+ try {
186
+ const url = path ? `/browse/${encodeURIComponent(path)}` : '/browse';
187
+ const response = await fetch(url);
188
+ const data = await response.json();
189
+
190
+ const browser = document.getElementById('folderBrowser');
191
+ const breadcrumb = document.getElementById('folderBreadcrumb');
192
+
193
+ browser.innerHTML = '';
194
+ breadcrumb.innerHTML = '';
195
+
196
+ if (data.drives) {
197
+ showDrives(breadcrumb, browser, data);
198
+ } else {
199
+ showFolderContents(breadcrumb, browser, data);
200
+ }
201
+ } catch (error) {
202
+ console.error('Error loading folder contents:', error);
203
+ }
204
+ }
205
+
206
+ // Select folder for indexing
207
+ function selectFolder(path) {
208
+ selectedFolder = path;
209
+ addSelectedFolder();
210
+ }
211
+
212
+ // Add selected folder
213
+ async function addSelectedFolder() {
214
+
215
+ folderModal.hide();
216
+
217
+ if (!selectedFolder && currentPath) {
218
+ selectedFolder = currentPath;
219
+ }
220
+
221
+ if (selectedFolder) {
222
+ try {
223
+ const encodedPath = encodeURIComponent(selectedFolder);
224
+ const response = await fetch(`/folders?folder_path=${encodedPath}`, {
225
+ method: 'POST'
226
+ });
227
+
228
+ if (response.ok) {
229
+ await loadIndexedFolders();
230
+ selectedFolder = null;
231
+ } else {
232
+ const error = await response.json();
233
+ alert(`Error adding folder: ${error.detail || error.message || JSON.stringify(error)}`);
234
+ }
235
+ } catch (error) {
236
+ console.error('Error adding folder:', error);
237
+ alert('Error adding folder. Please try again.');
238
+ }
239
+ }
240
+ }
241
+
242
+ // Load indexed folders
243
+ async function loadIndexedFolders() {
244
+ try {
245
+ const response = await fetch('/folders');
246
+ const folders = await response.json();
247
+
248
+ const folderList = document.getElementById('folderList');
249
+ folderList.innerHTML = '';
250
+
251
+ if (folders.length === 0) {
252
+ folderList.innerHTML = `
253
+ <div class="text-center p-4 text-muted">
254
+ <i class="bi bi-folder-x fs-2 d-block mb-2"></i>
255
+ <small>No folders indexed yet</small>
256
+ </div>
257
+ `;
258
+ return;
259
+ }
260
+
261
+ folders.forEach(folder => {
262
+ const escapedPath = folder.path.replace(/\\/g, '\\\\').replace(/'/g, "\\'");
263
+ const folderCard = document.createElement('div');
264
+ folderCard.className = `folder-item-card ${!folder.is_valid ? 'invalid' : ''}`;
265
+ folderCard.innerHTML = `
266
+ <div class="d-flex justify-content-between align-items-start p-3">
267
+ <div class="flex-grow-1 me-2">
268
+ <div class="d-flex align-items-center mb-1">
269
+ <i class="bi bi-folder-fill me-2 ${folder.is_valid ? 'text-primary' : 'text-danger'}"></i>
270
+ <span class="fw-semibold ${!folder.is_valid ? 'text-danger' : 'text-dark'}" style="font-size: 0.9rem;">
271
+ ${folder.path.split(/[\\/]/).pop()}
272
+ </span>
273
+ </div>
274
+ <div class="text-muted small" style="word-break: break-all; line-height: 1.3;">
275
+ ${folder.path}
276
+ </div>
277
+ ${!folder.is_valid ? '<small class="text-danger"><i class="bi bi-exclamation-triangle me-1"></i>Path not accessible</small>' : ''}
278
+ </div>
279
+ <button class="btn btn-outline-danger btn-sm" onclick="removeFolder('${escapedPath}')" title="Remove folder">
280
+ <i class="bi bi-trash"></i>
281
+ </button>
282
+ </div>
283
+ `;
284
+ folderList.appendChild(folderCard);
285
+ });
286
+
287
+ // Load images from all folders
288
+ await loadImages();
289
+ } catch (error) {
290
+ console.error('Error loading folders:', error);
291
+ }
292
+ }
293
+
294
+ // Remove folder
295
+ async function removeFolder(path) {
296
+ if (confirm('Are you sure you want to remove this folder?')) {
297
+ try {
298
+ const encodedPath = encodeURIComponent(path).replace(/%5C/g, '\\');
299
+ const response = await fetch(`/folders/${encodedPath}`, {
300
+ method: 'DELETE'
301
+ });
302
+
303
+ if (response.ok) {
304
+ await loadIndexedFolders();
305
+ } else {
306
+ const error = await response.text();
307
+ alert(`Error removing folder: ${error}`);
308
+ }
309
+ } catch (error) {
310
+ console.error('Error removing folder:', error);
311
+ alert('Error removing folder. Please try again.');
312
+ }
313
+ }
314
+ }
315
+
316
+ // Load images
317
+ async function loadImages(folder = null) {
318
+ try {
319
+ const url = folder ? `/images?folder=${encodeURIComponent(folder)}` : '/images';
320
+ const response = await fetch(url);
321
+ const images = await response.json();
322
+
323
+ const imageGrid = document.getElementById('imageGrid');
324
+ imageGrid.innerHTML = '';
325
+
326
+ if (images.length === 0) {
327
+ imageGrid.innerHTML = `
328
+ <div class="col-12">
329
+ <div class="text-center p-5">
330
+ <i class="bi bi-images fs-1 text-muted d-block mb-3"></i>
331
+ <h5 class="text-muted mb-2">No images found</h5>
332
+ <p class="text-muted">Add some folders to start indexing your images</p>
333
+ </div>
334
+ </div>
335
+ `;
336
+ return;
337
+ }
338
+
339
+ images.forEach(image => {
340
+ const card = document.createElement('div');
341
+ card.className = 'image-card';
342
+ card.innerHTML = `
343
+ <div class="image-wrapper">
344
+ <img class="lazy-load"
345
+ src="/thumbnail/${image.id}"
346
+ data-src="/image/${image.id}"
347
+ alt="${image.filename || image.path}"
348
+ loading="lazy">
349
+ </div>
350
+ <div class="image-info">
351
+ <span class="filename" title="${image.filename || image.path}">${image.filename || image.path}</span>
352
+ <span class="file-size">${formatFileSize(image.file_size)}</span>
353
+ </div>
354
+ `;
355
+ imageGrid.appendChild(card);
356
+ });
357
+ observeLazyLoadImages(); // Initialize IntersectionObserver for new images
358
+ } catch (error) {
359
+ console.error('Error loading images:', error);
360
+ const imageGrid = document.getElementById('imageGrid');
361
+ imageGrid.innerHTML = '<div class="col-12"><div class="error text-center p-4">Error loading images. Please try again.</div></div>';
362
+ }
363
+ }
364
+
365
+ // Utility function to format file sizes
366
+ function formatFileSize(bytes) {
367
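+ if (!Number.isFinite(bytes) || bytes <= 0) return '0 Bytes'; // also covers a missing or invalid file_size
368
+ const k = 1024;
369
+ const sizes = ['Bytes', 'KB', 'MB', 'GB', 'TB'];
370
+ const i = Math.min(Math.max(Math.floor(Math.log(bytes) / Math.log(k)), 0), sizes.length - 1); // clamp to a valid unit index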
371
+ return parseFloat((bytes / Math.pow(k, i)).toFixed(2)) + ' ' + sizes[i];
372
+ }
373
+
374
+ // Get current folder path
375
+ function getCurrentPath() {
376
+ // Return the current path if we're in a folder, otherwise null
377
+ return currentPath;
378
+ }
379
+
380
+ // Search images
381
+ async function searchImages(event) {
382
+ event.preventDefault();
383
+ const query = document.getElementById('searchInput').value;
384
+ if (!query) return;
385
+
386
+ try {
387
+ // Build the text-search URL; no folder filter is applied
388
+ const searchUrl = `/search/text?query=${encodeURIComponent(query)}`;
389
+ const response = await fetch(searchUrl);
390
+ const results = await response.json();
391
+
392
+ displaySearchResults(results);
393
+ } catch (error) {
394
+ console.error('Error searching images:', error);
395
+ const imageGrid = document.getElementById('imageGrid');
396
+ imageGrid.innerHTML = `
397
+ <div class="col-12">
398
+ <div class="error text-center p-5">
399
+ <i class="bi bi-exclamation-triangle fs-1 text-danger d-block mb-3"></i>
400
+ <h5 class="text-danger mb-2">Search Error</h5>
401
+ <p class="text-muted">An error occurred while searching. Please try again.</p>
402
+ </div>
403
+ </div>
404
+ `;
405
+ }
406
+ }
407
+
408
+ // Search by image
409
+ async function searchByImage(event) {
410
+ const file = event.target.files[0];
411
+ if (!file) return;
412
+
413
+ const formData = new FormData();
414
+ formData.append('file', file);
415
+
416
+ try {
417
+ const searchUrl = '/search/image';
418
+ const response = await fetch(searchUrl, {
419
+ method: 'POST',
420
+ body: formData
421
+ });
422
+ const results = await response.json();
423
+
424
+ displaySearchResults(results);
425
+
426
+ // Reset file input
427
+ event.target.value = '';
428
+ } catch (error) {
429
+ console.error('Error searching by image:', error);
430
+ const imageGrid = document.getElementById('imageGrid');
431
+ imageGrid.innerHTML = `
432
+ <div class="col-12">
433
+ <div class="error text-center p-5">
434
+ <i class="bi bi-exclamation-triangle fs-1 text-danger d-block mb-3"></i>
435
+ <h5 class="text-danger mb-2">Image Search Error</h5>
436
+ <p class="text-muted">An error occurred while processing your image. Please try again.</p>
437
+ </div>
438
+ </div>
439
+ `;
440
+ }
441
+ }
442
+
443
+ // Search by URL
444
+ async function searchByUrl(event) {
445
+ event.preventDefault();
446
+ const url = document.getElementById('urlInput').value;
447
+ if (!url) return;
448
+
449
+ try {
450
+ // Show loading state
451
+ const imageGrid = document.getElementById('imageGrid');
452
+ imageGrid.innerHTML = `
453
+ <div class="col-12">
454
+ <div class="loading text-center p-5">
455
+ <div class="spinner-border text-primary mb-3" role="status">
456
+ <span class="visually-hidden">Loading...</span>
457
+ </div>
458
+ <h5 class="text-primary mb-2">Downloading and analyzing image...</h5>
459
+ <p class="text-muted">This may take a few moments</p>
460
+ </div>
461
+ </div>
462
+ `;
463
+
464
+ const searchUrl = `/search/url?url=${encodeURIComponent(url)}`;
465
+ const response = await fetch(searchUrl);
466
+ const results = await response.json();
467
+
468
+ displaySearchResults(results);
469
+
470
+ // Clear URL input and hide form
471
+ document.getElementById('urlInput').value = '';
472
+ toggleUrlSearch();
473
+ } catch (error) {
474
+ console.error('Error searching by URL:', error);
475
+ const imageGrid = document.getElementById('imageGrid');
476
+ imageGrid.innerHTML = `
477
+ <div class="col-12">
478
+ <div class="error text-center p-5">
479
+ <i class="bi bi-exclamation-triangle fs-1 text-danger d-block mb-3"></i>
480
+ <h5 class="text-danger mb-2">Error processing URL</h5>
481
+ <p class="text-muted">Please check the URL and try again. Make sure it points to a valid image.</p>
482
+ </div>
483
+ </div>
484
+ `;
485
+ }
486
+ }
487
+
488
+ // Display search results (common function for all search types)
489
+ function displaySearchResults(results) {
490
+ const imageGrid = document.getElementById('imageGrid');
491
+ imageGrid.innerHTML = '';
492
+
493
+ if (results.length === 0) {
494
+ imageGrid.innerHTML = `
495
+ <div class="col-12">
496
+ <div class="no-results text-center p-5">
497
+ <i class="bi bi-search fs-1 text-muted d-block mb-3"></i>
498
+ <h5 class="text-muted mb-2">No similar images found</h5>
499
+ <p class="text-muted">Try adjusting your search terms or uploading a different image</p>
500
+ </div>
501
+ </div>
502
+ `;
503
+ return;
504
+ }
505
+
506
+ results.forEach(result => {
507
+ const card = document.createElement('div');
508
+ card.className = 'image-card';
509
+ card.innerHTML = `
510
+ <div class="image-wrapper">
511
+ <img class="lazy-load"
512
+ src="/thumbnail/${result.id}"
513
+ data-src="/image/${result.id}"
514
+ alt="${result.filename || result.path}"
515
+ loading="lazy">
516
+ <div class="similarity-score">${result.similarity}%</div>
517
+ </div>
518
+ <div class="image-info">
519
+ <span class="filename" title="${result.filename || result.path}">${result.filename || result.path}</span>
520
+ <span class="file-size">${formatFileSize(result.file_size)}</span>
521
+ </div>
522
+ `;
523
+ imageGrid.appendChild(card);
524
+ });
525
+ observeLazyLoadImages(); // Initialize IntersectionObserver for new images
526
+ }
527
+
528
+ // Toggle URL search form visibility
529
+ function toggleUrlSearch() {
530
+ const urlForm = document.getElementById('urlSearchForm');
531
+ const isVisible = urlForm.style.display !== 'none';
532
+
533
+ if (isVisible) {
534
+ urlForm.style.display = 'none';
535
+ document.getElementById('urlInput').value = '';
536
+ } else {
537
+ urlForm.style.display = 'flex';
538
+ document.getElementById('urlInput').focus();
539
+ }
540
+ }
541
+
542
+ // Initialize
543
+ document.addEventListener('DOMContentLoaded', () => {
544
+ connectWebSocket();
545
+ initFolderBrowser();
546
+ });
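The templates in `script.js` above interpolate server-supplied values such as `item.name`, `folder.path`, and `image.filename` straight into `innerHTML`, so a file name containing markup would be parsed as HTML. A minimal sketch of one way to guard against that is shown below. `escapeHtml` and `renderFilename` are hypothetical helpers and are not part of this commit.

```javascript
// Hypothetical helper (not in the commit): escape a value before it is
// interpolated into an innerHTML template string.
function escapeHtml(value) {
    const replacements = {
        '&': '&amp;',
        '<': '&lt;',
        '>': '&gt;',
        '"': '&quot;',
        "'": '&#39;'
    };
    // String() lets the helper accept numbers and other non-string values.
    return String(value).replace(/[&<>"']/g, ch => replacements[ch]);
}

// Hypothetical usage: build the caption markup for one image card.
function renderFilename(name) {
    const safe = escapeHtml(name);
    return `<span class="filename" title="${safe}">${safe}</span>`;
}
```

This only covers values placed in element text or in quoted attributes. Paths passed through inline `onclick="..."` handlers are additionally parsed as JavaScript, so for those it is probably safer to build the element with `document.createElement`, set `textContent`, and attach the handler with `addEventListener`.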
templates/index.html ADDED
@@ -0,0 +1,530 @@
1
+ <!DOCTYPE html>
2
+ <html lang="en">
3
+ <head>
4
+ <meta charset="UTF-8">
5
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
6
+ <title>Visual Product Search</title>
7
+ <link rel="icon" href="/static/image.png" type="image/png">
8
+ <link href="https://cdn.jsdelivr.net/npm/[email protected]/dist/css/bootstrap.min.css" rel="stylesheet">
9
+ <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/font/bootstrap-icons.css">
10
+ <link href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700&display=swap" rel="stylesheet">
11
+ <script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/js/bootstrap.bundle.min.js"></script>
12
+ <script src="/static/js/script.js"></script>
13
+ <style>
14
+ :root {
15
+ --primary-color: #6366f1;
16
+ --primary-dark: #4f46e5;
17
+ --primary-light: #8b5cf6;
18
+ --secondary-color: #f8fafc;
19
+ --accent-color: #06b6d4;
20
+ --text-primary: #1e293b;
21
+ --text-secondary: #64748b;
22
+ --border-color: #e2e8f0;
23
+ --success-color: #10b981;
24
+ --warning-color: #f59e0b;
25
+ --danger-color: #ef4444;
26
+ --gradient-bg: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
27
+ --card-shadow: 0 4px 6px -1px rgba(0, 0, 0, 0.1), 0 2px 4px -1px rgba(0, 0, 0, 0.06);
28
+ --card-shadow-hover: 0 10px 15px -3px rgba(0, 0, 0, 0.1), 0 4px 6px -2px rgba(0, 0, 0, 0.05);
29
+ }
30
+
31
+ * {
32
+ font-family: 'Inter', sans-serif;
33
+ }
34
+
35
+ body {
36
+ background: linear-gradient(135deg, #f8fafc 0%, #e2e8f0 100%);
37
+ min-height: 100vh;
38
+ }
39
+
40
+ .navbar {
41
+ background: var(--gradient-bg) !important;
42
+ backdrop-filter: blur(10px);
43
+ border-bottom: 1px solid rgba(255, 255, 255, 0.1);
44
+ padding: 1rem 0;
45
+ }
46
+
47
+ .navbar-brand {
48
+ font-weight: 700;
49
+ font-size: 1.5rem;
50
+ color: white !important;
51
+ display: flex;
52
+ align-items: center;
53
+ gap: 0.5rem;
54
+ }
55
+
56
+ .brand-icon {
57
+ width: 32px;
58
+ height: 32px;
59
+ background: rgba(255, 255, 255, 0.2);
60
+ border-radius: 8px;
61
+ display: flex;
62
+ align-items: center;
63
+ justify-content: center;
64
+ }
65
+
66
+ .search-container {
67
+ max-width: 600px;
68
+ margin: 0 auto;
69
+ }
70
+
71
+ .search-form {
72
+ background: rgba(255, 255, 255, 0.95);
73
+ backdrop-filter: blur(10px);
74
+ border-radius: 20px;
75
+ padding: 8px;
76
+ box-shadow: var(--card-shadow);
77
+ border: 1px solid rgba(255, 255, 255, 0.2);
78
+ }
79
+
80
+ .search-input {
81
+ border: none;
82
+ background: transparent;
83
+ padding: 12px 20px;
84
+ font-weight: 500;
85
+ }
86
+
87
+ .search-input:focus {
88
+ outline: none;
89
+ box-shadow: none;
90
+ }
91
+
92
+ .search-btn {
93
+ border-radius: 16px;
94
+ padding: 12px 24px;
95
+ font-weight: 600;
96
+ background: var(--primary-color);
97
+ border: none;
98
+ transition: all 0.3s ease;
99
+ }
100
+
101
+ .search-btn:hover {
102
+ background: var(--primary-dark);
103
+ transform: translateY(-1px);
104
+ }
105
+
106
+ .action-btn {
107
+ border-radius: 16px;
108
+ padding: 12px 16px;
109
+ border: 1px solid rgba(255, 255, 255, 0.5);
110
+ background: rgba(255, 255, 255, 0.9);
111
+ color: var(--primary-color);
112
+ font-weight: 600;
113
+ transition: all 0.3s ease;
114
+ min-width: 48px;
115
+ display: flex;
116
+ align-items: center;
117
+ justify-content: center;
118
+ }
119
+
120
+ .action-btn:hover {
121
+ background: rgba(255, 255, 255, 1);
122
+ color: var(--primary-dark);
123
+ transform: translateY(-1px);
124
+ border-color: rgba(255, 255, 255, 0.7);
125
+ box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1);
126
+ }
127
+
128
+ .action-btn i {
129
+ font-size: 1.1rem;
130
+ }
131
+
132
+ .url-search-form {
133
+ background: rgba(255, 255, 255, 0.95);
134
+ backdrop-filter: blur(10px);
135
+ border-radius: 20px;
136
+ padding: 8px;
137
+ box-shadow: var(--card-shadow);
138
+ border: 1px solid rgba(255, 255, 255, 0.2);
139
+ }
140
+
141
+ .main-container {
142
+ padding-top: 2rem;
143
+ }
144
+
145
+ .sidebar {
146
+ background: white;
147
+ border-radius: 20px;
148
+ padding: 1.5rem;
149
+ box-shadow: var(--card-shadow);
150
+ border: 1px solid var(--border-color);
151
+ height: fit-content;
152
+ position: sticky;
153
+ top: 2rem;
154
+ }
155
+
156
+ .sidebar-title {
157
+ font-weight: 600;
158
+ color: var(--text-primary);
159
+ margin-bottom: 1rem;
160
+ display: flex;
161
+ align-items: center;
162
+ gap: 0.5rem;
163
+ }
164
+
165
+ .folder-list {
166
+ border: none;
167
+ }
168
+
169
+ .folder-item-card {
170
+ border: 1px solid var(--border-color);
171
+ border-radius: 12px;
172
+ margin-bottom: 8px;
173
+ transition: all 0.3s ease;
174
+ background: white;
175
+ }
176
+
177
+ .folder-item-card:hover {
178
+ transform: translateY(-2px);
179
+ box-shadow: var(--card-shadow-hover);
180
+ border-color: var(--primary-color);
181
+ }
182
+
183
+ .folder-item-card.invalid {
184
+ border-color: var(--danger-color);
185
+ background: #fef2f2;
186
+ }
187
+
188
+ .add-folder-btn {
189
+ background: var(--primary-color);
190
+ color: white;
191
+ border: none;
192
+ border-radius: 12px;
193
+ padding: 12px 20px;
194
+ font-weight: 600;
195
+ width: 100%;
196
+ margin-bottom: 1rem;
197
+ transition: all 0.3s ease;
198
+ }
199
+
200
+ .add-folder-btn:hover {
201
+ background: var(--primary-dark);
202
+ transform: translateY(-1px);
203
+ color: white;
204
+ }
205
+
206
+ .content-area {
207
+ background: white;
208
+ border-radius: 20px;
209
+ padding: 2rem;
210
+ box-shadow: var(--card-shadow);
211
+ border: 1px solid var(--border-color);
212
+ min-height: 70vh;
213
+ }
214
+
215
+ .image-grid {
216
+ display: grid;
217
+ grid-template-columns: repeat(auto-fill, minmax(280px, 1fr));
218
+ gap: 1.5rem;
219
+ padding: 1rem 0;
220
+ }
221
+
222
+ .image-card {
223
+ position: relative;
224
+ background: white;
225
+ border-radius: 16px;
226
+ overflow: hidden;
227
+ box-shadow: var(--card-shadow);
228
+ border: 1px solid var(--border-color);
229
+ transition: all 0.3s ease;
230
+ }
231
+
232
+ .image-card:hover {
233
+ transform: translateY(-4px);
234
+ box-shadow: var(--card-shadow-hover);
235
+ }
236
+
237
+ .image-wrapper {
238
+ aspect-ratio: 1;
239
+ overflow: hidden;
240
+ position: relative;
241
+ }
242
+
243
+ .image-card img {
244
+ width: 100%;
245
+ height: 100%;
246
+ object-fit: cover;
247
+ transition: transform 0.3s ease;
248
+ }
249
+
250
+ .image-card:hover img {
251
+ transform: scale(1.05);
252
+ }
253
+
254
+ .similarity-score {
255
+ position: absolute;
256
+ top: 12px;
257
+ right: 12px;
258
+ background: var(--primary-color);
259
+ color: white;
260
+ padding: 6px 12px;
261
+ border-radius: 20px;
262
+ font-size: 0.875rem;
263
+ font-weight: 600;
264
+ backdrop-filter: blur(10px);
265
+ }
266
+
267
+ .image-info {
268
+ padding: 1rem;
269
+ background: white;
270
+ }
271
+
272
+ .filename {
273
+ display: block;
274
+ font-weight: 600;
275
+ color: var(--text-primary);
276
+ margin-bottom: 0.5rem;
277
+ overflow: hidden;
278
+ text-overflow: ellipsis;
279
+ white-space: nowrap;
280
+ }
281
+
282
+ .file-size {
283
+ font-size: 0.875rem;
284
+ color: var(--text-secondary);
285
+ font-weight: 500;
286
+ }
287
+
288
+ .status-card {
289
+ background: white;
290
+ border-radius: 16px;
291
+ padding: 1.5rem;
292
+ box-shadow: var(--card-shadow);
293
+ border: 1px solid var(--border-color);
294
+ margin-bottom: 1.5rem;
295
+ }
296
+
297
+ .progress {
298
+ height: 8px;
299
+ border-radius: 20px;
300
+ background: #f1f5f9;
301
+ overflow: hidden;
302
+ }
303
+
304
+ .progress-bar {
305
+ background: var(--gradient-bg);
306
+ border-radius: 20px;
307
+ transition: width 0.3s ease;
308
+ }
309
+
310
+ .no-results, .error, .loading {
311
+ text-align: center;
312
+ padding: 3rem;
313
+ color: var(--text-secondary);
314
+ font-weight: 500;
315
+ }
316
+
317
+ .error {
318
+ color: var(--danger-color);
319
+ }
320
+
321
+ .loading {
322
+ color: var(--primary-color);
323
+ }
324
+
325
+ .folder-browser {
326
+ max-height: 400px;
327
+ overflow-y: auto;
328
+ border-radius: 12px;
329
+ border: 1px solid var(--border-color);
330
+ }
331
+
332
+ .folder-item {
333
+ cursor: pointer;
334
+ padding: 12px 16px;
335
+ border-radius: 8px;
336
+ margin: 4px;
337
+ transition: all 0.2s ease;
338
+ display: flex;
339
+ align-items: center;
340
+ gap: 12px;
341
+ }
342
+
343
+ .folder-item:hover {
344
+ background: var(--secondary-color);
345
+ color: var(--primary-color);
346
+ }
347
+
348
+ .modal-content {
349
+ border-radius: 20px;
350
+ border: none;
351
+ box-shadow: 0 20px 25px -5px rgba(0, 0, 0, 0.1), 0 10px 10px -5px rgba(0, 0, 0, 0.04);
352
+ }
353
+
354
+ .modal-header {
355
+ border-bottom: 1px solid var(--border-color);
356
+ padding: 1.5rem;
357
+ }
358
+
359
+ .breadcrumb {
360
+ background: var(--secondary-color);
361
+ border-radius: 12px;
362
+ padding: 12px 16px;
363
+ }
364
+
365
+ .btn-primary {
366
+ background: var(--primary-color);
367
+ border: none;
368
+ border-radius: 12px;
369
+ padding: 10px 20px;
370
+ font-weight: 600;
371
+ }
372
+
373
+ .btn-primary:hover {
374
+ background: var(--primary-dark);
375
+ }
376
+
377
+ .btn-secondary {
378
+ background: var(--text-secondary);
379
+ border: none;
380
+ border-radius: 12px;
381
+ padding: 10px 20px;
382
+ font-weight: 600;
383
+ }
384
+
385
+ .btn-outline-danger {
386
+ border-color: var(--danger-color);
387
+ color: var(--danger-color);
388
+ border-radius: 8px;
389
+ font-weight: 600;
390
+ }
391
+
392
+ .btn-outline-danger:hover {
393
+ background: var(--danger-color);
394
+ border-color: var(--danger-color);
395
+ }
396
+
397
+ @media (max-width: 768px) {
398
+ .search-container {
399
+ padding: 0 1rem;
400
+ }
401
+
402
+ .main-container {
403
+ padding-top: 1rem;
404
+ }
405
+
406
+ .image-grid {
407
+ grid-template-columns: repeat(auto-fill, minmax(200px, 1fr));
408
+ gap: 1rem;
409
+ }
410
+
411
+ .sidebar {
412
+ margin-bottom: 1.5rem;
413
+ }
414
+ }
415
+ </style>
416
+ </head>
417
+ <body>
418
+ <nav class="navbar navbar-expand-lg" aria-label="Main navigation">
419
+ <div class="container-fluid px-4">
420
+ <a class="navbar-brand" href="#">
421
+ <div class="brand-icon">
422
+ <i class="bi bi-search"></i>
423
+ </div>
424
+ Visual Product Search
425
+ </a>
426
+
427
+ <div class="search-container">
428
+ <form class="search-form d-flex" onsubmit="searchImages(event)">
429
+ <input class="form-control search-input" type="search" id="searchInput" placeholder="Search products and images...">
430
+ <button class="btn btn-primary search-btn" type="submit">
431
+ <i class="bi bi-search me-1"></i>Search
432
+ </button>
433
+ <label class="btn action-btn ms-2" for="imageUpload" title="Search by image">
434
+ <i class="bi bi-image"></i>
435
+ </label>
436
+ <input type="file" id="imageUpload" style="display: none" accept="image/*" onchange="searchByImage(event)">
437
+ <button class="btn action-btn ms-2" type="button" onclick="toggleUrlSearch()" title="Search by URL">
438
+ <i class="bi bi-link-45deg"></i>
439
+ </button>
440
+ </form>
441
+
442
+ <!-- URL Search Form (initially hidden) -->
443
+ <form class="url-search-form mt-3" id="urlSearchForm" style="display: none;" onsubmit="searchByUrl(event)">
444
+ <input class="form-control search-input" type="url" id="urlInput" placeholder="Enter image URL..." required>
445
+ <button class="btn btn-primary search-btn" type="submit">
446
+ <i class="bi bi-link me-1"></i>Search URL
447
+ </button>
448
+ <button class="btn action-btn ms-2" type="button" onclick="toggleUrlSearch()">
449
+ <i class="bi bi-x"></i>
450
+ </button>
451
+ </form>
452
+ </div>
453
+ </div>
454
+ </nav>
455
+
456
+ <!-- Indexing Progress -->
457
+ <div class="container main-container" id="indexingStatus" style="display: none;">
458
+ <div class="status-card">
459
+ <h6 class="mb-3">
460
+ <i class="bi bi-gear-fill me-2 text-primary"></i>
461
+ Indexing Progress
462
+ </h6>
463
+ <div class="progress mb-3">
464
+ <div class="progress-bar progress-bar-striped progress-bar-animated" style="width: 0%"></div>
465
+ </div>
466
+ <p class="mb-0 text-muted" id="indexingDetails"></p>
467
+ </div>
468
+ </div>
469
+
470
+ <!-- Main Content -->
471
+ <div class="container main-container">
472
+ <div class="row g-4">
473
+ <div class="col-lg-3">
474
+ <div class="sidebar">
475
+ <button class="add-folder-btn" onclick="openFolderBrowser()">
476
+ <i class="bi bi-folder-plus me-2"></i>Add Folder
477
+ </button>
478
+
479
+ <div class="sidebar-title">
480
+ <i class="bi bi-folder2-open"></i>
481
+ Indexed Folders
482
+ </div>
483
+ <div id="folderList">
484
+ <!-- Folders will be listed here -->
485
+ </div>
486
+ </div>
487
+ </div>
488
+
489
+ <div class="col-lg-9">
490
+ <div class="content-area">
491
+ <div class="image-grid" id="imageGrid">
492
+ <!-- Images will be displayed here -->
493
+ </div>
494
+ </div>
495
+ </div>
496
+ </div>
497
+ </div>
498
+
499
+ <!-- Folder Browser Modal -->
500
+ <div class="modal fade" id="folderBrowserModal" tabindex="-1">
501
+ <div class="modal-dialog modal-lg">
502
+ <div class="modal-content">
503
+ <div class="modal-header">
504
+ <h5 class="modal-title">
505
+ <i class="bi bi-folder2-open me-2 text-primary"></i>
506
+ Choose Folder to Index
507
+ </h5>
508
+ <button type="button" class="btn-close" data-bs-dismiss="modal" aria-label="Close"></button>
509
+ </div>
510
+ <div class="modal-body">
511
+ <nav aria-label="breadcrumb">
512
+ <ol class="breadcrumb" id="folderBreadcrumb">
513
+ <li class="breadcrumb-item active">Root</li>
514
+ </ol>
515
+ </nav>
516
+ <div class="folder-browser" id="folderBrowser">
517
+ <!-- Folder contents will be displayed here -->
518
+ </div>
519
+ </div>
520
+ <div class="modal-footer">
521
+ <button type="button" class="btn btn-secondary" data-bs-dismiss="modal">Cancel</button>
522
+ <button type="button" class="btn btn-primary" onclick="addSelectedFolder()">
523
+ <i class="bi bi-plus-circle me-1"></i>Add Folder
524
+ </button>
525
+ </div>
526
+ </div>
527
+ </div>
528
+ </div>
529
+ </body>
530
+ </html>
tests/test_qdrant_singleton.py ADDED
@@ -0,0 +1,120 @@
1
+ import pytest
2
+ import uuid
3
+ from pathlib import Path
4
+ import shutil
5
+ from qdrant_singleton import QdrantClientSingleton, CURRENT_SCHEMA_VERSION
6
+ from qdrant_client.http import models
7
+
+ @pytest.fixture(autouse=True)
+ def setup_teardown():
+     """Setup and teardown for each test"""
+     # Store original state
+     original_path = QdrantClientSingleton._storage_path
+     original_instance = QdrantClientSingleton._instance
+
+     # Create temporary storage
+     temp_path = Path("test_qdrant_data")
+     QdrantClientSingleton._storage_path = temp_path
+     QdrantClientSingleton._instance = None
+
+     yield
+
+     # Cleanup
+     if QdrantClientSingleton._instance:
+         QdrantClientSingleton._instance.close()
+
+     # Restore original state
+     QdrantClientSingleton._instance = original_instance
+     QdrantClientSingleton._storage_path = original_path
+
+     # Remove test directory if it exists
+     if temp_path.exists():
+         shutil.rmtree(temp_path)
+
+ def test_singleton_pattern():
+     """Test that get_instance returns the same instance"""
+     instance1 = QdrantClientSingleton.get_instance()
+     instance2 = QdrantClientSingleton.get_instance()
+     assert instance1 is instance2
+
+ def test_storage_path_creation():
+     """Test that storage path is created if it doesn't exist"""
+     assert not QdrantClientSingleton._storage_path.exists()
+     QdrantClientSingleton.get_instance()
+     assert QdrantClientSingleton._storage_path.exists()
+
+ def test_collection_creation():
+     """Test collection creation"""
+     client = QdrantClientSingleton.get_instance()
+     collection_name = "test_collection"
+
+     # Create collection
+     QdrantClientSingleton.initialize_collection(collection_name)
+
+     # Check collection exists
+     collections = client.get_collections().collections
+     collection_names = [collection.name for collection in collections]
+     assert collection_name in collection_names
+
+ def test_schema_version_check():
+     """Test schema version checking and updating"""
+     client = QdrantClientSingleton.get_instance()
+     collection_name = "test_schema_collection"
+
+     # Create collection
+     QdrantClientSingleton.initialize_collection(collection_name)
+
+     # Add a point with current schema version
+     point_id = str(uuid.uuid4())
+     client.upsert(
+         collection_name=collection_name,
+         points=[
+             models.PointStruct(
+                 id=point_id,
+                 vector=[0.0] * 512,  # VECTOR_SIZE
+                 payload={
+                     "path": "test.jpg",
+                     "absolute_path": "/test/test.jpg",
+                     "schema_version": CURRENT_SCHEMA_VERSION,
+                     "indexed_at": 123456789
+                 }
+             )
+         ]
+     )
+
+     # Verify point was added
+     search_result = client.scroll(
+         collection_name=collection_name,
+         limit=1
+     )
+     assert len(search_result[0]) == 1
+     assert search_result[0][0].id == point_id
+     assert search_result[0][0].payload["schema_version"] == CURRENT_SCHEMA_VERSION
+
+ def test_payload_indexes():
+     """Test that the collection is created with the expected vector configuration"""
+     client = QdrantClientSingleton.get_instance()
+     collection_name = "test_indexes"
+
+     # Create collection
+     QdrantClientSingleton.initialize_collection(collection_name)
+
+     # Get collection info
+     collection_info = client.get_collection(collection_name)
+
+     # Check vector size and distance metric (payload indexes are not asserted here)
+     assert collection_info.config.params.vectors.size == 512
+     assert collection_info.config.params.vectors.distance == models.Distance.COSINE
+
+ def test_empty_collection_schema_check():
+     """Test schema check behavior with empty collection"""
+     client = QdrantClientSingleton.get_instance()
+     collection_name = "test_empty_collection"
+
+     # Create collection
+     QdrantClientSingleton.initialize_collection(collection_name)
+
+     # Verify collection exists
+     collections = client.get_collections().collections
+     collection_names = [collection.name for collection in collections]
+     assert collection_name in collection_names