---
title: Vision Llm Agent
emoji: 🌖
colorFrom: blue
colorTo: blue
sdk: docker
pinned: false
license: gpl-3.0
---

# Vision LLM Agent - Object Detection with AI Assistant

A multi-model object detection and image classification demo with an LLM-based AI assistant that answers questions about detected objects. The project uses YOLOv8, DETR, and ViT for vision tasks, and TinyLlama for natural language processing. A login system protects access to the AI features.

## Project Architecture

This project follows a phased development approach:

### Phase 0: PoC with Gradio (Original)
- Simple Gradio interface with multiple object detection models
- Uses Hugging Face's free tier for model hosting
- Easy to deploy to Hugging Face Spaces

### Phase 1: Service Separation (Implemented)
- Backend: Flask API, separated from the UI
- REST endpoints for model inference
- JSON responses with detection results and performance metrics
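
As a rough illustration of a JSON response that carries both detection results and performance metrics, here is a minimal sketch. The field names (`model`, `detections`, `performance`, `inference_time_ms`) and the `Detection` shape are assumptions made for this example; the actual backend may structure its payload differently.

```python
import json
import time
from dataclasses import dataclass, asdict


@dataclass
class Detection:
    """One detected object (hypothetical shape, not taken from the backend)."""
    label: str
    confidence: float
    bbox: list  # [x1, y1, x2, y2] in pixels


def build_response(detections, started_at, model_name):
    """Bundle detections and a timing metric into a JSON-serializable dict."""
    elapsed_ms = (time.perf_counter() - started_at) * 1000.0
    return {
        "model": model_name,
        "detections": [asdict(d) for d in detections],
        "performance": {"inference_time_ms": round(elapsed_ms, 2)},
    }


start = time.perf_counter()
# ... model inference would run here ...
payload = build_response([Detection("cat", 0.91, [12, 30, 200, 180])], start, "yolo")
print(json.dumps(payload, indent=2))
```

In a Flask view, a dict like this could be returned directly, since Flask serializes dict return values to JSON.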

### Phase 2: UI Upgrade (Implemented)
- Modern React frontend with Material-UI components
- Improved user experience with responsive design
- Separate frontend and backend architecture

### Phase 3: CI/CD & Testing (Planned)
- GitHub Actions for automated testing and deployment
- Comprehensive test suite with pytest and ESLint
- Automatic rebuilds on Hugging Face Spaces

## How to Run

### Option 1: Original Gradio App
1. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```

2. Run the Gradio app:
   ```bash
   python app.py
   ```

3. Open your browser and go to the URL shown in the terminal (typically `http://127.0.0.1:7860`)

### Option 2: React Frontend with Flask Backend
1. Install backend dependencies:
   ```bash
   pip install -r requirements.txt
   ```

2. Start the Flask backend server:
   ```bash
   python api.py
   ```

3. In a separate terminal, navigate to the frontend directory:
   ```bash
   cd frontend
   ```

4. Install frontend dependencies:
   ```bash
   npm install
   ```

5. Start the React development server:
   ```bash
   npm start
   ```

6. Open your browser and go to `http://localhost:3000`

## Models Used

- **YOLOv8**: Fast and accurate object detection
- **DETR**: DEtection TRansformer for object detection
- **ViT**: Vision Transformer for image classification
- **TinyLlama**: For natural language processing and question answering about detected objects

## Authentication

The application includes a secure login system to protect access to all features:

- **Default Credentials**:
  - Username: `admin` / Password: `admin123`
  - Username: `user` / Password: `user123`

- **Login Process**:
  - All routes and API endpoints are protected with Flask-Login
  - Users must authenticate before accessing any features
  - Session management handles login state persistence

- **Security Features**:
  - Password protection for all API endpoints and UI pages
  - Session-based authentication with secure cookies
  - Configurable secret key via environment variables
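
The configurable secret key could be loaded along these lines. This is a sketch under assumptions: the variable name `SECRET_KEY` and the random fallback are illustrative choices, not confirmed details of this project.

```python
import os
import secrets


def load_secret_key(env_var="SECRET_KEY"):
    """Return the key from the environment, or a random key for local runs.

    The variable name is an assumption; check the backend for the real one.
    """
    key = os.environ.get(env_var)
    if key:
        return key
    # Fallback for development only: a fresh random key on each start
    # invalidates existing session cookies whenever the server restarts.
    return secrets.token_hex(32)


secret_key = load_secret_key()
# With Flask, this value would typically be assigned to app.secret_key.
```

Setting the variable explicitly in production keeps sessions valid across restarts and across multiple worker processes.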

## API Endpoints

The Flask backend provides the following API endpoints (all require authentication):

- `GET /api/status` - Check the status of the API and available models
- `POST /api/detect/yolo` - Detect objects using YOLOv8
- `POST /api/detect/detr` - Detect objects using DETR
- `POST /api/classify/vit` - Classify images using ViT
- `POST /api/analyze` - Analyze images with LLM assistant
- `POST /api/similar-images` - Find similar images in the vector database
- `POST /api/add-to-collection` - Add images to the vector database
- `POST /api/add-detected-objects` - Add detected objects to the vector database
- `POST /api/search-similar-objects` - Search for similar objects in the vector database

All POST endpoints accept multipart form data with an `image` field containing the image file.
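
A client upload to one of these endpoints might look like the sketch below, which uses only the Python standard library. The base URL (`http://localhost:5000`, Flask's default port), the file name, and the way the session cookie is obtained are assumptions; the login route is not documented here, so the cookie is passed in as a parameter.

```python
import urllib.request
import uuid


def build_multipart(field_name, filename, data, content_type="image/jpeg"):
    """Encode one file as a multipart/form-data body; return (body, header value)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def make_detect_request(base_url, image_bytes, session_cookie=None):
    """Build (but do not send) a POST request to /api/detect/yolo."""
    body, content_type = build_multipart("image", "photo.jpg", image_bytes)
    request = urllib.request.Request(
        f"{base_url}/api/detect/yolo",
        data=body,
        method="POST",
        headers={"Content-Type": content_type},
    )
    if session_cookie:
        # Cookie value from a prior login; how it is obtained is not shown here.
        request.add_header("Cookie", session_cookie)
    return request


request = make_detect_request("http://localhost:5000", b"\xff\xd8\xff\xe0")
print(request.get_method(), request.full_url)
```

To send the request, pass it to `urllib.request.urlopen(request)` while the backend is running and parse the response body as JSON.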

## Deployment

### Gradio App
The Gradio app is designed to be easily deployed to Hugging Face Spaces:

1. Create a new Space on Hugging Face
2. Select Gradio as the SDK
3. Push this repository to the Space's git repository
4. The app will automatically deploy

### React + Flask App
For the React + Flask version, you'll need to:

1. Build the React frontend:
   ```bash
   cd frontend
   npm run build
   ```

2. Serve the static files from a web server or cloud hosting service
3. Deploy the Flask backend to a server that supports Python