---
title: Vision Llm Agent
emoji: π
colorFrom: blue
colorTo: blue
sdk: docker
pinned: false
license: gpl-3.0
---
# Vision LLM Agent - Object Detection with AI Assistant
A multi-model object detection and image classification demo with LLM-based AI assistant for answering questions about detected objects. This project uses YOLOv8, DETR, and ViT models for vision tasks, and TinyLlama for natural language processing. The application includes a secure login system to protect access to the AI features.
## Project Architecture
This project follows a phased development approach:
### Phase 0: PoC with Gradio (Original)
- Simple Gradio interface with multiple object detection models
- Uses Hugging Face's free tier for model hosting
- Easy to deploy to Hugging Face Spaces
### Phase 1: Service Separation (Implemented)
- Backend: Flask API with model inference endpoints
- REST API endpoints for model inference
- JSON responses with detection results and performance metrics
### Phase 2: UI Upgrade (Implemented)
- Modern React frontend with Material-UI components
- Improved user experience with responsive design
- Separate frontend and backend architecture
### Phase 3: CI/CD & Testing (Planned)
- GitHub Actions for automated testing and deployment
- Comprehensive test suite with pytest and ESLint
- Automatic rebuilds on Hugging Face Spaces
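A workflow along these lines could drive the planned CI: one job for the Python backend tests, one for linting the React frontend. This is an illustrative sketch only; the file name, job names, and tool versions are assumptions, not part of the current repository.

```yaml
# .github/workflows/ci.yml -- illustrative sketch for Phase 3
name: CI
on: [push, pull_request]

jobs:
  backend-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt pytest
      - run: pytest

  frontend-lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
        working-directory: frontend
      - run: npx eslint src
        working-directory: frontend
```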
## How to Run
### Option 1: Original Gradio App
1. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

2. Run the Gradio app:

   ```bash
   python app.py
   ```

3. Open your browser and go to the URL shown in the terminal (typically http://127.0.0.1:7860).
### Option 2: React Frontend with Flask Backend
1. Install backend dependencies:

   ```bash
   pip install -r requirements.txt
   ```

2. Start the Flask backend server:

   ```bash
   python api.py
   ```

3. In a separate terminal, navigate to the frontend directory and install frontend dependencies:

   ```bash
   cd frontend
   npm install
   ```

4. Start the React development server:

   ```bash
   npm start
   ```

5. Open your browser and go to http://localhost:3000.
## Models Used
- YOLOv8: Fast and accurate object detection
- DETR: DEtection TRansformer for object detection
- ViT: Vision Transformer for image classification
- TinyLlama: For natural language processing and question answering about detected objects
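One straightforward way for a backend to route a request to the right model is a dispatch table keyed by model name. The sketch below uses standard-library Python with stub functions standing in for the real YOLOv8/DETR/ViT inference calls; all names and the response shape are illustrative, not the project's actual API.

```python
# Illustrative dispatch-table sketch: maps a model key (as it might appear
# in an endpoint path) to an inference callable. The stubs below stand in
# for real YOLOv8 / DETR / ViT calls.
def detect_yolo(image_bytes: bytes) -> dict:
    """Stub for YOLOv8 object detection."""
    return {"model": "yolov8", "task": "detection", "detections": []}

def detect_detr(image_bytes: bytes) -> dict:
    """Stub for DETR object detection."""
    return {"model": "detr", "task": "detection", "detections": []}

def classify_vit(image_bytes: bytes) -> dict:
    """Stub for ViT image classification."""
    return {"model": "vit", "task": "classification", "labels": []}

MODEL_DISPATCH = {
    "yolo": detect_yolo,
    "detr": detect_detr,
    "vit": classify_vit,
}

def run_inference(model_key: str, image_bytes: bytes) -> dict:
    """Look up the handler for a model key and run it on the image."""
    try:
        handler = MODEL_DISPATCH[model_key]
    except KeyError:
        raise ValueError(f"unknown model: {model_key!r}")
    return handler(image_bytes)
```

A dispatch table like this keeps the per-model endpoints thin: each route only has to pick a key and return the handler's JSON.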
## Authentication
The application includes a secure login system to protect access to all features:
Default Credentials:
- Username: admin / Password: admin123
- Username: user / Password: user123
Login Process:
- All routes and API endpoints are protected with Flask-Login
- Users must authenticate before accessing any features
- Session management handles login state persistence
Security Features:
- Password protection for all API endpoints and UI pages
- Session-based authentication with secure cookies
- Configurable secret key via environment variables
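Flask signs its session cookies with the app's secret key (via the itsdangerous library under the hood). The standard-library sketch below shows the underlying idea of a signed, tamper-evident session token; the key name, payload, and token format are simplified illustrations, not Flask's actual wire format.

```python
import base64
import hashlib
import hmac
import json
import os

# Secret key, configurable via environment variable as in the app.
SECRET_KEY = os.environ.get("SECRET_KEY", "dev-only-secret")

def sign_session(payload: dict, key: str = SECRET_KEY) -> str:
    """Serialize a session payload and append an HMAC-SHA256 signature."""
    body = base64.urlsafe_b64encode(json.dumps(payload).encode())
    sig = hmac.new(key.encode(), body, hashlib.sha256).hexdigest()
    return f"{body.decode()}.{sig}"

def verify_session(token: str, key: str = SECRET_KEY):
    """Return the payload if the signature is valid, otherwise None."""
    try:
        body, sig = token.rsplit(".", 1)
    except ValueError:
        return None  # malformed token
    expected = hmac.new(key.encode(), body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None  # tampered or wrong key
    return json.loads(base64.urlsafe_b64decode(body))
```

Because the signature depends on the secret key, a client can read its own cookie but cannot forge or alter one without the server rejecting it.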
## API Endpoints
The Flask backend provides the following API endpoints (all require authentication):
- `GET /api/status` - Check the status of the API and available models
- `POST /api/detect/yolo` - Detect objects using YOLOv8
- `POST /api/detect/detr` - Detect objects using DETR
- `POST /api/classify/vit` - Classify images using ViT
- `POST /api/analyze` - Analyze images with the LLM assistant
- `POST /api/similar-images` - Find similar images in the vector database
- `POST /api/add-to-collection` - Add images to the vector database
- `POST /api/add-detected-objects` - Add detected objects to the vector database
- `POST /api/search-similar-objects` - Search for similar objects in the vector database
All POST endpoints accept form data with an 'image' field containing the image file.
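For reference, this is roughly what a `multipart/form-data` body with an `image` file field looks like on the wire. The standard-library sketch below builds one by hand; in practice a client library such as `requests` does this for you, and the endpoint URL in the comment is a local-development assumption.

```python
import uuid

def build_multipart(field: str, filename: str, data: bytes):
    """Build a multipart/form-data body containing one file field.

    Returns (body, content_type). The field name ("image" for this API)
    and filename go into the Content-Disposition header of the part.
    """
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n"
    ).encode() + data + f"\r\n--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"

# Usage sketch (not executed here): pass body and content_type as the data
# and Content-Type header of a urllib.request.Request to an endpoint such
# as http://localhost:3000/api/detect/yolo (URL is an assumption).
```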
## Deployment
### Gradio App
The Gradio app is designed to be easily deployed to Hugging Face Spaces:
- Create a new Space on Hugging Face
- Select Gradio as the SDK
- Push this repository to the Space's git repository
- The app will automatically deploy
### React + Flask App
For the React + Flask version, you'll need to:
1. Build the React frontend:

   ```bash
   cd frontend
   npm run build
   ```

2. Serve the static files from a web server or cloud hosting service.
3. Deploy the Flask backend to a server that supports Python.
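Since the Space metadata declares `sdk: docker`, both steps can also be packaged into a single image with a multi-stage build. The Dockerfile below is an illustrative sketch only: the paths, Node/Python versions, and the assumption that `api.py` serves the built frontend on port 7860 (the port Hugging Face Spaces expects by default) may need adjusting for this repository.

```dockerfile
# Illustrative multi-stage Dockerfile sketch (versions and paths are assumptions)

# Stage 1: build the React frontend
FROM node:20 AS frontend
WORKDIR /app/frontend
COPY frontend/ .
RUN npm ci && npm run build

# Stage 2: Python runtime with the Flask backend
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
COPY --from=frontend /app/frontend/build ./frontend/build

EXPOSE 7860
CMD ["python", "api.py"]
```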