---
title: Vision Web App
emoji: 📷
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
---

# Vision Web App - Object Detection Demo

A multi-model demo for object detection and image classification using YOLOv8, DETR, and ViT, with TinyLlama for question answering about detected objects. The project showcases several computer vision models as a hiring demonstration.

## Project Architecture

This project follows a phased development approach:

### Phase 0: PoC with Gradio (Original)
- Simple Gradio interface with multiple object detection models
- Uses Hugging Face's free tier for model hosting
- Easy to deploy to Hugging Face Spaces

### Phase 1: Service Separation (Implemented)
- Backend: Flask REST API with model inference endpoints
- JSON responses with detection results and performance metrics
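
As a rough illustration of the last point, the sketch below times an inference call and wraps the result in a JSON-serializable payload. It is not the project's actual code: `run_with_metrics`, `fake_detector`, and every field name (`detections`, `performance`, `inference_time_s`) are hypothetical and chosen only for this example.

```python
import json
import time
from typing import Any, Callable, Dict, List


def run_with_metrics(
    infer: Callable[[bytes], List[Dict[str, Any]]],
    image_bytes: bytes,
) -> Dict[str, Any]:
    """Run an inference callable and attach wall-clock timing to its result."""
    start = time.perf_counter()
    detections = infer(image_bytes)
    elapsed = time.perf_counter() - start
    # Hypothetical field names; the real API schema may differ.
    return {
        "detections": detections,
        "performance": {"inference_time_s": round(elapsed, 4)},
    }


def fake_detector(image_bytes: bytes) -> List[Dict[str, Any]]:
    """Stand-in for a real model call; returns one made-up detection."""
    return [{"label": "example", "confidence": 0.5, "bbox": [0, 0, 10, 10]}]


if __name__ == "__main__":
    payload = run_with_metrics(fake_detector, b"not-a-real-image")
    print(json.dumps(payload, indent=2))
```

Keeping the timing in a wrapper like this would let each model endpoint report metrics the same way without repeating the measurement code.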

### Phase 2: UI Upgrade (Implemented)
- Modern React frontend with Material-UI components
- Improved user experience with responsive design
- Separate frontend and backend architecture

### Phase 3: CI/CD & Testing (Planned)
- GitHub Actions for automated testing and deployment
- Backend tests with pytest and frontend linting with ESLint
- Automatic rebuilds on Hugging Face Spaces

## How to Run

### Option 1: Original Gradio App
1. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```

2. Run the Gradio app:
   ```bash
   python app.py
   ```

3. Open your browser and go to the URL shown in the terminal (typically `http://127.0.0.1:7860`)

### Option 2: React Frontend with Flask Backend
1. Install backend dependencies:
   ```bash
   pip install -r requirements.txt
   ```

2. Start the Flask backend server:
   ```bash
   python api.py
   ```

3. In a separate terminal, navigate to the frontend directory:
   ```bash
   cd frontend
   ```

4. Install frontend dependencies:
   ```bash
   npm install
   ```

5. Start the React development server:
   ```bash
   npm start
   ```

6. Open your browser and go to `http://localhost:3000`

## Models Used

- **YOLOv8**: Fast and accurate object detection
- **DETR**: DEtection TRansformer for object detection
- **ViT**: Vision Transformer for image classification
- **TinyLlama**: For natural language processing and question answering about detected objects

## API Endpoints

The Flask backend provides the following API endpoints:

- `GET /api/status` - Check the status of the API and available models
- `POST /api/detect/yolo` - Detect objects using YOLOv8
- `POST /api/detect/detr` - Detect objects using DETR
- `POST /api/classify/vit` - Classify images using ViT

All POST endpoints accept `multipart/form-data` requests with an `image` field containing the image file.
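
A minimal client sketch for these endpoints, using only the Python standard library, might look like the following. It assumes the backend listens on Flask's default development port (`http://127.0.0.1:5000`), which this README does not confirm; `build_multipart` and `post_image` are hypothetical helper names, and the structure of the returned JSON is not specified here.

```python
import json
import os
import urllib.request
import uuid
from typing import Tuple

# Assumption: Flask's default development address; adjust to match api.py.
BASE_URL = "http://127.0.0.1:5000"


def build_multipart(
    field: str,
    filename: str,
    data: bytes,
    content_type: str = "application/octet-stream",
) -> Tuple[bytes, str]:
    """Encode one file as a multipart/form-data body.

    Returns the request body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def post_image(endpoint: str, image_path: str, base_url: str = BASE_URL) -> dict:
    """POST an image file to an endpoint and return the parsed JSON response."""
    with open(image_path, "rb") as fh:
        body, header = build_multipart(
            "image", os.path.basename(image_path), fh.read(), "image/jpeg"
        )
    request = urllib.request.Request(
        base_url + endpoint,
        data=body,
        headers={"Content-Type": header},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

With the backend running, `post_image("/api/detect/yolo", "photo.jpg")` would send the file in the `image` field; swapping the endpoint for `/api/detect/detr` or `/api/classify/vit` targets the other models.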

## Deployment

### Gradio App
The Gradio app is designed to be easily deployed to Hugging Face Spaces:

1. Create a new Space on Hugging Face
2. Select Gradio as the SDK
3. Push this repository to the Space's git repository
4. The app will automatically deploy

### React + Flask App
For the React + Flask version, you'll need to:

1. Build the React frontend:
   ```bash
   cd frontend
   npm run build
   ```

2. Serve the static files from a web server or cloud hosting service
3. Deploy the Flask backend to a server that supports Python
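
Before choosing a hosting service, the built frontend can be previewed locally with Python's built-in HTTP server. This is a sketch for local checks only, not a production setup. It assumes the build output lands in `frontend/build` (the usual Create React App location, not confirmed by this README), and `make_static_server` is a hypothetical helper name.

```python
import functools
import http.server


def make_static_server(
    directory: str, port: int = 8000
) -> http.server.ThreadingHTTPServer:
    """Create an HTTP server that serves files from the given directory."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # Bind to localhost only; this server is meant for local previews.
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


# Usage (blocks until interrupted with Ctrl+C):
#   make_static_server("frontend/build").serve_forever()
```

The preview serves only static files, so the Flask backend still needs to be running separately for the API calls to succeed.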