---
title: redact-video-demo
app_file: app.py
sdk: gradio
sdk_version: 5.13.2
---

# Video Object Detection with Moondream

This tool uses Moondream2, a powerful yet lightweight vision-language model, to detect and visualize objects in videos. Moondream can recognize a wide variety of objects, people, text, and more with high accuracy while being much smaller than traditional models.

## About Moondream

Moondream is a tiny yet powerful vision-language model that can analyze images and answer questions about them. It's designed to be lightweight and efficient while maintaining high accuracy. Some key features:

- Only 2B parameters
- Fast inference with minimal resource requirements
- Supports CPU and GPU execution
- Open source and free to use
- Can detect almost anything you can describe in natural language (see the example below)
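
The demo drives Moondream2 through the Hugging Face Transformers `trust_remote_code` interface. The outline below shows roughly what a single detection call looks like; the `detect` method and its return format follow the published moondream2 model card and may vary between revisions, so treat this as a sketch rather than the exact code in `app.py`/`main.py`.

```python
# Rough sketch of one Moondream2 detection call; method names and the
# return format depend on the moondream2 revision you pin.
from transformers import AutoModelForCausalLM
from PIL import Image

model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2",
    trust_remote_code=True,  # Moondream ships its own modeling code
)

frame = Image.open("frame_0001.jpg")     # hypothetical extracted frame
result = model.detect(frame, "face")     # natural-language object type
for obj in result["objects"]:
    # Coordinates are normalized to [0, 1] relative to the frame size
    print(obj["x_min"], obj["y_min"], obj["x_max"], obj["y_max"])
```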


## Features

- Real-time object detection in videos using Moondream2
- Multiple visualization styles:
  - Censor: Black boxes over detected objects
  - YOLO: Traditional bounding boxes with labels
  - Hitmarker: Call of Duty style crosshair markers
- Optional grid-based detection for improved accuracy
- Flexible object type detection using natural language
- Frame-by-frame processing with IoU-based merging (see the sketch after this list)
- Batch processing of multiple videos
- Web-compatible output format
- User-friendly web interface
- Command-line interface for automation
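
The IoU-based merging mentioned above deduplicates overlapping boxes reported for the same object (for example by adjacent grid tiles). A minimal sketch of that step, assuming boxes are `(x1, y1, x2, y2)` tuples; the threshold and merge rule are illustrative, not the exact logic in `main.py`:

```python
# Minimal sketch of IoU-based box merging; threshold and details are
# illustrative, not the exact implementation in main.py.
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def merge_boxes(boxes, threshold=0.5):
    """Greedily merge boxes whose IoU exceeds the threshold."""
    merged = []
    for box in boxes:
        for i, kept in enumerate(merged):
            if iou(box, kept) > threshold:
                # Replace the kept box with the union of the two
                merged[i] = (min(box[0], kept[0]), min(box[1], kept[1]),
                             max(box[2], kept[2]), max(box[3], kept[3]))
                break
        else:
            merged.append(box)
    return merged
```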

## Requirements

- Python 3.8+
- OpenCV (cv2)
- PyTorch
- Transformers
- Pillow (PIL)
- tqdm
- ffmpeg
- numpy
- gradio (for web interface)

## Installation

1. Clone this repository and create a new virtual environment:

   ```bash
   git clone https://github.com/parsakhaz/object-detect-video.git
   python -m venv .venv
   source .venv/bin/activate
   ```

2. Install the required packages:

   ```bash
   pip install -r requirements.txt
   ```

3. Install ffmpeg:
   - On Ubuntu/Debian: `sudo apt-get install ffmpeg libvips`
   - On macOS: `brew install ffmpeg`
   - On Windows: Download from [ffmpeg.org](https://ffmpeg.org)

## Usage

### Web Interface

1. Start the web interface:

   ```bash
   python app.py
   ```

2. Open the provided URL in your browser.

3. Use the interface to:
   - Upload your video
   - Specify what to censor (e.g., face, logo, text)
   - Adjust processing speed and quality
   - Configure grid size for detection
   - Process and download the censored video

### Command Line Interface

1. Create an `inputs` directory in the same folder as the script:

   ```bash
   mkdir inputs
   ```

2. Place your video files in the `inputs` directory. Supported formats:
   - `.mp4`
   - `.avi`
   - `.mov`
   - `.mkv`
   - `.webm`

3. Run the script:

   ```bash
   python main.py
   ```

Optional Arguments:

- `--test`: Process only the first 3 seconds of each video (useful for testing detection settings)

  ```bash
  python main.py --test
  ```

- `--preset`: Choose the FFmpeg encoding preset (affects output quality vs. speed)

  ```bash
  python main.py --preset ultrafast  # Fastest, lower quality
  python main.py --preset veryslow   # Slowest, highest quality
  ```

- `--detect`: Specify what object type to detect (using natural language)

  ```bash
  python main.py --detect person     # Detect people
  python main.py --detect "red car"  # Detect red cars
  python main.py --detect "person wearing a hat"  # Detect people with hats
  ```

- `--box-style`: Choose the visualization style

  ```bash
  python main.py --box-style censor     # Black boxes (default)
  python main.py --box-style yolo       # YOLO-style boxes with labels
  python main.py --box-style hitmarker  # COD-style hitmarkers
  ```

- `--rows` and `--cols`: Enable grid-based detection by splitting each frame into tiles (see the tiling sketch below)

  ```bash
  python main.py --rows 2 --cols 2   # Split each frame into a 2x2 grid
  python main.py --rows 3 --cols 3   # Split each frame into a 3x3 grid
  ```

You can combine arguments:
```bash
python main.py --detect "person wearing sunglasses" --box-style yolo --test --preset "fast" --rows 2 --cols 2
```
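
Grid-based detection (`--rows`/`--cols`) crops each frame into tiles, runs detection on every tile, and maps the tile-local boxes back to full-frame coordinates before the IoU merge. A simplified sketch of that remapping, using hypothetical helper names rather than the actual functions in `main.py`:

```python
# Illustrative sketch of grid-based detection; function and variable names
# are hypothetical, not the actual helpers used in main.py.
from PIL import Image

def detect_with_grid(model, frame: Image.Image, prompt: str, rows: int, cols: int):
    """Split a frame into rows x cols tiles, detect in each tile,
    and convert tile-local boxes back to full-frame pixel coordinates."""
    w, h = frame.size
    tile_w, tile_h = w / cols, h / rows
    boxes = []
    for r in range(rows):
        for c in range(cols):
            left, top = int(c * tile_w), int(r * tile_h)
            right, bottom = int((c + 1) * tile_w), int((r + 1) * tile_h)
            tile = frame.crop((left, top, right, bottom))
            # Assumes the moondream2 detect() API with normalized coordinates
            for obj in model.detect(tile, prompt)["objects"]:
                boxes.append((
                    left + obj["x_min"] * (right - left),
                    top + obj["y_min"] * (bottom - top),
                    left + obj["x_max"] * (right - left),
                    top + obj["y_max"] * (bottom - top),
                ))
    return boxes  # merge overlapping boxes afterwards (see the IoU sketch above)
```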

### Visualization Styles

The tool supports three different visualization styles for detected objects:

1. **Censor** (default)
   - Places solid black rectangles over detected objects
   - Best for privacy and content moderation
   - Completely obscures the detected region

2. **YOLO**
   - Traditional object detection style
   - Red bounding box around detected objects
   - Label showing object type above the box
   - Good for analysis and debugging

3. **Hitmarker**
   - Call of Duty inspired visualization
   - White crosshair marker at center of detected objects
   - Small label above the marker
   - Stylistic choice for gaming-inspired visualization

Choose the style that best fits your use case using the `--box-style` argument.
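
For reference, the three styles roughly reduce to the following OpenCV drawing calls; colors, line widths, and label placement here are illustrative and may differ from the actual rendering code in `main.py`:

```python
# Simplified sketch of the three box styles using OpenCV; the real rendering
# code may differ in colors, sizes, and label formatting.
import cv2

def draw_detection(frame, box, label, style="censor"):
    """box is (x1, y1, x2, y2) in pixel coordinates; frame is a BGR ndarray."""
    x1, y1, x2, y2 = map(int, box)
    if style == "censor":
        # Solid black rectangle fully covering the detection
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 0, 0), thickness=-1)
    elif style == "yolo":
        # Red outline with the object label above the box
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 0, 255), thickness=2)
        cv2.putText(frame, label, (x1, max(y1 - 8, 12)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 255), 2)
    elif style == "hitmarker":
        # White crosshair at the box center with a small label above it
        cx, cy, size = (x1 + x2) // 2, (y1 + y2) // 2, 10
        cv2.line(frame, (cx - size, cy), (cx + size, cy), (255, 255, 255), 2)
        cv2.line(frame, (cx, cy - size), (cx, cy + size), (255, 255, 255), 2)
        cv2.putText(frame, label, (cx - size, cy - size - 6),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 1)
    return frame
```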

## Output

Processed videos will be saved in the `outputs` directory with the format:
`[style]_[object_type]_[original_filename].mp4`

For example:
- `censor_face_video.mp4`
- `yolo_person_video.mp4`
- `hitmarker_car_video.mp4`

The output videos will include:
- Original video content
- Selected visualization style for detected objects
- Web-compatible H.264 encoding
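
The web-compatible H.264 step is typically an ffmpeg re-encode along these lines (shown here as a Python `subprocess` call; the file names are placeholders and the exact flags used by the tool may differ):

```python
# Illustrative re-encode of a processed video into browser-friendly H.264;
# input/output paths are placeholders and flags may differ from main.py.
import subprocess

subprocess.run([
    "ffmpeg", "-y",
    "-i", "outputs/temp_processed.mp4",  # hypothetical intermediate file
    "-c:v", "libx264",                   # H.264 video codec
    "-preset", "medium",                 # corresponds to the --preset option
    "-pix_fmt", "yuv420p",               # pixel format most browsers require
    "-movflags", "+faststart",           # move metadata up for streaming playback
    "outputs/censor_face_video.mp4",
], check=True)
```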

## Notes

- Processing time depends on video length, grid size, and GPU availability
- A GPU is strongly recommended for faster processing
- Sufficient disk space is required for temporary files
- Detection quality may vary based on object type and video quality
- Detection accuracy depends on Moondream2's ability to recognize the specified object type
- Grid-based detection should only be used when necessary due to significant performance impact
- Web interface provides real-time progress updates and error messages
- Different visualization styles may be more suitable for different use cases
- Moondream can detect almost anything you can describe in natural language