	---
	title: redact-video-demo
	app_file: app.py
	sdk: gradio
	sdk_version: 5.13.2
	---
	# Video Object Detection with Moondream

	This tool uses Moondream2, a powerful yet lightweight vision-language model, to detect and visualize objects in videos. Moondream can recognize a wide variety of objects, people, text, and more with high accuracy while being much smaller than traditional models.

	## About Moondream

	Moondream is a tiny yet powerful vision-language model that can analyze images and answer questions about them. It's designed to be lightweight and efficient while maintaining high accuracy. Some key features:

	- Only 2B parameters
	- Fast inference with minimal resource requirements
	- Supports CPU and GPU execution
	- Open source and free to use
	- Can detect almost anything you can describe in natural language

	Links:
	- [GitHub Repository](https://github.com/vikhyat/moondream)
	- [Hugging Face](https://huggingface.co/vikhyatk/moondream2)
	- [Build with Moondream](http://docs.moondream.ai/)

	## Features

	- Real-time object detection in videos using Moondream2
	- Multiple visualization styles:
	  - Censor: Black boxes over detected objects
	  - YOLO: Traditional bounding boxes with labels
	  - Hitmarker: Call of Duty style crosshair markers
	- Optional grid-based detection for improved accuracy
	- Flexible object type detection using natural language
	- Frame-by-frame processing with IoU-based merging
	- Batch processing of multiple videos
	- Web-compatible output format
	- User-friendly web interface
	- Command-line interface for automation
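
The IoU-based merging mentioned above can be illustrated with a minimal sketch. This is not the repository's actual implementation: the `(x_min, y_min, x_max, y_max)` box layout, the greedy strategy, and the `0.5` threshold are all assumptions chosen for illustration.

```python
from typing import List, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_boxes(boxes: List[Box], threshold: float = 0.5) -> List[Box]:
    """Greedily fold each box into the first kept box it overlaps enough."""
    merged: List[Box] = []
    for box in boxes:
        for i, kept in enumerate(merged):
            if iou(box, kept) >= threshold:
                # Replace the kept box with the bounding union of both boxes.
                merged[i] = (
                    min(kept[0], box[0]), min(kept[1], box[1]),
                    max(kept[2], box[2]), max(kept[3], box[3]),
                )
                break
        else:
            # No sufficiently overlapping box was found, so keep this one.
            merged.append(box)
    return merged
```

Merging matters most with grid-based detection, where the same object can be reported by more than one tile.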

	## Requirements

	- Python 3.8+
	- OpenCV (cv2)
	- PyTorch
	- Transformers
	- Pillow (PIL)
	- tqdm
	- ffmpeg
	- numpy
	- gradio (for web interface)

	## Installation

	1. Clone this repository, change into it, and create a new virtual environment:
	~~~bash
	git clone https://github.com/parsakhaz/object-detect-video.git
	cd object-detect-video
	python -m venv .venv
	source .venv/bin/activate
	~~~
	2. Install the required packages:
	~~~bash
	pip install -r requirements.txt
	~~~
	3. Install ffmpeg:
	   - On Ubuntu/Debian: `sudo apt-get install ffmpeg libvips`
	   - On macOS: `brew install ffmpeg`
	   - On Windows: Download from [ffmpeg.org](https://ffmpeg.org/download.html)

	## Usage

	### Web Interface

	1. Start the web interface:
	```bash
	python app.py
	```

	2. Open the provided URL in your browser

	3. Use the interface to:
	   - Upload your video
	   - Specify what to censor (e.g., face, logo, text)
	   - Adjust processing speed and quality
	   - Configure grid size for detection
	   - Process and download the censored video

	### Command Line Interface

	1. Create an `inputs` directory in the same folder as the script:
	~~~bash
	mkdir inputs
	~~~

	2. Place your video files in the `inputs` directory. Supported formats:
	   - .mp4
	   - .avi
	   - .mov
	   - .mkv
	   - .webm

	3. Run the script:
	~~~bash
	python main.py
	~~~
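
The extension filter from step 2 can be sketched as follows. This is an illustrative helper, not code taken from `main.py`; the function name `find_videos` and the case-insensitive matching are assumptions.

```python
from pathlib import Path
from typing import List

# Extensions listed as supported in this README.
SUPPORTED_EXTENSIONS = {".mp4", ".avi", ".mov", ".mkv", ".webm"}


def find_videos(input_dir: str = "inputs") -> List[Path]:
    """Return supported video files in input_dir, sorted by path."""
    root = Path(input_dir)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        # Assumption: extensions are matched case-insensitively.
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Calling `find_videos()` with no arguments looks in the `inputs` directory created in step 1.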

	### Optional Arguments
	- `--test`: Process only first 3 seconds of each video (useful for testing detection settings)
	~~~bash
	python main.py --test
	~~~

	- `--preset`: Choose FFmpeg encoding preset (affects output quality vs. speed)
	~~~bash
	python main.py --preset ultrafast # Fastest, lower quality
	python main.py --preset veryslow # Slowest, highest quality
	~~~

	- `--detect`: Specify what object type to detect (using natural language)
	```bash
	python main.py --detect person # Detect people
	python main.py --detect "red car" # Detect red cars
	python main.py --detect "person wearing a hat" # Detect people with hats
	```

	- `--box-style`: Choose visualization style
	```bash
	python main.py --box-style censor # Black boxes (default)
	python main.py --box-style yolo # YOLO-style boxes with labels
	python main.py --box-style hitmarker # COD-style hitmarkers
	```

	- `--rows` and `--cols`: Enable grid-based detection by splitting frames
	```bash
	python main.py --rows 2 --cols 2 # Split each frame into 2x2 grid
	python main.py --rows 3 --cols 3 # Split each frame into 3x3 grid
	```
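
To make the `--rows`/`--cols` option concrete, here is a rough sketch of how a frame could be split into tiles and how a box detected inside one tile could be mapped back to full-frame pixels. It assumes the detector returns coordinates normalized to `[0, 1]` relative to the tile; the function names are hypothetical and the real script may work differently.

```python
from typing import List, Tuple

Tile = Tuple[int, int, int, int]          # (x0, y0, x1, y1) in frame pixels
Box = Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max)


def grid_tiles(width: int, height: int, rows: int, cols: int) -> List[Tile]:
    """Split a width x height frame into rows x cols tiles, row by row."""
    tiles: List[Tile] = []
    for r in range(rows):
        for c in range(cols):
            # Integer division keeps tile edges aligned with no gaps.
            x0, x1 = c * width // cols, (c + 1) * width // cols
            y0, y1 = r * height // rows, (r + 1) * height // rows
            tiles.append((x0, y0, x1, y1))
    return tiles


def tile_box_to_frame(box: Box, tile: Tile) -> Box:
    """Convert a tile-relative normalized box to frame pixel coordinates."""
    x0, y0, x1, y1 = tile
    w, h = x1 - x0, y1 - y0
    return (x0 + box[0] * w, y0 + box[1] * h, x0 + box[2] * w, y0 + box[3] * h)
```

Under this scheme a 3x3 grid means nine detector calls per frame instead of one, which is one way to read the performance warning in the Notes section.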

	You can combine arguments:
	```bash
	python main.py --detect "person wearing sunglasses" --box-style yolo --test --preset "fast" --rows 2 --cols 2
	```
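
For readers who want to script around these flags, the sketch below mirrors the documented arguments with `argparse`. It is a reconstruction from this README, not the contents of `main.py`: only the `censor` default for `--box-style` is documented, so the defaults for `--preset`, `--detect`, `--rows`, and `--cols` are assumptions. The preset choices are the standard x264 preset names.

```python
import argparse

# Standard x264 preset names accepted by FFmpeg's libx264 encoder.
X264_PRESETS = [
    "ultrafast", "superfast", "veryfast", "faster", "fast",
    "medium", "slow", "slower", "veryslow",
]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the flags documented in this README."""
    parser = argparse.ArgumentParser(description="Detect objects in videos")
    parser.add_argument("--test", action="store_true",
                        help="process only the first 3 seconds of each video")
    # Assumed default: "medium" (FFmpeg's own default preset).
    parser.add_argument("--preset", choices=X264_PRESETS, default="medium")
    # Assumed default: "face" (the README does not state one).
    parser.add_argument("--detect", default="face")
    # Documented default: censor.
    parser.add_argument("--box-style", choices=["censor", "yolo", "hitmarker"],
                        default="censor")
    # Assumed default: a 1x1 grid, i.e. no grid-based detection.
    parser.add_argument("--rows", type=int, default=1)
    parser.add_argument("--cols", type=int, default=1)
    return parser
```

`build_parser().parse_args()` then exposes the values as `args.test`, `args.preset`, `args.detect`, `args.box_style`, `args.rows`, and `args.cols`.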

	### Visualization Styles

	The tool supports three different visualization styles for detected objects:

	1. Censor (default)
	   - Places solid black rectangles over detected objects
	   - Best for privacy and content moderation
	   - Completely obscures the detected region

	2. YOLO
	   - Traditional object detection style
	   - Red bounding box around detected objects
	   - Label showing object type above the box
	   - Good for analysis and debugging

	3. Hitmarker
	   - Call of Duty inspired visualization
	   - White crosshair marker at center of detected objects
	   - Small label above the marker
	   - Stylistic choice for gaming-inspired visualization

	Choose the style that best fits your use case using the `--box-style` argument.

	## Output

	Processed videos will be saved in the `outputs` directory with the format:
	`[style]_[object_type]_[original_filename].mp4`

	For example:
	- `censor_face_video.mp4`
	- `yolo_person_video.mp4`
	- `hitmarker_car_video.mp4`
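
The naming pattern can be expressed as a small helper. This is a sketch consistent with the examples above, not the script's own code; the name `output_name` is hypothetical, and replacing spaces in multi-word object types with underscores is an assumption, since the README only shows single-word examples.

```python
from pathlib import Path


def output_name(style: str, object_type: str, input_path: str) -> str:
    """Build '[style]_[object_type]_[original_filename].mp4'."""
    stem = Path(input_path).stem  # original filename without its extension
    # Assumption: spaces in the object type become underscores.
    safe_object = object_type.strip().replace(" ", "_")
    return f"{style}_{safe_object}_{stem}.mp4"
```

For instance, `output_name("censor", "face", "inputs/video.mp4")` yields `censor_face_video.mp4`, matching the first example above.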

	The output videos will include:
	- Original video content
	- Selected visualization style for detected objects
	- Web-compatible H.264 encoding
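
As a rough illustration of what web-compatible H.264 output involves, the sketch below assembles an FFmpeg command line. The exact options used by this tool are not stated in the README, so the `yuv420p` pixel format and `+faststart` flag are assumptions based on common practice for browser playback, and `build_ffmpeg_command` is a hypothetical name. Audio handling is left out.

```python
from typing import List


def build_ffmpeg_command(src: str, dst: str, preset: str = "medium") -> List[str]:
    """Assemble an FFmpeg command that re-encodes src to H.264 at dst."""
    return [
        "ffmpeg",
        "-y",                       # overwrite the output file if it exists
        "-i", src,                  # input video
        "-c:v", "libx264",          # encode video as H.264
        "-preset", preset,          # speed vs. compression trade-off
        "-pix_fmt", "yuv420p",      # pixel format most browsers can decode
        "-movflags", "+faststart",  # move metadata up front for streaming
        dst,
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where `ffmpeg` is installed.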

	## Notes

	- Processing time depends on video length, grid size, and GPU availability
	- GPU is strongly recommended for faster processing
	- Requires sufficient disk space for temporary files
	- Detection quality depends on the video quality and on how well Moondream2 recognizes the specified object type
	- Grid-based detection should only be used when necessary due to significant performance impact
	- Web interface provides real-time progress updates and error messages
	- Different visualization styles may be more suitable for different use cases