Update README.md
Keywords: Video Inpainting, Video Editing, Video Generation

<p align="center">
  <a href='https://yxbian23.github.io/project/video-painter'><img src='https://img.shields.io/badge/Project-Page-Green'></a>
  <a href="https://arxiv.org/abs/2503.05639"><img src="https://img.shields.io/badge/arXiv-2503.05639-b31b1b.svg"></a>
  <a href="https://github.com/TencentARC/VideoPainter"><img src="https://img.shields.io/badge/GitHub-Code-black?logo=github"></a>
  <a href="https://youtu.be/HYzNfsD3A0s"><img src="https://img.shields.io/badge/YouTube-Video-red?logo=youtube"></a>
  <a href='https://huggingface.co/datasets/TencentARC/VPData'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Dataset-blue'></a>
  <a href='https://huggingface.co/datasets/TencentARC/VPBench'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Benchmark-blue'></a>
  <a href="https://huggingface.co/TencentARC/VideoPainter"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue"></a>
</p>

**Your star means a lot for us to develop this project!** ⭐⭐⭐

**VPData and VPBench have been fully uploaded (containing 390K mask sequences and video captions). Welcome to use our biggest video segmentation dataset, VPData, with video captions!** 🔥🔥🔥

**📋 Table of Contents**

- [VideoPainter](#videopainter)
  - [🔥 Update Log](#-update-log)
  - [📋 TODO](#todo)
  - [🛠️ Method Overview](#️-method-overview)
  - [🚀 Getting Started](#-getting-started)
    - [Environment Requirement 🌍](#environment-requirement-)
    - [Data Download ⬇️](#data-download-️)
  - [🏃🏼 Running Scripts](#-running-scripts)
    - [Training 🤯](#training-)
    - [Inference 📜](#inference-)
    - [Evaluation 📏](#evaluation-)
  - [🤝🏼 Cite Us](#-cite-us)
  - [💖 Acknowledgement](#-acknowledgement)

## 🔥 Update Log
- [2025/3/09] 📢 📢 [VideoPainter](https://huggingface.co/TencentARC/VideoPainter) is released: an efficient, any-length video inpainting & editing framework with plug-and-play context control.
- [2025/3/09] 📢 📢 [VPData](https://huggingface.co/datasets/TencentARC/VPData) and [VPBench](https://huggingface.co/datasets/TencentARC/VPBench) are released: the largest video inpainting dataset with precise segmentation masks and dense video captions (>390K clips).
- [2025/3/25] 📢 📢 The 390K+ high-quality video segmentation masks in [VPData](https://huggingface.co/datasets/TencentARC/VPData) have been fully released.
- [2025/3/25] 📢 📢 The raw videos of the Videovo subset have been uploaded to [VPData](https://huggingface.co/datasets/TencentARC/VPData) to resolve the raw-video link expiration issue.

## TODO

- [x] Release training and inference code
- [x] Release evaluation code
- [x] Release [VideoPainter checkpoints](https://huggingface.co/TencentARC/VideoPainter) (based on CogVideoX-5B)
- [x] Release [VPData and VPBench](https://huggingface.co/collections/TencentARC/videopainter-67cc49c6146a48a2ba93d159) for large-scale training and evaluation.
- [x] Release Gradio demo

</details>

<details>
<summary><b>VPBench and VPData Download ⬇️</b></summary>

You can download VPBench [here](https://huggingface.co/datasets/TencentARC/VPBench) and VPData [here](https://huggingface.co/datasets/TencentARC/VPData) (as well as our re-processed Davis), which are used for training and testing VideoPainter. By downloading the data, you agree to the terms and conditions of the license. The data structure should be as follows:

```
git lfs install
git clone https://huggingface.co/datasets/TencentARC/VPData
mv VPData data

# 1. unzip the masks in VPData
python data_utils/unzip_folder.py --source_dir ./data/videovo_masks --target_dir ./data/video_inpainting/videovo
python data_utils/unzip_folder.py --source_dir ./data/pexels_masks --target_dir ./data/video_inpainting/pexels

# 2. unzip the raw videos of the Videovo subset in VPData
python data_utils/unzip_folder.py --source_dir ./data/videovo_raw_videos --target_dir ./data/videovo/raw_video
```
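
The unzip step above is handled by the repository's own `data_utils/unzip_folder.py`. As a rough, hypothetical sketch of what such a helper involves (the function name, signature, and return value below are illustrative assumptions, not the repo's actual script):

```python
# Illustrative sketch only: the repo ships its own data_utils/unzip_folder.py.
# This hypothetical helper extracts every .zip archive found in source_dir
# into target_dir, which is the kind of work that step performs.
import os
import zipfile

def unzip_folder(source_dir, target_dir):
    """Extract each .zip in source_dir into target_dir; return the archives handled."""
    os.makedirs(target_dir, exist_ok=True)
    handled = []
    for name in sorted(os.listdir(source_dir)):
        if not name.endswith(".zip"):
            continue  # skip non-archive files (e.g. checksums, READMEs)
        with zipfile.ZipFile(os.path.join(source_dir, name)) as zf:
            zf.extractall(target_dir)
        handled.append(name)
    return handled
```

The real script may differ (progress bars, parallelism, path handling); the point is simply that each mask archive is unpacked under the matching `video_inpainting` subdirectory.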

Note: *Due to the space limit, you need to run the following script to download the raw videos of the Pexels subset in VPData. The format should be consistent with VPData/VPBench above (after downloading VPData/VPBench, the script will automatically place the raw videos of VPData into the corresponding dataset directories created by VPBench).*

```
cd data_utils
```

```
git clone https://huggingface.co/black-forest-labs/FLUX.1-Fill-dev
mv ckpt/FLUX.1-Fill-dev ckpt/flux_inp
```

[Optional] You need to download [SAM2](https://huggingface.co/facebook/sam2-hiera-large) for video segmentation in the Gradio demo:

```
git lfs install
cd ckpt
wget https://huggingface.co/facebook/sam2-hiera-large/resolve/main/sam2_hiera_large.pt
```

You can also choose segmentation checkpoints of other sizes to balance efficiency and performance, such as [SAM2-Tiny](https://huggingface.co/facebook/sam2-hiera-tiny).

The ckpt structure should be as follows:

```
|-- transformer
|-- vae
|-- ...
|-- sam2_hiera_large.pt
```
</details>
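
Once the downloads finish, a quick sanity check of the `ckpt` directory can catch a missing file before launching the demo. A minimal, hypothetical helper (the required-entry list mirrors the tree above and is an assumption, not an official tool from this repo):

```python
# Hypothetical sanity check for the ckpt/ layout sketched above;
# the entry names come from the tree in this README.
import os

def missing_ckpt_entries(ckpt_dir,
                         required=("transformer", "vae", "sam2_hiera_large.pt")):
    """Return the required files/dirs that are absent from ckpt_dir."""
    return [name for name in required
            if not os.path.exists(os.path.join(ckpt_dir, name))]
```

Running it on your checkpoint root and printing the result tells you exactly which download or `mv` step was skipped.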