---
title: stableVideo
app_file: app.py
sdk: gradio
sdk_version: 3.40.1
---
# StableVideo

[![](http://img.shields.io/badge/cs.CV-arXiv%3A2308.09592-B31B1B.svg)](https://arxiv.org/abs/2308.09592)
[![](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Demo-orange)](https://huggingface.co/spaces/Reself/StableVideo)

> **StableVideo: Text-driven Consistency-aware Diffusion Video Editing**  
> Wenhao Chai, Xun Guoβœ‰οΈ, Gaoang Wang, Yan Lu  
> _ICCV 2023_

https://github.com/rese1f/StableVideo/assets/58205475/558555f1-711c-46f0-85bc-9c229ff1f511

https://github.com/rese1f/StableVideo/assets/58205475/c152d0fa-16d3-4528-b9c2-ad2ec53944b9

https://github.com/rese1f/StableVideo/assets/58205475/0edbefdd-9b5f-4868-842c-9bf3156a54d3


## VRAM requirement
|Setting|VRAM (MiB)|
|---|---|
|float32|29145|
|amp|23005|
|amp + cpu|17639|
|amp + cpu + xformers|14185|

- cpu: keep idle sub-models cached on the CPU, enabled via the `save_memory` argument

Measured under the default settings (resolution, *etc.*) in `app.py`.
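
The rows in the table correspond to three independent savings: mixed-precision inference (amp), parking idle sub-models on the CPU (`save_memory`), and xformers' memory-efficient attention. The following is a minimal, hypothetical sketch of the first two in plain PyTorch, not the actual StableVideo code path, just the general pattern:

```python
import torch
import torch.nn as nn

# Hypothetical illustration of the memory-saving modes in the table above;
# the exact mechanism behind `save_memory` in app.py may differ.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU()).cuda()
x = torch.randn(1, 3, 64, 64, device="cuda")

# amp: run the forward pass in mixed precision instead of full float32
with torch.no_grad(), torch.autocast(device_type="cuda", dtype=torch.float16):
    y = model(x)

# cpu offload (save_memory): park a sub-model on the CPU between calls and
# move it back to the GPU only right before it is needed
model.to("cpu")   # frees its VRAM while idle
model.to("cuda")  # restore before the next forward pass

# xformers (optional install) additionally provides memory-efficient attention
# that the attention layers can pick up when the package is importable.
```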

## Installation
```
git clone https://github.com/rese1f/StableVideo.git
cd StableVideo
conda create -n stablevideo python=3.11
conda activate stablevideo
pip install -r requirements.txt
pip install xformers  # optional
```
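
After installing, a quick sanity check (a small sketch, not part of the StableVideo codebase) confirms that PyTorch sees the GPU and that xformers, if installed, is importable:

```python
import torch

# Environment sanity check before launching the app.
print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())

try:
    import xformers  # only present if the optional dependency was installed
    print("xformers:", xformers.__version__)
except ImportError:
    print("xformers not installed (optional)")
```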

(optional) We also provide a CPU-only version as a [Hugging Face demo](https://huggingface.co/spaces/Reself/StableVideo).
```
git lfs install
git clone https://huggingface.co/spaces/Reself/StableVideo
cd StableVideo
pip install -r requirements.txt
```

## Download Pretrained Model

All models and detectors can be downloaded from the ControlNet Hugging Face page at [Download Link](https://huggingface.co/lllyasviel/ControlNet) and placed in the `ckpt` folder (see the directory layout below).
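
The checkpoints can also be fetched programmatically with `huggingface_hub`. This is a hedged sketch: the target filenames match the `ckpt` layout shown below, but the exact in-repo paths inside `lllyasviel/ControlNet` are assumptions and may need adjusting.

```python
import os
import shutil
from huggingface_hub import hf_hub_download  # pip install huggingface_hub

# Sketch: pull the files listed under ckpt/ below from the ControlNet repo.
# The in-repo paths are assumptions; adjust if the repository layout differs.
os.makedirs("ckpt", exist_ok=True)
files = [
    "models/cldm_v15.yaml",
    "models/control_sd15_canny.pth",
    "models/control_sd15_depth.pth",
    "annotator/ckpts/dpt_hybrid-midas-501f0c75.pt",
]
for f in files:
    local = hf_hub_download(repo_id="lllyasviel/ControlNet", filename=f)
    shutil.copy(local, os.path.join("ckpt", os.path.basename(f)))
```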


## Download example videos
Download the example atlases for car-turn, boat, libby, blackswan, bear, bicycle_tali, giraffe, kite-surf, lucia, and motorbike at [Download Link](https://www.dropbox.com/s/oiyhbiqdws2p6r1/nla_share.zip?dl=0), shared by the [Text2LIVE](https://github.com/omerbt/Text2LIVE) authors.

You can also train on your own video by following [NLA](https://github.com/ykasten/layered-neural-atlases).

Extracting the archive creates a `data` folder, giving the following layout:
```
StableVideo
β”œβ”€β”€ ...
β”œβ”€β”€ ckpt
β”‚   β”œβ”€β”€ cldm_v15.yaml
β”‚   β”œβ”€β”€ dpt_hybrid-midas-501f0c75.pt
β”‚   β”œβ”€β”€ control_sd15_canny.pth
β”‚   └── control_sd15_depth.pth
β”œβ”€β”€ data
β”‚   β”œβ”€β”€ car-turn
β”‚   β”‚   β”œβ”€β”€ checkpoint # NLA models are stored here
β”‚   β”‚   β”œβ”€β”€ car-turn # contains video frames
β”‚   β”‚   └── ...
β”‚   β”œβ”€β”€ blackswan
β”‚   └── ...
└── ...
```
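
A quick way to verify this layout before launching the app is a small check script (a sketch; the expected paths are taken directly from the tree above):

```python
from pathlib import Path

# Sketch: verify the checkpoint and example-data layout shown above.
expected = [
    "ckpt/cldm_v15.yaml",
    "ckpt/dpt_hybrid-midas-501f0c75.pt",
    "ckpt/control_sd15_canny.pth",
    "ckpt/control_sd15_depth.pth",
    "data/car-turn/checkpoint",
    "data/car-turn/car-turn",
]
missing = [p for p in expected if not Path(p).exists()]
print("All files in place!" if not missing else f"Missing: {missing}")
```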

## Run and Play!
Run the following command to start.
```
python app.py
```
The resulting `.mp4` video and keyframes are stored in the `./log` directory after you click the `render` button.

You can also edit the mask region for the foreground atlas, as shown below. Note that there may currently be a bug in Gradio: carefully check that the `editable output foreground atlas` block looks the same as the one above; if it does not, restart the program.

<img width="916" alt="" src="https://github.com/rese1f/StableVideo/assets/58205475/ec8dd9f0-84fb-43ca-baaa-fb6c58da0d77">


## Citation
If our work is useful for your research, please consider citing it as below. Many thanks :)
```
@article{chai2023stablevideo,
  title={StableVideo: Text-driven Consistency-aware Diffusion Video Editing},
  author={Chai, Wenhao and Guo, Xun and Wang, Gaoang and Lu, Yan},
  journal={arXiv preprint arXiv:2308.09592},
  year={2023}
}
```

## Acknowledgement

This implementation is built partly on [Text2LIVE](https://github.com/omerbt/Text2LIVE) and [ControlNet](https://github.com/lllyasviel/ControlNet).
