bwang3579 committed
Commit 575cb78 · verified · 1 Parent(s): 6db8f07

Update README.md

Files changed (1):
  1. README.md +103 -27

README.md CHANGED
@@ -1,7 +1,7 @@
  ---
  license: mit
  library_name: diffusers
- pipeline_tag: text-to-video
  ---

  <p align="center">
@@ -20,31 +20,102 @@ pipeline_tag: text-to-video
  </div>

  ## 🔥🔥🔥 News!!
- * Feb 17, 2025: 👋 We release the inference code and model weights of Step-Video-T2V. [Download](https://huggingface.co/stepfun-ai/stepvideo-t2v)
- * Feb 17, 2025: 👋 We release the inference code and model weights of Step-Video-T2V-Turbo. [Download](https://huggingface.co/stepfun-ai/stepvideo-t2v-turbo)
- * Feb 17, 2025: 🎉 We have made our technical report available as open source. [Read](https://arxiv.org/abs/2502.10248)

- ## Video Demos

  <table border="0" style="width: 100%; text-align: center; margin-top: 1px;">
  <tr>
- <td><video src="https://github.com/user-attachments/assets/9274b351-595d-41fb-aba3-f58e6e91603a" width="100%" controls autoplay loop muted></video></td>
- <td><video src="https://github.com/user-attachments/assets/2f6b3ad5-e93b-436b-98bc-4701182d8652" width="100%" controls autoplay loop muted></video></td>
- <td><video src="https://github.com/user-attachments/assets/67d20ee7-ad78-4b8f-80f6-3fdb00fb52d8" width="100%" controls autoplay loop muted></video></td>
  </tr>
  <tr>
- <td><video src="https://github.com/user-attachments/assets/9abce409-105d-4a8a-ad13-104a98cc8a0b" width="100%" controls autoplay loop muted></video></td>
- <td><video src="https://github.com/user-attachments/assets/8d1e1a47-048a-49ce-85f6-9d013f2d8e89" width="100%" controls autoplay loop muted></video></td>
- <td><video src="https://github.com/user-attachments/assets/32cf4bd1-ec1f-4f77-a488-cd0284aa81bb" width="100%" controls autoplay loop muted></video></td>
  </tr>
  <tr>
- <td><video src="https://github.com/user-attachments/assets/f95a7a49-032a-44ea-a10f-553d4e5d21c6" width="100%" controls autoplay loop muted></video></td>
- <td><video src="https://github.com/user-attachments/assets/3534072e-87d9-4128-a87f-28fcb5d951e0" width="100%" controls autoplay loop muted></video></td>
- <td><video src="https://github.com/user-attachments/assets/6d893dad-556d-4527-a882-666cba3d10e9" width="100%" controls autoplay loop muted></video></td>
  </tr>
  </table>

  ## Table of Contents

  1. [Introduction](#1-introduction)
@@ -116,15 +187,16 @@ The following table shows the requirements for running Step-Video-T2V model (bat
  - Python >= 3.10.0 (Recommended: [Anaconda](https://www.anaconda.com/download/#linux) or [Miniconda](https://docs.conda.io/en/latest/miniconda.html))
  - [PyTorch >= 2.3-cu121](https://pytorch.org/)
  - [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads)
- - [FFmpeg](https://www.ffmpeg.org/)

  ```bash
- git clone https://github.com/stepfun-ai/Step-Video-T2V.git
  conda create -n stepvideo python=3.10
  conda activate stepvideo

- cd Step-Video-T2V
  pip install -e .
- pip install flash-attn --no-build-isolation ## flash-attn is optional
  ```

  ### 🚀 4.3 Inference Scripts
@@ -136,7 +208,18 @@ parallel=4 # or parallel=8
  url='127.0.0.1'
  model_dir=where_you_download_dir

- torchrun --nproc_per_node $parallel run_parallel.py --model_dir $model_dir --vae_url $url --caption_url $url --ulysses_degree $parallel --prompt "一名宇航员在月球上发现一块石碑,上面印有“stepfun”字样,闪闪发光" --infer_steps 50 --cfg_scale 9.0 --time_shift 13.0
  ```

  ### 🚀 4.4 Best-of-Practice Inference settings
@@ -156,14 +239,7 @@ The online version of Step-Video-T2V is available on [跃问视频](https://yuew

  ## 7. Citation
  ```
- @misc{ma2025stepvideot2vtechnicalreportpractice,
-   title={Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model},
-   author={Guoqing Ma and Haoyang Huang and Kun Yan and Liangyu Chen and Nan Duan and Shengming Yin and Changyi Wan and Ranchen Ming and Xiaoniu Song and Xing Chen and Yu Zhou and Deshan Sun and Deyu Zhou and Jian Zhou and Kaijun Tan and Kang An and Mei Chen and Wei Ji and Qiling Wu and Wen Sun and Xin Han and Yanan Wei and Zheng Ge and Aojie Li and Bin Wang and Bizhu Huang and Bo Wang and Brian Li and Changxing Miao and Chen Xu and Chenfei Wu and Chenguang Yu and Dapeng Shi and Dingyuan Hu and Enle Liu and Gang Yu and Ge Yang and Guanzhe Huang and Gulin Yan and Haiyang Feng and Hao Nie and Haonan Jia and Hanpeng Hu and Hanqi Chen and Haolong Yan and Heng Wang and Hongcheng Guo and Huilin Xiong and Huixin Xiong and Jiahao Gong and Jianchang Wu and Jiaoren Wu and Jie Wu and Jie Yang and Jiashuai Liu and Jiashuo Li and Jingyang Zhang and Junjing Guo and Junzhe Lin and Kaixiang Li and Lei Liu and Lei Xia and Liang Zhao and Liguo Tan and Liwen Huang and Liying Shi and Ming Li and Mingliang Li and Muhua Cheng and Na Wang and Qiaohui Chen and Qinglin He and Qiuyan Liang and Quan Sun and Ran Sun and Rui Wang and Shaoliang Pang and Shiliang Yang and Sitong Liu and Siqi Liu and Shuli Gao and Tiancheng Cao and Tianyu Wang and Weipeng Ming and Wenqing He and Xu Zhao and Xuelin Zhang and Xianfang Zeng and Xiaojia Liu and Xuan Yang and Yaqi Dai and Yanbo Yu and Yang Li and Yineng Deng and Yingming Wang and Yilei Wang and Yuanwei Lu and Yu Chen and Yu Luo and Yuchu Luo and Yuhe Yin and Yuheng Feng and Yuxiang Yang and Zecheng Tang and Zekai Zhang and Zidong Yang and Binxing Jiao and Jiansheng Chen and Jing Li and Shuchang Zhou and Xiangyu Zhang and Xinhao Zhang and Yibo Zhu and Heung-Yeung Shum and Daxin Jiang},
-   year={2025},
-   eprint={2502.10248},
-   archivePrefix={arXiv},
-   primaryClass={cs.CV},
-   url={https://arxiv.org/abs/2502.10248},
  }
  ```

  ---
  license: mit
  library_name: diffusers
+ pipeline_tag: image-to-video
  ---

  <p align="center">
 
  </div>

  ## 🔥🔥🔥 News!!
+ * Mar 17, 2025: 👋 We release the inference code and model weights of Step-Video-TI2V. [Download](https://huggingface.co/stepfun-ai/stepvideo-ti2v)
+ * Mar 17, 2025: 🎉 We have made our technical report available as open source. [Read](https://arxiv.org/abs/2502.10248)

+
+ ### 🚀 Inference Scripts
+ - We decouple the text encoder, VAE decoding, and DiT so that the DiT processes can make full use of their GPUs. As a result, a dedicated GPU is needed to serve the API for the text encoder's embeddings and for VAE decoding.
+ ```bash
+ python api/call_remote_server.py --model_dir where_you_download_dir &  ## We assume you have more than 4 GPUs available. This command returns the URL for both the caption API and the VAE API; use that URL in the command below.
+
+ parallel=4 # or parallel=8
+ url='127.0.0.1'
+ model_dir=where_you_download_dir
+
+ torchrun --nproc_per_node $parallel run_parallel.py \
+     --model_dir $model_dir \
+     --vae_url $url \
+     --caption_url $url \
+     --ulysses_degree $parallel \
+     --prompt "男孩笑起来" \
+     --first_image_path ./assets/demo.png \
+     --infer_steps 50 \
+     --save_path ./results \
+     --cfg_scale 9.0 \
+     --motion_score 5 \
+     --time_shift 12.573
+ ```
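The two-step launch above (start the API server on its dedicated GPU, then run `torchrun`) can be wrapped in a small launcher. This is a sketch of a hypothetical wrapper, not a script from the repo; it reuses only the flags and placeholder paths shown above, and prints both commands for inspection before anything runs.

```shell
#!/usr/bin/env bash
# Hypothetical launcher for the decoupled setup above (not part of the repo).
# It composes both commands, prints them, and only executes when EXECUTE=1.
set -u

parallel=4                       # or 8
url='127.0.0.1'                  # replace with the URL printed by call_remote_server.py
model_dir=where_you_download_dir

server_cmd=(python api/call_remote_server.py --model_dir "$model_dir")
infer_cmd=(torchrun --nproc_per_node "$parallel" run_parallel.py
  --model_dir "$model_dir"
  --vae_url "$url"
  --caption_url "$url"
  --ulysses_degree "$parallel"
  --prompt "男孩笑起来"
  --first_image_path ./assets/demo.png
  --infer_steps 50
  --save_path ./results
  --cfg_scale 9.0
  --motion_score 5
  --time_shift 12.573)

echo "${server_cmd[*]}"
echo "${infer_cmd[*]}"

if [ "${EXECUTE:-0}" = "1" ]; then
  "${server_cmd[@]}" &           # dedicated GPU: caption + VAE API services
  "${infer_cmd[@]}"              # DiT inference on the remaining GPUs
fi
```

Keeping the two commands as arrays avoids quoting problems with the Chinese prompt string and makes the dry-run printout exactly match what would be executed.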
+ ## Motion Control

  <table border="0" style="width: 100%; text-align: center; margin-top: 1px;">
  <tr>
+ <td><video src="https://github.com/user-attachments/assets/3c6a5c8d-ada4-484f-8f3d-f2a99ef18a4b" width="30%" controls autoplay loop muted></video></td>
+ <td><video src="https://github.com/user-attachments/assets/90c608d9-b3cf-40fa-b4ee-21b682c840ae" width="30%" controls autoplay loop muted></video></td>
+ <td><video src="https://github.com/user-attachments/assets/e58d3a6b-0076-4587-aac5-6911ba4c776d" width="30%" controls autoplay loop muted></video></td>
+ </tr>
+ </table>
+
+ ## Motion Amplitude Control
+
+ <table border="0" style="width: 100%; text-align: center; margin-top: 10px;">
+ <tr>
+ <th style="width: 33%;">Motion = 2</th>
+ <th style="width: 33%;">Motion = 5</th>
+ <th style="width: 33%;">Motion = 10</th>
+ </tr>
+ <tr>
+ <td><video src="https://github.com/user-attachments/assets/0d6b1813-2bf0-462a-8ad4-c0583d83afc5" width="33%" controls autoplay loop muted></video></td>
+ <td><video src="https://github.com/user-attachments/assets/33699654-93cc-4205-8a47-93ece4282f72" width="33%" controls autoplay loop muted></video></td>
+ <td><video src="https://github.com/user-attachments/assets/52d73eb5-2c68-4de3-9019-516243804b2c" width="33%" controls autoplay loop muted></video></td>
+ </tr>
+ </table>
+
+ <table border="0" style="width: 100%; text-align: center; margin-top: 10px;">
+ <tr>
+ <th style="width: 33%;">Motion = 2</th>
+ <th style="width: 33%;">Motion = 5</th>
+ <th style="width: 33%;">Motion = 20</th>
+ </tr>
+ <tr>
+ <td><video src="https://github.com/user-attachments/assets/31c48385-fe83-4961-bd42-7bd2b1edeb19" width="33%" controls autoplay loop muted></video></td>
+ <td><video src="https://github.com/user-attachments/assets/913a407e-55ca-4a33-bafe-bd5e38eec5f5" width="33%" controls autoplay loop muted></video></td>
+ <td><video src="https://github.com/user-attachments/assets/119a3673-014f-4772-b846-718307a4a412" width="33%" controls autoplay loop muted></video></td>
  </tr>
+ </table>
+
+ 🎯 Tips
+ The default motion_score = 5 suits general use. For more stability, set motion_score = 2, though it may respond less to some movements. For larger, more intense motion, use motion_score = 10 or motion_score = 20. Adjust motion_score to fit your use case.
+
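To compare the settings above side by side, the same command can be swept over several motion_score values. This is a sketch, not a repo script; flags and paths are the placeholders from the README's inference command, and the leading `echo` makes it a dry run (remove it to actually launch each job).

```shell
# Sketch (not from the repo): sweep the motion_score values discussed above,
# writing each run to its own results directory. Dry run via `echo`.
parallel=4
url='127.0.0.1'
model_dir=where_you_download_dir

for ms in 2 5 10 20; do
  echo torchrun --nproc_per_node "$parallel" run_parallel.py \
    --model_dir "$model_dir" \
    --vae_url "$url" \
    --caption_url "$url" \
    --ulysses_degree "$parallel" \
    --prompt "男孩笑起来" \
    --first_image_path ./assets/demo.png \
    --infer_steps 50 \
    --save_path "./results_ms${ms}" \
    --cfg_scale 9.0 \
    --motion_score "$ms" \
    --time_shift 12.573
done
```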
+ ## Camera Control
+
+ <table border="0" style="width: 100%; text-align: center; margin-top: 1px;">
  <tr>
+ <th style="width: 33%;">镜头环绕 (orbit)</th>
+ <th style="width: 33%;">镜头推进 (dolly in)</th>
+ <th style="width: 33%;">镜头拉远 (dolly out)</th>
  </tr>
  <tr>
+ <td><video src="https://github.com/user-attachments/assets/257847bc-5967-45ba-a649-505859476aad" height="30%" controls autoplay loop muted></video></td>
+ <td><video src="https://github.com/user-attachments/assets/d310502a-4f7e-4a78-882f-95c46b4dfe67" height="30%" controls autoplay loop muted></video></td>
+ <td><video src="https://github.com/user-attachments/assets/f6426fc7-2a18-474c-9766-fc8ae8d8d40d" height="30%" controls autoplay loop muted></video></td>
  </tr>
+ </table>

+ <table border="0" style="width: 100%; text-align: center; margin-top: 1px;">
+ <tr>
+ <th style="width: 33%;">镜头固定 (fixed camera)</th>
+ <th style="width: 33%;">镜头左移 (truck left)</th>
+ <th style="width: 33%;">镜头右摇 (pan right)</th>
+ </tr>
+ <tr>
+ <td><video src="https://github.com/user-attachments/assets/f78f76a0-afe1-41b1-9914-f2f508c6ea50" width="30%" controls autoplay loop muted></video></td>
+ <td><video src="https://github.com/user-attachments/assets/3894ec0f-d483-41fe-8331-68b6e5bf6544" width="30%" controls autoplay loop muted></video></td>
+ <td><video src="https://github.com/user-attachments/assets/9de3aa20-c797-4dac-bef1-ee064ed96ed4" width="30%" controls autoplay loop muted></video></td>
+ </tr>
  </table>
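A sketch under an assumption the README does not spell out: that the camera moves demoed above are requested through the text prompt, using the Chinese keywords shown in the table headers. The keyword and the scene text below are taken from this README; how they are combined is the assumption.

```shell
# Assumption: camera moves are requested via the text prompt, using the
# keywords from the table headers above (e.g. 镜头环绕 = orbit).
camera="镜头环绕"                 # orbit; see the tables above for other keywords
prompt="${camera},男孩笑起来"    # camera keyword + scene description
echo "$prompt"                    # pass via --prompt "$prompt" in run_parallel.py
```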

+
  ## Table of Contents

  1. [Introduction](#1-introduction)
 
  - Python >= 3.10.0 (Recommended: [Anaconda](https://www.anaconda.com/download/#linux) or [Miniconda](https://docs.conda.io/en/latest/miniconda.html))
  - [PyTorch >= 2.3-cu121](https://pytorch.org/)
  - [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads)
+ - [FFmpeg](https://www.ffmpeg.org/)
+
  ```bash
+ git clone https://github.com/stepfun-ai/Step-Video-TI2V.git
  conda create -n stepvideo python=3.10
  conda activate stepvideo

+ cd Step-Video-TI2V
  pip install -e .
+
  ```

  ### 🚀 4.3 Inference Scripts

  url='127.0.0.1'
  model_dir=where_you_download_dir

+ torchrun --nproc_per_node $parallel run_parallel.py \
+     --model_dir $model_dir \
+     --vae_url $url \
+     --caption_url $url \
+     --ulysses_degree $parallel \
+     --prompt "男孩笑起来" \
+     --first_image_path ./assets/demo.png \
+     --infer_steps 50 \
+     --save_path ./results \
+     --cfg_scale 9.0 \
+     --motion_score 5 \
+     --time_shift 12.573
  ```

  ### 🚀 4.4 Best-of-Practice Inference settings

  ## 7. Citation
  ```
+ @misc{
  }
  ```