Text-to-Video
zjuJish committed
Commit c41043f · 1 Parent(s): a760d9c
Files changed (3)
  1. .gitattributes +1 -0
  2. README.md +19 -2
  3. lora.pt +1 -1
.gitattributes CHANGED
@@ -34,3 +34,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 assets/logo.png filter=lfs diff=lfs merge=lfs -text
+lora.pt filter=lfs diff=lfs merge=lfs -text
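The added attribute line routes `lora.pt` through the Git LFS clean/smudge filter (it is the line `git lfs track "lora.pt"` would normally write). A minimal sketch of how the pattern resolves, using a throwaway repo and plain `git check-attr` — no LFS install is needed just to inspect the attribute:

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
# Reproduce the new .gitattributes line and ask git which attributes apply.
printf 'lora.pt filter=lfs diff=lfs merge=lfs -text\n' > .gitattributes
git check-attr filter diff merge -- lora.pt
```

Each reported value is `lfs`, which is what tells git to store the file as an LFS pointer rather than as a raw blob.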
README.md CHANGED
@@ -43,6 +43,8 @@ pipeline_tag: text-to-video
 <!-- **Note:** This open-source repository is intended to provide a reference implementation. Due to the difference in the underlying I2V model's performance, the open-source version may not achieve the same performance as the model in our paper. -->
 
 ## 🔥 Updates
+- __[2025.12.29]__: We release multi-node distributed training scripts for both initialization and streaming long tuning ([`train_init_multinode.sh`](train_init_multinode.sh), [`train_long_multinode.sh`](train_long_multinode.sh)).
+- __[2025.12.24]__: We release the multi-prompt generation benchmark from our paper: a prompt set [`prompts/interactive_benchmark.jsonl`](prompts/interactive_benchmark.jsonl) of 100 groups of narrative scripts, each consisting of 6 successive 10-second prompts, yielding 100 videos of 60 seconds each.
 - __[2025.12.14]__: Training and inference code and [model checkpoints](https://huggingface.co/KlingTeam/MemFlow) are available.
 <!-- - __[2025.09.25]__: [CamCloneMaster](https://arxiv.org/abs/2506.03140) has been accepted by SIGGRAPH Asia 2025. -->
 <!-- - __[2025.09.08]__: [CameraClone Dataset](https://huggingface.co/datasets/KwaiVGI/CameraClone-Dataset/) is available. -->
@@ -151,20 +153,35 @@ Download Wan2.1-T2V-14B as the teacher model.
 ```
 huggingface-cli download Wan-AI/Wan2.1-T2V-14B --local-dir wan_models/Wan2.1-T2V-14B
 ```
 
 **Stage 1: Self-Forcing Initialization for Memory Mechanism**
+
+Single-node training:
 ```
 bash train_init.sh
 ```
+Multi-node distributed training:
+```
+bash train_init_multinode.sh
+```
 **Stage 2: Streaming Long Tuning**
+Single-node training:
 ```
 bash train_long.sh
 ```
-
+Multi-node distributed training:
+```
+bash train_long_multinode.sh
+```
 **Hints for two-stage training**
 
 The `bank_size` is a tunable hyperparameter specified in [`configs/train_init.yaml`](configs/train_init.yaml) and [`configs/train_long.yaml`](configs/train_long.yaml). It controls the number of latent frames stored in the memory bank. When `bank_size` matches the number of latent frames of the frame sink in [LongLive](https://github.com/NVlabs/LongLive) (as in our default setting), training can optionally start directly from Stage 2 (Streaming Long Tuning). Specifically, we initialize from the checkpoint [`longlive_base.pt`](https://huggingface.co/Efficient-Large-Model/LongLive-1.3B/blob/main/models/longlive_base.pt) obtained in Stage 1 of [LongLive](https://github.com/NVlabs/LongLive) and fine-tune only the LoRA parameters, which significantly improves training efficiency.
 
+## 📏 Evaluation & Benchmark
+
+We provide an evaluation prompt set as a benchmark for multi-prompt generation. Following [LongLive](https://github.com/NVlabs/LongLive), we customize 100 groups of narrative scripts, each consisting of 6 successive 10-second prompts, yielding 100 videos of 60 seconds each. Set `data_path` in [`configs/interactive_inference.yaml`](configs/interactive_inference.yaml) to `prompts/interactive_benchmark.jsonl` for evaluation.
+
 
 <!-- ## How to contribute
 - Make sure to have git installed.
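The `bank_size` hint in the README above corresponds to a single key in the stage configs. A hypothetical fragment of `configs/train_long.yaml` — only the key name and its meaning come from the text; the value shown is a placeholder, not the repo's default:

```yaml
# Illustrative fragment only; bank_size is the key named in the README,
# the value here is a placeholder.
bank_size: 3  # number of latent frames kept in the memory bank
```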
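The evaluation setup described in the README is a one-line config change; a sketch of the relevant `configs/interactive_inference.yaml` entry (only the `data_path` key and its value are from the text, surrounding keys are omitted):

```yaml
data_path: prompts/interactive_benchmark.jsonl  # 100 scripts x 6 ten-second prompts
```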
lora.pt CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:114cace42a4bd47aff594906e049750b47ea23268b1cf76eb381860663bda865
+oid sha256:1dec690a21efbf4f1941f74337970696345449b9973035517a742f075e5f781a
 size 2800056690
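For reference, the pointer file above is all that git stores in-tree: the `oid` is the SHA-256 digest of the actual weights blob, and `size` is its byte count, so this commit swaps the checkpoint contents while keeping the size identical. A minimal sketch that builds the same three-line pointer for a hypothetical zero-byte file (not the real `lora.pt`; `stat -c%s` is the GNU form):

```shell
set -e
: > demo.bin                          # zero-byte stand-in for the weights file
digest=$(sha256sum demo.bin | cut -d' ' -f1)
printf 'version https://git-lfs.github.com/spec/v1\n'
printf 'oid sha256:%s\n' "$digest"
printf 'size %s\n' "$(stat -c%s demo.bin)"
```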