update
- .gitattributes +1 -0
- README.md +19 -2
- lora.pt +1 -1
.gitattributes CHANGED
@@ -34,3 +34,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 assets/logo.png filter=lfs diff=lfs merge=lfs -text
+lora.pt filter=lfs diff=lfs merge=lfs -text
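The new rule makes Git LFS manage `lora.pt` alongside the existing patterns. As a rough illustration of how such rules select files, here is a minimal sketch; note that real gitattributes matching has extra semantics (e.g. `**`, leading `/`), so Python's `fnmatch` is only an approximation for the simple patterns above:

```python
from fnmatch import fnmatch

def is_lfs_tracked(path, gitattributes_text):
    """Approximate check: does any LFS rule in a .gitattributes text match `path`?"""
    for line in gitattributes_text.splitlines():
        parts = line.split()
        # A rule is "pattern attr1 attr2 ..."; LFS rules carry filter=lfs.
        if len(parts) >= 2 and "filter=lfs" in parts[1:] and fnmatch(path, parts[0]):
            return True
    return False

RULES = """\
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
assets/logo.png filter=lfs diff=lfs merge=lfs -text
lora.pt filter=lfs diff=lfs merge=lfs -text
"""

tracked = is_lfs_tracked("lora.pt", RULES)      # matched by the new rule
untracked = is_lfs_tracked("README.md", RULES)  # no rule matches
```

In practice the line is generated by `git lfs track "lora.pt"` rather than edited by hand.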
README.md CHANGED
@@ -43,6 +43,8 @@ pipeline_tag: text-to-video
 <!-- **Note:** This open-source repository is intended to provide a reference implementation. Due to the difference in the underlying I2V model's performance, the open-source version may not achieve the same performance as the model in our paper. -->
 
 ## 🔥 Updates
+- __[2025.12.29]__: We release multi-node distributed training scripts for both initialization and streaming long tuning ([`train_init_multinode.sh`](train_init_multinode.sh), [`train_long_multinode.sh`](train_long_multinode.sh)).
+- __[2025.12.24]__: We release the multi-prompt generation benchmark from our paper: a prompt set [`prompts/interactive_benchmark.jsonl`](prompts/interactive_benchmark.jsonl) of 100 groups of narrative scripts, each consisting of 6 successive 10-second prompts, for a total of 100 videos lasting 60 seconds each.
 - __[2025.12.14]__: Training and inference code and [model checkpoints](https://huggingface.co/KlingTeam/MemFlow) are available.
 <!-- - __[2025.09.25]__: [CamCloneMaster](https://arxiv.org/abs/2506.03140) has been accepted by SIGGRAPH Asia 2025. -->
 <!-- - __[2025.09.08]__: [CameraClone Dataset](https://huggingface.co/datasets/KwaiVGI/CameraClone-Dataset/) is available. -->
@@ -151,20 +153,35 @@ Download Wan2.1-T2V-14B as the teacher model.
 ```
 huggingface-cli download Wan-AI/Wan2.1-T2V-14B --local-dir wan_models/Wan2.1-T2V-14B
 ```
 
 **Stage 1: Self-Forcing Initialization for Memory Mechanism**
+
+Single-node training:
 ```
 bash train_init.sh
 ```
+
+Multi-node distributed training:
+```
+bash train_init_multinode.sh
+```
+
 **Stage 2: Streaming Long Tuning**
+
+Single-node training:
 ```
 bash train_long.sh
 ```
+
+Multi-node distributed training:
+```
+bash train_long_multinode.sh
+```
+
 **Hints for two-stage training**
 
 The `bank_size` hyperparameter, specified in [`configs/train_init.yaml`](configs/train_init.yaml) and [`configs/train_long.yaml`](configs/train_long.yaml), controls the number of latent frames stored in the memory bank. When `bank_size` matches the number of latent frames in the frame sink of [LongLive](https://github.com/NVlabs/LongLive) (as in our default setting), training can optionally start directly from Stage 2 (Streaming Long Tuning): we initialize from the checkpoint [`longlive_base.pt`](https://huggingface.co/Efficient-Large-Model/LongLive-1.3B/blob/main/models/longlive_base.pt) produced in Stage 1 of [LongLive](https://github.com/NVlabs/LongLive) and fine-tune only the LoRA parameters, which significantly improves training efficiency.
 
+## 📏 Evaluation & Benchmark
+
+We provide an evaluation prompt set as a benchmark for multi-prompt generation. Following [LongLive](https://github.com/NVlabs/LongLive), we customize 100 groups of narrative scripts, each consisting of 6 successive 10-second prompts, for a total of 100 videos lasting 60 seconds each. Set `data_path` in [`configs/interactive_inference.yaml`](configs/interactive_inference.yaml) to `prompts/interactive_benchmark.jsonl` for evaluation.
+
 <!-- ## How to contribute
 - Make sure to have git installed.
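The README does not show the record schema of `prompts/interactive_benchmark.jsonl`; as an illustration only, assuming one JSON object per line with a `prompts` list (the field names and sample text here are hypothetical), a group of 6 ten-second prompts could be loaded and checked like this:

```python
import json

# Hypothetical record: the real interactive_benchmark.jsonl schema may differ.
sample_line = json.dumps({
    "group_id": 0,
    "prompts": [f"shot {i}: a 10-second scene description" for i in range(6)],
})

def load_group(line, prompts_per_group=6):
    """Parse one benchmark line and verify it holds a full prompt group."""
    record = json.loads(line)
    prompts = record["prompts"]
    if len(prompts) != prompts_per_group:
        raise ValueError("expected 6 successive 10-second prompts per group")
    return prompts

group = load_group(sample_line)  # 6 prompts -> one 60-second video
```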
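For intuition about the `bank_size` hint: the memory bank retains at most that many of the most recent latent frames. A toy FIFO sketch of that behavior (this is not the repository's implementation; the class and method names are made up):

```python
from collections import deque

class LatentMemoryBank:
    """Toy FIFO bank holding at most `bank_size` latent frames (newest last)."""
    def __init__(self, bank_size):
        self.frames = deque(maxlen=bank_size)

    def push(self, latent_frame):
        # When full, the oldest frame is evicted automatically by the deque.
        self.frames.append(latent_frame)

    def context(self):
        return list(self.frames)

bank = LatentMemoryBank(bank_size=3)
for t in range(5):
    bank.push(f"latent_{t}")
# After 5 pushes with bank_size=3, only the newest 3 frames remain.
```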
lora.pt CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:1dec690a21efbf4f1941f74337970696345449b9973035517a742f075e5f781a
 size 2800056690
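The `lora.pt` change above edits a Git LFS pointer file, not the weights themselves: the repository stores only the version line, the SHA-256 object id, and the byte size. A minimal sketch of parsing such a pointer, using the exact text from this diff:

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict of its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")  # each line is "key value"
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:1dec690a21efbf4f1941f74337970696345449b9973035517a742f075e5f781a
size 2800056690"""

info = parse_lfs_pointer(pointer)
```

`git lfs` resolves `info["oid"]` against the LFS object store to fetch the actual ~2.8 GB checkpoint on checkout.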