xilanhua12138 committed on
Commit
6af2ed4
·
verified ·
1 Parent(s): 3c3bd85
Files changed (2)
  1. README.md +81 -36
  2. assets/logo.png +0 -0
README.md CHANGED
@@ -8,12 +8,16 @@ tags:
8
  - video generation
9
  library_name: diffusers
10
  ---
 
11
  # MoviiGen 1.1
12
- -----
13
 
14
- [**MoviiGen 1.1: Towards Cinematic-Quality Video Generative Models**]("") <be>
 
 
15
 
16
- In this repository, we present **MoviiGen 1.1**, a cutting-edge video generation model that excels in cinematic aesthetics and visual quality. This model is fine-tuned from **Wan2.1**. Based on comprehensive evaluations by 11 professional filmmakers and AIGC creators, including industry experts, across 60 aesthetic dimensions, **MoviiGen 1.1** demonstrates superior performance in key cinematic aspects:
 
 
17
 
18
  - 👍 **Superior Cinematic Aesthetics**: **MoviiGen 1.1** outperforms competitors in three critical dimensions: atmosphere creation, camera movement, and object detail preservation, making it the preferred choice for professional cinematic applications.
19
  - 👍 **Visual Coherence & Quality**: MoviiGen 1.1 excels in clarity (+14.6%) and realism (+4.3%), making it ideal for high-fidelity scenarios such as real-scene conversion and portrait detail. Wan2.1 stands out in smoothness and overall visual harmony, better suited for tasks emphasizing composition, coherence, and artistic style. Both models have close overall scores, so users can select MoviiGen 1.1 for clarity and realism, or Wan2.1 for style and structural consistency.
@@ -23,30 +27,26 @@ In this repository, we present **MoviiGen 1.1**, a cutting-edge video generation
23
 
24
  This repository features our latest model, which establishes new benchmarks in cinematic video generation. Through extensive evaluation by industry professionals, it has demonstrated exceptional capabilities in creating high-quality visuals with natural motion dynamics and consistent aesthetic quality, making it an ideal choice for professional video production and creative applications.
25
 
26
-
27
  ## Video Demos
28
 
29
- | <video width="320" controls><source src="assets/79_1920*1056_seed3732225395.mp4" type="video/mp4">Your browser does not support the video tag.</video> | <video width="320" controls><source src="assets/150_1920*1056_seed1674457713.mp4" type="video/mp4">Your browser does not support the video tag.</video> | <video width="320" controls><source src="assets/143_1920*1056_seed3114534932.mp4" type="video/mp4">Your browser does not support the video tag.</video> | <video width="320" controls><source src="assets/94_1920*1056_seed3693446494.mp4" type="video/mp4">Your browser does not support the video tag.</video> |
30
  |--------|--------|--------|--------|
 
 
 
31
 
32
- | <video width="320" controls><source src="assets/23_1920*1056_seed3934691816.mp4" type="video/mp4">Your browser does not support the video tag.</video> | <video width="320" controls><source src="assets/13_1920*1056..mp4" type="video/mp4">Your browser does not support the video tag.</video> | <video width="320" controls><source src="assets/26_1920*1056..mp4" type="video/mp4">Your browser does not support the video tag.</video> | <video width="320" controls><source src="assets/39_1920*1056..mp4" type="video/mp4">Your browser does not support the video tag.</video> |
33
- |--------|--------|--------|--------|
34
-
35
- | <video width="320" controls><source src="assets/100_1920*1056_seed2949593166.mp4" type="video/mp4">Your browser does not support the video tag.</video> | <video width="320" controls><source src="assets/54_1920*1056..mp4" type="video/mp4">Your browser does not support the video tag.</video> | <video width="320" controls><source src="assets/107_1920*1056_seed525896597.mp4" type="video/mp4">Your browser does not support the video tag.</video> | <video width="320" controls><source src="assets/94_1920*1056_seed3693446494.mp4" type="video/mp4">Your browser does not support the video tag.</video> |
36
- |--------|--------|--------|--------|
37
 
38
  ## 🔥 Latest News!!
39
 
40
- * Feb 22, 2025: 👋 We've released the inference code and weights of Wan2.1.
41
 
42
-
43
- ## Quickstart
44
 
45
  #### Installation
46
  Clone the repo:
47
  ```
48
- git clone https://github.com/Wan-Video/Wan2.1.git
49
- cd Wan2.1
50
  ```
51
 
52
  Install dependencies:
@@ -55,46 +55,91 @@ Install dependencies:
55
  pip install -r requirements.txt
56
  ```
57
 
58
-
59
  #### Model Download
60
 
61
- | T2V-14B | 🤗 [Huggingface](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B) | Supports both 720P and 1080P
62
-
63
- > 💡Note: The 14B model is capable of generating videos at 720P resolution. However, due to limited training at this resolution, the results are generally less stable compared to 480P. For optimal performance, we recommend using 480P resolution.
64
-
65
 
66
  Download models using huggingface-cli:
67
  ```
68
  pip install "huggingface_hub[cli]"
69
- huggingface-cli download Wan-AI/Wan2.1-T2V-14B --local-dir ./Wan2.1-T2V-14B
70
  ```
71
 
72
- #### Run Text-to-Video Generation
73
 
74
- This repository supports two Text-to-Video models (14B) and two resolutions (720P and 1080P). The parameters and configurations for these models are as follows:
75
76
  ```
77
- python generate.py --task t2v-14B --size 1920*1080 --ckpt_dir ./MoviiGen1.1 --prompt "Indoors, in a hospital hallway during the daytime, a middle-aged Caucasian female doctor with her brown hair in a bun, wearing a white doctor's coat over a grey shirt, a gold necklace, and stud earrings, with a blue lanyard and glasses around her neck, is conversing with a man whose back is to the camera (only his back and part of his profile are visible; he wears a dark jacket over a light-colored hooded sweatshirt). She alternates between looking down and looking up at him, her expression serious and concerned. The hallway background features beige walls and dark door frames, looking relatively empty, with natural light entering from a window at the end. The camera employs a static medium close-up shot, filmed from a slightly low angle, with the focus consistently on the doctor. The overall color palette is cool-toned, with a slightly greyish tint. Soft, primarily natural side lighting from the background window creates a calm and serious atmosphere. Visual characteristics include soft lighting, digital noise, high definition, a slightly blue tint, indoor lighting, modern aesthetic, and low saturation."
78
  ```
79
 
80
- ##### Running local gradio
81
82
  ```
83
- cd gradio
84
- python t2v_14B_singleGPU.py --ckpt_dir ./MoviiGen1.1
85
  ```
86
 
 
87
  ## Manual Evaluation
88
 
89
  <div style="display: flex; justify-content: space-between;">
90
  <div style="flex: 1; margin-right: 10px;"><img src="assets/movie_asethetic.png" alt="Movie Aesthetic Evaluation" style="width: 100%;" /></div>
91
  <div style="flex: 1;"><img src="assets/visual_quality.png" alt="Movie Aesthetic Evaluation" style="width: 100%;" /></div>
92
- </div>
93
-
94
-
95
-
96
- ## Community Contributions
97
- - [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio) provides more support for Wan, including video-to-video, FP8 quantization, VRAM optimization, LoRA training, and more. Please refer to [their examples](https://github.com/modelscope/DiffSynth-Studio/tree/main/examples/wanvideo).
98
-
99
-
100
-
 
8
  - video generation
9
  library_name: diffusers
10
  ---
11
+
12
  # MoviiGen 1.1
 
13
 
14
+ [![HuggingFace](https://img.shields.io/badge/🤗%20Hugging%20Face-Model-blue)](https://huggingface.co/ZuluVision/MoviiGen1.1)
15
+ [![GitHub stars](https://img.shields.io/github/stars/ZulutionAI/MoviiGen1.1?style=social)](https://github.com/ZulutionAI/MoviiGen1.1/stargazers)
16
+
17
 
18
+ [**MoviiGen 1.1: Towards Cinematic-Quality Video Generative Models**](https://huggingface.co/ZuluVision/MoviiGen1.1) <br>
19
+
20
+ In this repository, we present **MoviiGen 1.1**, a cutting-edge video generation model that excels in cinematic aesthetics and visual quality. This model is fine-tuned from Wan2.1. Based on comprehensive evaluations by 11 professional filmmakers and AIGC creators, including industry experts, across 60 aesthetic dimensions, **MoviiGen 1.1** demonstrates superior performance in key cinematic aspects:
21
 
22
  - 👍 **Superior Cinematic Aesthetics**: **MoviiGen 1.1** outperforms competitors in three critical dimensions: atmosphere creation, camera movement, and object detail preservation, making it the preferred choice for professional cinematic applications.
23
  - 👍 **Visual Coherence & Quality**: MoviiGen 1.1 excels in clarity (+14.6%) and realism (+4.3%), making it ideal for high-fidelity scenarios such as real-scene conversion and portrait detail. Wan2.1 stands out in smoothness and overall visual harmony, better suited for tasks emphasizing composition, coherence, and artistic style. Both models have close overall scores, so users can select MoviiGen 1.1 for clarity and realism, or Wan2.1 for style and structural consistency.
 
27
 
28
  This repository features our latest model, which establishes new benchmarks in cinematic video generation. Through extensive evaluation by industry professionals, it has demonstrated exceptional capabilities in creating high-quality visuals with natural motion dynamics and consistent aesthetic quality, making it an ideal choice for professional video production and creative applications.
29
 
 
30
  ## Video Demos
31
 
32
+ | <video width="320" controls><source src="https://huggingface.co/ZuluVision/MoviiGen1.1/resolve/main/assets/79_1920*1056_seed3732225395.mp4" type="video/mp4">Your browser does not support the video tag.</video> | <video width="320" controls><source src="https://huggingface.co/ZuluVision/MoviiGen1.1/resolve/main/assets/150_1920*1056_seed1674457713.mp4" type="video/mp4">Your browser does not support the video tag.</video> | <video width="320" controls><source src="https://huggingface.co/ZuluVision/MoviiGen1.1/resolve/main/assets/143_1920*1056_seed3114534932.mp4" type="video/mp4">Your browser does not support the video tag.</video> | <video width="320" controls><source src="https://huggingface.co/ZuluVision/MoviiGen1.1/resolve/main/assets/94_1920*1056_seed3693446494.mp4" type="video/mp4">Your browser does not support the video tag.</video> |
33
  |--------|--------|--------|--------|
34
+ | <video width="320" controls><source src="https://huggingface.co/ZuluVision/MoviiGen1.1/resolve/main/assets/23_1920*1056_seed3934691816.mp4" type="video/mp4">Your browser does not support the video tag.</video> | <video width="320" controls><source src="https://huggingface.co/ZuluVision/MoviiGen1.1/resolve/main/assets/13_1920*1056..mp4" type="video/mp4">Your browser does not support the video tag.</video> | <video width="320" controls><source src="https://huggingface.co/ZuluVision/MoviiGen1.1/resolve/main/assets/26_1920*1056..mp4" type="video/mp4">Your browser does not support the video tag.</video> | <video width="320" controls><source src="https://huggingface.co/ZuluVision/MoviiGen1.1/resolve/main/assets/39_1920*1056..mp4" type="video/mp4">Your browser does not support the video tag.</video> |
35
+ ||
36
+ | <video width="320" controls><source src="https://huggingface.co/ZuluVision/MoviiGen1.1/resolve/main/assets/100_1920*1056_seed2949593166.mp4" type="video/mp4">Your browser does not support the video tag.</video> | <video width="320" controls><source src="https://huggingface.co/ZuluVision/MoviiGen1.1/resolve/main/assets/54_1920*1056..mp4" type="video/mp4">Your browser does not support the video tag.</video> | <video width="320" controls><source src="https://huggingface.co/ZuluVision/MoviiGen1.1/resolve/main/assets/107_1920*1056_seed525896597.mp4" type="video/mp4">Your browser does not support the video tag.</video> | <video width="320" controls><source src="https://huggingface.co/ZuluVision/MoviiGen1.1/resolve/main/assets/94_1920*1056_seed3693446494.mp4" type="video/mp4">Your browser does not support the video tag.</video> |
37
38
 
39
  ## 🔥 Latest News!!
40
 
41
+ * May 12, 2025: 👋 We've released the inference code, **training code**, and weights of MoviiGen1.1.
42
 
43
+ ## 💡 Quickstart
 
44
 
45
  #### Installation
46
  Clone the repo:
47
  ```
48
+ git clone https://github.com/ZulutionAI/MoviiGen1.1.git
49
+ cd MoviiGen1.1
50
  ```
51
 
52
  Install dependencies:
 
55
  pip install -r requirements.txt
56
  ```
57
 
 
58
  #### Model Download
59
 
60
+ T2V-14B Model: 🤗 [Huggingface](https://huggingface.co/ZuluVision/MoviiGen1.1)
61
+ MoviiGen1.1 model supports both 720P and 1080P.
 
 
62
 
63
  Download models using huggingface-cli:
64
  ```
65
  pip install "huggingface_hub[cli]"
66
+ huggingface-cli download ZuluVision/MoviiGen1.1 --local-dir ./MoviiGen1.1
67
  ```
68
 
69
+ ## 🛠️ Training
70
+
71
+ ### Training Framework
72
+
73
+ Our training framework is built on [FastVideo](https://github.com/hao-ai-lab/FastVideo), with a custom implementation of sequence parallelism to optimize memory usage and training efficiency. Sequence parallelism distributes the computational load across multiple GPUs, enabling efficient training of large-scale video generation models.
74
+
75
+ #### Key Features:
76
 
77
+ - **Sequence Parallel & Ring Attention**: Our custom implementation divides the temporal dimension across multiple GPUs, reducing per-device memory requirements while maintaining model quality.
78
+ - **Efficient Data Loading**: Optimized data pipeline for handling high-resolution video frames (Latent Cache and Text Embedding Cache).
79
+ - **Multi Resolution Training Bucket**: Support for training at multiple resolutions.
80
+ - **Mixed Precision Training**: Support for BF16/FP16 training to accelerate computation.
81
+ - **Distributed Training**: Seamless multi-node, multi-GPU training support.
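
The sequence-parallel idea in the list above can be illustrated with a small, framework-free sketch. It only shows how a temporal sequence of latent frames might be divided into contiguous shards, one per GPU rank. The function names `shard_bounds` and `shard_sequence` are hypothetical and are not taken from the MoviiGen or FastVideo code; ring attention and the actual communication between ranks are not modeled here.

```python
from typing import List, Tuple


def shard_bounds(num_frames: int, world_size: int, rank: int) -> Tuple[int, int]:
    """Return the [start, end) frame range owned by `rank` (hypothetical helper).

    Frames are split as evenly as possible; the first
    `num_frames % world_size` ranks each receive one extra frame.
    """
    if world_size <= 0:
        raise ValueError("world_size must be positive")
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    base, extra = divmod(num_frames, world_size)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end


def shard_sequence(frames: List[int], world_size: int) -> List[List[int]]:
    """Split `frames` into one contiguous shard per rank."""
    shards = []
    for rank in range(world_size):
        start, end = shard_bounds(len(frames), world_size, rank)
        shards.append(frames[start:end])
    return shards
```

Each rank then holds roughly `1 / world_size` of the temporal dimension, which is where the per-device memory saving comes from. An attention layer still needs information from the other shards, which is the part that ring attention handles in a real implementation.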
82
 
83
+ ### Data Preprocessing
84
+
85
+ We cache the videos and corresponding text prompts as latents and text embeddings to optimize the training process. This preprocessing step significantly improves training efficiency by reducing computational overhead during the training phase.
86
+
87
+ ```bash
88
+ cd scripts/data_preprocess
89
+ bash scripts/data_preprocess/preprocess.sh
90
  ```
91
+ Example Data Format:
92
+
93
+ training_data.json
94
+ ```json
95
+ [
96
+ {
97
+ "cap": "your prompt",
98
+ "path": "path/to/your/video.mp4",
99
+ "resolution": {
100
+ "width": 3840,
101
+ "height": 2160
102
+ },
103
+ "fps": 23.976023976023978,
104
+ "duration": 1.4180833333333331
105
+ },
106
+ ...
107
+ ]
108
+ ```
109
+ merge.txt:
110
+ ```txt
111
+ relative_path_to_json_dir, training_data.json
112
  ```
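
As a rough illustration of the input format above, the following sketch builds entries with the same keys as the `training_data.json` example and writes a matching `merge.txt` line. The helper names (`make_entry`, `validate_entry`, `write_dataset`) are hypothetical, and treating every key in the example as required is an assumption; the preprocessing script itself may accept more or fewer fields.

```python
import json
from pathlib import Path
from typing import Dict, List

# Keys taken from the example entry; treating all of them as required is an assumption.
REQUIRED_KEYS = {"cap", "path", "resolution", "fps", "duration"}


def make_entry(cap: str, path: str, width: int, height: int,
               fps: float, duration: float) -> Dict:
    """Build one record shaped like the training_data.json example."""
    return {
        "cap": cap,
        "path": path,
        "resolution": {"width": width, "height": height},
        "fps": fps,
        "duration": duration,
    }


def validate_entry(entry: Dict) -> None:
    """Raise ValueError if the record does not match the example's shape."""
    missing = REQUIRED_KEYS - set(entry)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if not {"width", "height"} <= set(entry["resolution"]):
        raise ValueError("resolution needs both width and height")
    if entry["fps"] <= 0 or entry["duration"] <= 0:
        raise ValueError("fps and duration must be positive")


def write_dataset(entries: List[Dict], json_dir: Path, merge_file: Path) -> None:
    """Write training_data.json and a one-line merge.txt pointing at it."""
    for entry in entries:
        validate_entry(entry)
    json_dir.mkdir(parents=True, exist_ok=True)
    (json_dir / "training_data.json").write_text(json.dumps(entries, indent=2))
    merge_file.write_text(f"{json_dir.as_posix()}, training_data.json\n")
```

Note that the `...` in the JSON example stands for further entries and is not valid JSON itself.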
113
 
114
+ Output Json:
115
+
116
+ video_caption.json
117
+ ```json
118
+ [
119
+ {
120
+ "latent_path": "path/to/your/latent.pt",
121
+ "prompt_embed_path": "path/to/your/prompt_embed.pt",
122
+ "length": 12
123
+ },
124
+ ...
125
+ ]
126
+ ```
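
A minimal sketch of how the cached index above could be read back is shown below. It assumes only the three keys visible in the `video_caption.json` example. The function names are hypothetical, and the meaning of `length` is not stated in the README, so it is treated here as an opaque integer used for grouping.

```python
import json
from collections import defaultdict
from pathlib import Path
from typing import Dict, List

# Keys visible in the video_caption.json example.
EXPECTED_KEYS = ("latent_path", "prompt_embed_path", "length")


def load_cache_index(path: str) -> List[Dict]:
    """Load the preprocessing output and check each record has the expected keys."""
    records = json.loads(Path(path).read_text())
    for i, record in enumerate(records):
        for key in EXPECTED_KEYS:
            if key not in record:
                raise KeyError(f"record {i} is missing '{key}'")
    return records


def group_by_length(records: List[Dict]) -> Dict[int, List[Dict]]:
    """Group records that share the same `length`, e.g. to batch them together."""
    groups: Dict[int, List[Dict]] = defaultdict(list)
    for record in records:
        groups[record["length"]].append(record)
    return dict(groups)
```

Grouping by `length` is one simple way to form batches of equally sized samples; whether the training code batches this way is not confirmed by the README.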
127
 
128
+ ### Train
129
+ ```bash
130
+ bash scripts/train/finetune.sh
131
  ```
132
+
133
+ **For multi-node training, you must set the number of nodes and the number of processes per node manually.** We provide a sample script for multi-node training.
134
+
135
+ ```bash
136
+ bash scripts/bash/finetune_multi_node.sh
137
  ```
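
The two values that must be set by hand determine the total number of training processes. The sketch below shows the usual arithmetic, following the rank convention used by common launchers such as `torchrun`. The function and parameter names are hypothetical and may not match the variables in `finetune_multi_node.sh`.

```python
def world_size(num_nodes: int, procs_per_node: int) -> int:
    """Total number of training processes across all nodes."""
    if num_nodes <= 0 or procs_per_node <= 0:
        raise ValueError("num_nodes and procs_per_node must be positive")
    return num_nodes * procs_per_node


def global_rank(node_rank: int, local_rank: int, procs_per_node: int) -> int:
    """Global process index, given the node index and the index within that node."""
    if not 0 <= local_rank < procs_per_node:
        raise ValueError("local_rank must be in [0, procs_per_node)")
    if node_rank < 0:
        raise ValueError("node_rank must be non-negative")
    return node_rank * procs_per_node + local_rank
```

For example, two nodes with eight processes each give a world size of 16, and the fourth process (local rank 3) on the second node (node rank 1) has global rank 11.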
138
 
139
+
140
  ## Manual Evaluation
141
 
142
  <div style="display: flex; justify-content: space-between;">
143
  <div style="flex: 1; margin-right: 10px;"><img src="assets/movie_asethetic.png" alt="Movie Aesthetic Evaluation" style="width: 100%;" /></div>
144
  <div style="flex: 1;"><img src="assets/visual_quality.png" alt="Movie Aesthetic Evaluation" style="width: 100%;" /></div>
145
+ </div>

assets/logo.png CHANGED

Git LFS Details

  • SHA256: 96cddc0f667293436d0b9f92a299b6346b65b231d38ee49719a33d46c91fe1e3
  • Pointer size: 130 Bytes
  • Size of remote file: 56.3 kB