| # Models Overview | |
| WanGP supports multiple video generation models, each optimized for different use cases and hardware configurations. | |
| ## Wan 2.1 Text2Video Models | |
| Note that the term *Text2Video* refers to the underlying Wan architecture. Because that architecture has been greatly improved over time, many derived Text2Video models can now also generate videos from images. | |
| #### Wan 2.1 Text2Video 1.3B | |
| - **Size**: 1.3 billion parameters | |
| - **VRAM**: 6GB minimum | |
| - **Speed**: Fast generation | |
| - **Quality**: Good quality for the size | |
| - **Best for**: Quick iterations, lower-end hardware | |
| - **Command**: `python wgp.py --t2v-1-3B` | |
| #### Wan 2.1 Text2Video 14B | |
| - **Size**: 14 billion parameters | |
| - **VRAM**: 12GB+ recommended | |
| - **Speed**: Slower but higher quality | |
| - **Quality**: Excellent detail and coherence | |
| - **Best for**: Final production videos | |
| - **Command**: `python wgp.py --t2v-14B` | |
| #### Wan Vace 1.3B | |
| - **Type**: ControlNet for advanced video control | |
| - **VRAM**: 6GB minimum | |
| - **Features**: Motion transfer, object injection, inpainting | |
| - **Best for**: Advanced video manipulation | |
| - **Command**: `python wgp.py --vace-1-3B` | |
| #### Wan Vace 14B | |
| - **Type**: Large ControlNet model | |
| - **VRAM**: 12GB+ recommended | |
| - **Features**: All Vace features with higher quality | |
| - **Best for**: Professional video editing workflows | |
| #### MoviiGen (Experimental) | |
| - **Resolution**: Claims 1080p capability | |
| - **VRAM**: 20GB+ required | |
| - **Speed**: Very slow generation | |
| - **Features**: Intended to generate cinema-like video; specialized for 2.1:1 aspect ratios | |
| - **Status**: Experimental, feedback welcome | |
| <BR> | |
| ## Wan 2.1 Image-to-Video Models | |
| #### Wan 2.1 Image2Video 14B | |
| - **Size**: 14 billion parameters | |
| - **VRAM**: 12GB+ recommended | |
| - **Speed**: Slower but higher quality | |
| - **Quality**: Excellent detail and coherence | |
| - **Best for**: Lora-based workflows; most available Loras work with this model | |
| - **Command**: `python wgp.py --i2v-14B` | |
| #### FLF2V | |
| - **Type**: Start/end frame specialist | |
| - **Resolution**: Optimized for 720p | |
| - **Official**: Wan team supported | |
| - **Use case**: Image-to-video with specific endpoints | |
| <BR> | |
| ## Wan 2.1 Specialized Models | |
| #### FantasySpeaking | |
| - **Type**: Talking head animation | |
| - **Input**: Voice track + image | |
| - **Works on**: People and objects | |
| - **Use case**: Lip-sync and voice-driven animation | |
| #### Phantom | |
| - **Type**: Person/object transfer | |
| - **Resolution**: Works well at 720p | |
| - **Requirements**: 30+ steps for good results | |
| - **Best for**: Transferring subjects between videos | |
| #### Recam Master | |
| - **Type**: Viewpoint change | |
| - **Requirements**: 81+ frame input videos, 15+ denoising steps | |
| - **Use case**: View same scene from different angles | |
| #### Sky Reels v2 | |
| - **Type**: Diffusion Forcing model | |
| - **Specialty**: "Infinite length" videos | |
| - **Features**: High quality continuous generation | |
| <BR> | |
| ## Wan Fun InP Models | |
| #### Wan Fun InP 1.3B | |
| - **Size**: 1.3 billion parameters | |
| - **VRAM**: 6GB minimum | |
| - **Quality**: Good for the size, accessible on lower-end hardware | |
| - **Best for**: Entry-level image animation | |
| - **Command**: `python wgp.py --i2v-1-3B` | |
| #### Wan Fun InP 14B | |
| - **Size**: 14 billion parameters | |
| - **VRAM**: 12GB+ recommended | |
| - **Quality**: Better end image support | |
| - **Limitation**: Existing Loras don't work as well with this model | |
| <BR> | |
| ## Wan Special Loras | |
| ### Self-Forcing lightx2v Lora | |
| - **Type**: Distilled model (Lora implementation) | |
| - **Speed**: 4-8 step generation, 2x faster (no classifier-free guidance) | |
| - **Compatible**: Works with t2v and i2v Wan 14B models | |
| - **Setup**: Requires the Self-Forcing lightx2v Lora (see [LORAS.md](LORAS.md)) | |
| ### CausVid Lora | |
| - **Type**: Distilled model (Lora implementation) | |
| - **Speed**: 4-12 step generation, 2x faster (no classifier-free guidance) | |
| - **Compatible**: Works with Wan 14B models | |
| - **Setup**: Requires the CausVid Lora (see [LORAS.md](LORAS.md)) | |
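The "2x faster" figure quoted for both Loras is consistent with how classifier-free guidance is commonly implemented: each denoising step runs the model twice (once with the prompt, once without), so dropping guidance halves the per-step cost. The sketch below works through that arithmetic. It assumes the two-pass scheme and a hypothetical 30-step baseline; neither is taken from WanGP's code.

```python
def forward_passes(steps: int, use_cfg: bool) -> int:
    """Count model forward passes for one generation.

    Assumes the common two-pass classifier-free guidance scheme:
    one conditional and one unconditional pass per denoising step.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    passes_per_step = 2 if use_cfg else 1
    return steps * passes_per_step


# Hypothetical baseline: 30 steps with guidance enabled.
baseline = forward_passes(30, use_cfg=True)
# Distilled Lora at the upper end of the CausVid range: 12 steps, no guidance.
distilled = forward_passes(12, use_cfg=False)

print(baseline, distilled, baseline / distilled)
```

Under these assumptions the lower step count and the removal of guidance compound, so the overall saving can be larger than the per-step 2x alone.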
| <BR> | |
| ## Hunyuan Video Models | |
| #### Hunyuan Video Text2Video | |
| - **Quality**: Among the best open-source t2v models | |
| - **VRAM**: 12GB+ recommended | |
| - **Speed**: Slower generation but excellent results | |
| - **Features**: Superior text adherence and video quality, up to 10s of video | |
| - **Best for**: High-quality text-to-video generation | |
| #### Hunyuan Video Custom | |
| - **Specialty**: Identity preservation | |
| - **Use case**: Injecting specific people into videos | |
| - **Quality**: Excellent for character consistency | |
| - **Best for**: Character-focused video generation | |
| #### Hunyuan Video Avatar | |
| - **Specialty**: Generates up to 15s of high-quality speech- or song-driven video | |
| - **Use case**: Injecting specific people into videos | |
| - **Quality**: Excellent for character consistency | |
| - **Best for**: Character-focused video generation, video synchronized with voice | |
| <BR> | |
| ## LTX Video Models | |
| #### LTX Video 13B | |
| - **Specialty**: Long video generation | |
| - **Resolution**: Fast 720p generation | |
| - **VRAM**: Optimized by WanGP (4x reduction in requirements) | |
| - **Best for**: Longer duration videos | |
| #### LTX Video 13B Distilled | |
| - **Speed**: Generate in less than one minute | |
| - **Quality**: Very high quality despite speed | |
| - **Best for**: Rapid prototyping and quick results | |
| <BR> | |
| ## Model Selection Guide | |
| ### By Hardware (VRAM) | |
| #### 6-8GB VRAM | |
| - Wan 2.1 T2V 1.3B | |
| - Wan Fun InP 1.3B | |
| - Wan Vace 1.3B | |
| #### 10-12GB VRAM | |
| - Wan 2.1 T2V 14B | |
| - Wan Fun InP 14B | |
| - Hunyuan Video (with optimizations) | |
| - LTX Video 13B | |
| #### 16GB+ VRAM | |
| - All models supported except MoviiGen (requires 20GB+) | |
| - Longer videos possible | |
| - Higher resolutions | |
| - Multiple simultaneous Loras | |
| #### 20GB+ VRAM | |
| - MoviiGen (experimental 1080p) | |
| - Very long videos | |
| - Maximum quality settings | |
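The hardware tiers above can be read as a simple lookup from available VRAM to candidate models. The following is a minimal sketch of that lookup; `VRAM_TIERS` and `models_for_vram` are hypothetical names, not part of WanGP. The thresholds and model names are copied from the lists above. The 16GB tier is omitted because it names no additional models.

```python
from __future__ import annotations

# Lower bound of each tier (in GB) and the models listed for it in the guide.
VRAM_TIERS = [
    (6, ["Wan 2.1 T2V 1.3B", "Wan Fun InP 1.3B", "Wan Vace 1.3B"]),
    (10, [
        "Wan 2.1 T2V 14B",
        "Wan Fun InP 14B",
        "Hunyuan Video (with optimizations)",
        "LTX Video 13B",
    ]),
    (20, ["MoviiGen (experimental 1080p)"]),
]


def models_for_vram(vram_gb: float) -> list[str]:
    """Return every model whose tier threshold is at or below vram_gb."""
    models: list[str] = []
    for min_gb, names in VRAM_TIERS:
        if vram_gb >= min_gb:
            models.extend(names)
    return models


for name in models_for_vram(8):
    print(name)
```

With PyTorch installed, the total VRAM of the first GPU can be read with `torch.cuda.get_device_properties(0).total_memory / 1024**3` and passed to the function.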
| ### By Use Case | |
| #### Quick Prototyping | |
| 1. **LTX Video 13B Distilled** - Fastest, high quality | |
| 2. **Wan 2.1 T2V 1.3B** - Fast, good quality | |
| 3. **CausVid Lora** - 4-12 steps, very fast | |
| #### Best Quality | |
| 1. **Hunyuan Video** - Overall best t2v quality | |
| 2. **Wan 2.1 T2V 14B** - Excellent Wan quality | |
| 3. **Wan Vace 14B** - Best for controlled generation | |
| #### Advanced Control | |
| 1. **Wan Vace 14B/1.3B** - Motion transfer, object injection | |
| 2. **Phantom** - Person/object transfer | |
| 3. **FantasySpeaking** - Voice-driven animation | |
| #### Long Videos | |
| 1. **LTX Video 13B** - Specialized for length | |
| 2. **Sky Reels v2** - Infinite length videos | |
| 3. **Wan Vace + Sliding Windows** - Up to 1 minute | |
| #### Lower-End Hardware | |
| 1. **Wan Fun InP 1.3B** - Image-to-video | |
| 2. **Wan 2.1 T2V 1.3B** - Text-to-video | |
| 3. **Wan Vace 1.3B** - Advanced control | |
| <BR> | |
| ## Performance Comparison | |
| ### Speed (Relative) | |
| 1. **CausVid Lora** (4-12 steps) - Fastest | |
| 2. **LTX Video Distilled** - Very fast | |
| 3. **Wan 1.3B models** - Fast | |
| 4. **Wan 14B models** - Medium | |
| 5. **Hunyuan Video** - Slower | |
| 6. **MoviiGen** - Slowest | |
| ### Quality (Subjective) | |
| 1. **Hunyuan Video** - Highest overall | |
| 2. **Wan 14B models** - Excellent | |
| 3. **LTX Video models** - Very good | |
| 4. **Wan 1.3B models** - Good | |
| 5. **CausVid** - Good (varies with steps) | |
| ### VRAM Efficiency | |
| 1. **Wan 1.3B models** - Most efficient | |
| 2. **LTX Video** (with WanGP optimizations) | |
| 3. **Wan 14B models** | |
| 4. **Hunyuan Video** | |
| 5. **MoviiGen** - Least efficient | |
| <BR> | |
| ## Model Switching | |
| WanGP allows switching between models without restarting: | |
| 1. Use the dropdown menu in the web interface | |
| 2. Models are loaded on-demand | |
| 3. Previous model is unloaded to save VRAM | |
| 4. Settings are preserved when possible | |
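The load-on-demand and unload-previous behavior described above can be illustrated with a small sketch. This is not WanGP's implementation: `ModelSlot` and the loader functions are hypothetical, and strings stand in for real model weights.

```python
import gc
from typing import Callable, Dict, Optional


class ModelSlot:
    """Holds at most one loaded model at a time (hypothetical helper)."""

    def __init__(self, loaders: Dict[str, Callable[[], object]]):
        self._loaders = loaders  # model name -> function that loads it
        self.current_name: Optional[str] = None
        self.current_model: Optional[object] = None

    def switch(self, name: str) -> object:
        if name not in self._loaders:
            raise KeyError(f"unknown model: {name}")
        if name == self.current_name:
            return self.current_model  # already loaded, nothing to do
        # Drop the previous model first so its memory can be reclaimed
        # before the new one is loaded.
        self.current_model = None
        self.current_name = None
        gc.collect()
        self.current_model = self._loaders[name]()  # load on demand
        self.current_name = name
        return self.current_model


# Stand-in loaders; a real loader would read weights from disk.
slot = ModelSlot({
    "t2v-1.3B": lambda: "stand-in for the 1.3B weights",
    "t2v-14B": lambda: "stand-in for the 14B weights",
})
slot.switch("t2v-1.3B")
slot.switch("t2v-14B")
print(slot.current_name)
```

In a PyTorch setting, releasing cached GPU memory after dropping the old model would typically also involve `torch.cuda.empty_cache()`.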
| <BR> | |
| ## Tips for Model Selection | |
| ### First Time Users | |
| Start with **Wan 2.1 T2V 1.3B** to learn the interface and test your hardware. | |
| ### Production Work | |
| Use **Hunyuan Video** or **Wan 14B** models for final output quality. | |
| ### Experimentation | |
| Use **CausVid Lora** or **LTX Video 13B Distilled** for rapid iteration and testing. | |
| ### Specialized Tasks | |
| - **VACE** for advanced control | |
| - **FantasySpeaking** for talking heads | |
| - **LTX Video** for long sequences | |
| ### Hardware Optimization | |
| Always start with the largest model your VRAM can handle, then optimize settings for speed vs quality based on your needs. |