# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Hunyuan-GameCraft is a high-dynamic interactive game video generation system that creates gameplay videos with controllable camera movements and actions. The system uses diffusion models and action-controlled generation to synthesize realistic game footage from reference images and keyboard/mouse input controls.

## Key Commands

### Installation

```bash
# Create and activate conda environment
conda create -n HYGameCraft python==3.10
conda activate HYGameCraft

# Install PyTorch and dependencies
conda install pytorch==2.5.1 torchvision==0.20.0 torchaudio==2.5.1 pytorch-cuda=12.4 -c pytorch -c nvidia

# Install requirements
python -m pip install -r requirements.txt

# Install flash attention (optional, for acceleration)
python -m pip install ninja
python -m pip install git+https://github.com/Dao-AILab/[email protected]
```
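
A quick sanity check after installation (optional; this only confirms the PyTorch build can see a GPU and is not a project-specific step):

```bash
# Optional: verify the installed PyTorch version and CUDA availability
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```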

### Download Models

```bash
cd weights
huggingface-cli download tencent/Hunyuan-GameCraft-1.0 --local-dir ./
```
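
After downloading, the Development Notes below expect the `MODEL_BASE` environment variable to point at `weights/stdmodels`; one way to set it while still inside `weights/`:

```bash
# Point MODEL_BASE at the text encoder / CLIP / VAE weights (see Development Notes)
export MODEL_BASE="$(pwd)/stdmodels"
```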

### Run Inference

**Multi-GPU (8 GPUs) - Standard Model:**

```bash
torchrun --nnodes=1 --nproc_per_node=8 --master_port 29605 hymm_sp/sample_batch.py \
    --image-path "asset/village.png" \
    --prompt "YOUR_PROMPT" \
    --ckpt weights/gamecraft_models/mp_rank_00_model_states.pt \
    --video-size 704 1216 \
    --cfg-scale 2.0 \
    --image-start \
    --action-list w s d a \
    --action-speed-list 0.2 0.2 0.2 0.2 \
    --seed 250160 \
    --infer-steps 50 \
    --save-path './results/'
```

**Single GPU with Low VRAM (24GB minimum):**

```bash
export DISABLE_SP=1
export CPU_OFFLOAD=1
torchrun --nnodes=1 --nproc_per_node=1 --master_port 29605 hymm_sp/sample_batch.py \
    --ckpt weights/gamecraft_models/mp_rank_00_model_states.pt \
    --cpu-offload \
    --use-fp8 \
    [other parameters...]
```
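
For reference, a complete single-GPU invocation might look like the following; this is a sketch that combines the flags above with the generation parameters from the multi-GPU example, not a separately documented mode:

```bash
export DISABLE_SP=1
export CPU_OFFLOAD=1
torchrun --nnodes=1 --nproc_per_node=1 --master_port 29605 hymm_sp/sample_batch.py \
    --image-path "asset/village.png" \
    --prompt "YOUR_PROMPT" \
    --ckpt weights/gamecraft_models/mp_rank_00_model_states.pt \
    --video-size 704 1216 \
    --cfg-scale 2.0 \
    --image-start \
    --action-list w s d a \
    --action-speed-list 0.2 0.2 0.2 0.2 \
    --seed 250160 \
    --infer-steps 50 \
    --cpu-offload \
    --use-fp8 \
    --save-path './results/'
```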

**Distilled Model (faster, 8 inference steps):**

```bash
torchrun --nnodes=1 --nproc_per_node=8 --master_port 29605 hymm_sp/sample_batch.py \
    --ckpt weights/gamecraft_models/mp_rank_00_model_states_distill.pt \
    --cfg-scale 1.0 \
    --infer-steps 8 \
    --use-fp8 \
    [other parameters...]
```
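
The distilled checkpoint takes the same generation flags (image path, prompt, video size, actions, seed, save path) as the standard command above; only `--ckpt`, `--cfg-scale`, `--infer-steps`, and the optional `--use-fp8` change.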

## Architecture Overview

### Core Components

1. **Main Entry Points**
   - `hymm_sp/sample_batch.py`: Main script for batch video generation with distributed processing
   - `hymm_sp/sample_inference.py`: Core inference logic and model sampling
   - `hymm_sp/config.py`: Configuration parsing and argument handling

2. **Model Architecture (`hymm_sp/modules/`)**
   - `models.py`: Core diffusion model implementation
   - `cameranet.py`: Camera control and action encoding for game interactions
   - `token_refiner.py`: Text token refinement for prompt conditioning
   - `parallel_states.py`: Distributed training/inference state management
   - `fp8_optimization.py`: FP8 quantization for memory/speed optimization

3. **VAE Module (`hymm_sp/vae/`)**
   - `autoencoder_kl_causal_3d.py`: 3D causal VAE for video encoding/decoding
   - Handles latent space conversion for video frames

4. **Diffusion Pipeline (`hymm_sp/diffusion/`)**
   - `pipeline_hunyuan_video_game.py`: Custom pipeline for game video generation
   - `scheduling_flow_match_discrete.py`: Flow matching scheduler for denoising

5. **Data Processing (`hymm_sp/data_kits/`)**
   - `video_dataset.py`: Dataset handling for video inputs
   - `data_tools.py`: Video saving and processing utilities

### Key Features

- **Action Control**: Maps keyboard inputs (w/a/s/d) to a continuous camera space for smooth transitions
- **Hybrid History Conditioning**: Extends video sequences autoregressively while preserving scene context
- **Model Distillation**: Accelerated inference model (8 steps vs. 50 steps)
- **Memory Optimization**: FP8 quantization, CPU offloading, and SageAttention support
- **Distributed Processing**: Multi-GPU support with sequence parallelism

### Important Parameters

- `--action-list`: Sequence of keyboard actions (w/a/s/d)
- `--action-speed-list`: Movement speed for each action (0.0-3.0); see the pairing example after this list
- `--video-size`: Output resolution (height width)
- `--cfg-scale`: Classifier-free guidance scale (1.0 for distilled, 2.0 for standard)
- `--infer-steps`: Denoising steps (8 for distilled, 50 for standard)
- `--use-fp8`: Enable FP8 optimization for memory reduction
- `--cpu-offload`: Offload model weights to CPU for low-VRAM scenarios
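
Every example in this file supplies one speed per action (four actions, four speeds), so the two lists appear to be paired one-to-one. A hypothetical pre-flight check under that assumption:

```bash
# Hypothetical check: one entry in --action-speed-list per entry in --action-list
ACTIONS=(w s d a)
SPEEDS=(0.2 0.2 0.2 0.2)
[ "${#ACTIONS[@]}" -eq "${#SPEEDS[@]}" ] || echo "warning: ${#ACTIONS[@]} actions vs ${#SPEEDS[@]} speeds" >&2
```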

### Model Weights Structure

```
weights/
├── gamecraft_models/
│   ├── mp_rank_00_model_states.pt            # Standard model
│   └── mp_rank_00_model_states_distill.pt    # Distilled model
└── stdmodels/
    ├── vae_3d/                               # 3D VAE model
    ├── llava-llama-3-8b-v1_1-transformers/   # Text encoder
    └── openai_clip-vit-large-patch14/        # CLIP encoder
```
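
A quick way to confirm the downloads landed where the inference commands expect them (paths taken from the tree above):

```bash
# Verify both checkpoints and the standard-model directories exist
ls weights/gamecraft_models/mp_rank_00_model_states.pt \
   weights/gamecraft_models/mp_rank_00_model_states_distill.pt
ls weights/stdmodels/
```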

## Development Notes

- Environment variable `MODEL_BASE` should point to `weights/stdmodels`
- Use `export DISABLE_SP=1` and `export CPU_OFFLOAD=1` for single-GPU inference
- Minimum GPU memory: 24GB (very slow); recommended: 80GB per GPU
- Action length determines video duration: 1 action = 33 frames at 25 FPS (see the sketch after this list)
- SageAttention can be installed for additional acceleration
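
A small helper for the frame math above (a sketch, not a project utility): each action contributes 33 frames, so duration scales linearly with the length of the action list.

```bash
# Estimate clip length from the number of actions (1 action = 33 frames at 25 FPS)
ACTIONS=(w s d a)
FRAMES=$(( ${#ACTIONS[@]} * 33 ))
awk -v f="$FRAMES" 'BEGIN { printf "%d frames = %.2f seconds at 25 FPS\n", f, f / 25 }'
```

For the four-action example above: 4 × 33 = 132 frames, about 5.3 seconds of video.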