
CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Hunyuan-GameCraft is a high-dynamic interactive game video generation system that creates gameplay videos with controllable camera movements and actions. The system uses diffusion models and action-controlled generation to synthesize realistic game footage from reference images and keyboard/mouse input controls.

Key Commands

Installation

# Create and activate conda environment
conda create -n HYGameCraft python==3.10
conda activate HYGameCraft

# Install PyTorch and dependencies
conda install pytorch==2.5.1 torchvision==0.20.0 torchaudio==2.5.1 pytorch-cuda=12.4 -c pytorch -c nvidia

# Install requirements
python -m pip install -r requirements.txt

# Install flash attention (optional, for acceleration)
python -m pip install ninja
python -m pip install git+https://github.com/Dao-AILab/[email protected]
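
To confirm the environment is usable before downloading weights, a quick optional check (the flash_attn import will only succeed if you installed the optional acceleration package):

# Verify PyTorch, CUDA availability, and (optionally) flash attention
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import flash_attn" && echo "flash-attn OK"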

Download Models

cd weights
huggingface-cli download tencent/Hunyuan-GameCraft-1.0 --local-dir ./
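
The scripts expect the MODEL_BASE environment variable to point at the standard models (see Development Notes below); one way to set it immediately after downloading, while still inside weights/:

# Point MODEL_BASE at the standard models directory
export MODEL_BASE=$(pwd)/stdmodels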

Run Inference

Multi-GPU (8 GPUs) - Standard Model:

torchrun --nnodes=1 --nproc_per_node=8 --master_port 29605 hymm_sp/sample_batch.py \
    --image-path "asset/village.png" \
    --prompt "YOUR_PROMPT" \
    --ckpt weights/gamecraft_models/mp_rank_00_model_states.pt \
    --video-size 704 1216 \
    --cfg-scale 2.0 \
    --image-start \
    --action-list w s d a \
    --action-speed-list 0.2 0.2 0.2 0.2 \
    --seed 250160 \
    --infer-steps 50 \
    --save-path './results/'

Single GPU with Low VRAM (24GB minimum):

export DISABLE_SP=1
export CPU_OFFLOAD=1
torchrun --nnodes=1 --nproc_per_node=1 --master_port 29605 hymm_sp/sample_batch.py \
    --ckpt weights/gamecraft_models/mp_rank_00_model_states.pt \
    --cpu-offload \
    --use-fp8 \
    [other parameters...]
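
For reference, one possible fully expanded low-VRAM invocation; this is an untested sketch that simply substitutes the placeholder above with the same generation flags used in the multi-GPU example:

export DISABLE_SP=1
export CPU_OFFLOAD=1
torchrun --nnodes=1 --nproc_per_node=1 --master_port 29605 hymm_sp/sample_batch.py \
    --image-path "asset/village.png" \
    --prompt "YOUR_PROMPT" \
    --ckpt weights/gamecraft_models/mp_rank_00_model_states.pt \
    --video-size 704 1216 \
    --cfg-scale 2.0 \
    --image-start \
    --action-list w s d a \
    --action-speed-list 0.2 0.2 0.2 0.2 \
    --seed 250160 \
    --infer-steps 50 \
    --save-path './results/' \
    --cpu-offload \
    --use-fp8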

Distilled Model (faster, 8 inference steps):

torchrun --nnodes=1 --nproc_per_node=8 --master_port 29605 hymm_sp/sample_batch.py \
    --ckpt weights/gamecraft_models/mp_rank_00_model_states_distill.pt \
    --cfg-scale 1.0 \
    --infer-steps 8 \
    --use-fp8 \
    [other parameters...]

Architecture Overview

Core Components

  1. Main Entry Points

    • hymm_sp/sample_batch.py: Main script for batch video generation with distributed processing
    • hymm_sp/sample_inference.py: Core inference logic and model sampling
    • hymm_sp/config.py: Configuration parsing and argument handling
  2. Model Architecture (hymm_sp/modules/)

    • models.py: Core diffusion model implementation
    • cameranet.py: Camera control and action encoding for game interactions
    • token_refiner.py: Text token refinement for prompt conditioning
    • parallel_states.py: Distributed training/inference state management
    • fp8_optimization.py: FP8 quantization for memory/speed optimization
  3. VAE Module (hymm_sp/vae/)

    • autoencoder_kl_causal_3d.py: 3D causal VAE for video encoding/decoding
    • Handles latent space conversion for video frames
  4. Diffusion Pipeline (hymm_sp/diffusion/)

    • pipeline_hunyuan_video_game.py: Custom pipeline for game video generation
    • scheduling_flow_match_discrete.py: Flow matching scheduler for denoising
  5. Data Processing (hymm_sp/data_kits/)

    • video_dataset.py: Dataset handling for video inputs
    • data_tools.py: Video saving and processing utilities

Key Features

  • Action Control: Maps keyboard inputs (w/a/s/d) to continuous camera space for smooth transitions
  • Hybrid History Conditioning: Extends video sequences autoregressively while preserving scene context
  • Model Distillation: Accelerated inference model (8 steps vs 50 steps)
  • Memory Optimization: FP8 quantization, CPU offloading, and SageAttention support
  • Distributed Processing: Multi-GPU support with sequence parallelism

Important Parameters

  • --action-list: Sequence of keyboard actions (w/a/s/d)
  • --action-speed-list: Movement speed for each action (0.0-3.0)
  • --video-size: Output resolution (height width)
  • --cfg-scale: Classifier-free guidance scale (1.0 for distilled, 2.0 for standard)
  • --infer-steps: Denoising steps (8 for distilled, 50 for standard)
  • --use-fp8: Enable FP8 optimization for memory reduction
  • --cpu-offload: Offload model to CPU for low VRAM scenarios

Model Weights Structure

weights/
├── gamecraft_models/
│   ├── mp_rank_00_model_states.pt         # Standard model
│   └── mp_rank_00_model_states_distill.pt # Distilled model
└── stdmodels/
    ├── vae_3d/                             # 3D VAE model
    ├── llava-llama-3-8b-v1_1-transformers/ # Text encoder
    └── openai_clip-vit-large-patch14/      # CLIP encoder
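
Before running inference, it can help to confirm the checkpoints landed where the commands above expect them (a minimal check, assuming the layout shown):

ls -lh weights/gamecraft_models/mp_rank_00_model_states.pt \
       weights/gamecraft_models/mp_rank_00_model_states_distill.pt
ls weights/stdmodels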

Development Notes

  • Environment variable MODEL_BASE should point to weights/stdmodels
  • Use export DISABLE_SP=1 and export CPU_OFFLOAD=1 for single GPU inference
  • Minimum GPU memory: 24GB (very slow); recommended: 80GB per GPU
  • Action length determines video duration (1 action = 33 frames at 25 FPS; see the sketch after this list)
  • SageAttention can be installed for additional acceleration
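
As a worked example of the duration rule above (a hypothetical shell helper, not part of the repository):

# 1 action = 33 frames at 25 FPS, so duration scales with action list length
ACTIONS=(w s d a)                 # 4 actions, as in the example command
FRAMES=$(( ${#ACTIONS[@]} * 33 )) # 132 frames
python -c "print(f'{$FRAMES / 25:.2f} s of video')"  # prints 5.28 s of video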