
CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

LIA-X is a Portrait Animator application built with Gradio that enables image animation, image editing, and video editing using deep learning models. It's deployed as a Hugging Face Space with GPU acceleration.

Architecture

Core Components

  1. Main Application (app.py): Gradio web interface that loads the model and serves three main tabs
  2. Generator Network (networks/generator.py): Core neural network model that handles animation and editing
    • Uses encoder-decoder architecture
    • Implements motion encoding and style transfer
    • Pre-allocates tensors for performance optimization
  3. Gradio Tabs (gradio_tabs/): UI modules for different functionalities
    • animation.py: Handles image-to-video animation
    • img_edit.py: Image editing interface
    • vid_edit.py: Video editing interface

Model Architecture

  • Encoder (networks/encoder.py): Encodes source images and motion
  • Decoder (networks/decoder.py): Reconstructs edited/animated outputs
  • Custom Ops (networks/op/): CUDA kernels for optimized operations (fused_act, upfirdn2d)

Development Commands

Running the Application

python app.py

The app launches a Gradio interface on a local server. Note: a CUDA-capable GPU is required.
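
Because the app needs a CUDA-capable GPU, a quick pre-flight check can save a failed launch. The following is a minimal sketch, not code from this repository: the helper name `cuda_is_available` is hypothetical, and it only assumes that PyTorch (listed under the key dependencies) exposes `torch.cuda.is_available()`.

```python
def cuda_is_available() -> bool:
    """Return True only if PyTorch is installed and can see a CUDA device.

    Hypothetical helper for illustration; it is not part of the repository.
    """
    try:
        import torch  # imported lazily so the check itself never crashes
    except ImportError:
        return False
    return bool(torch.cuda.is_available())
```

Calling `cuda_is_available()` before `python app.py` returns `False` on a machine without PyTorch or without a visible GPU, in which case the app is expected to fail at startup.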

Installing Dependencies

pip install -r requirements.txt

Key dependencies: PyTorch 2.5.1, torchvision, Gradio 5.42.0, einops, imageio, av

Model Loading

The model checkpoint is automatically downloaded from Hugging Face Hub:

  • Repository: YaohuiW/LIA-X
  • File: lia-x.pt

Important Notes

  • This is a GPU-only application (uses torch.device("cuda"))
  • Uses @spaces decorator for Hugging Face Spaces GPU allocation
  • Model operates at 512x512 resolution with motion_dim=40
  • Chunk size of 16 frames for video processing
  • Custom CUDA kernels in networks/op/ require compilation with ninja
  • Git LFS is configured for large files (models, videos, images)
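
To make the 16-frame chunking concrete, here is a minimal sketch of how a video's frame indices could be split into chunks. Only the chunk size comes from the notes above; the function name `iter_chunks` and the choice to let the last chunk be shorter are assumptions, and the repository may handle the final partial chunk differently.

```python
from typing import Iterator, Tuple

CHUNK_SIZE = 16  # frames per chunk, as stated in the notes above


def iter_chunks(num_frames: int, chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) frame index pairs, with `end` exclusive.

    Hypothetical helper: the last chunk is allowed to be shorter than
    `chunk_size` when `num_frames` is not a multiple of it.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, num_frames, chunk_size):
        yield start, min(start + chunk_size, num_frames)
```

For a 40-frame clip this yields the ranges (0, 16), (16, 32), and (32, 40), so a frame tensor could be sliced as `frames[start:end]` for each pair.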

File Processing

  • Images: Loaded as RGB, resized to 512x512, normalized to [-1, 1]
  • Videos: Processed with torchvision; the original FPS is preserved
  • Supports cropping tools for better results (referenced in instruction.md)
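
The [-1, 1] normalization can be illustrated with plain arithmetic. This sketch assumes 8-bit pixel values in [0, 255] and the common mapping `value / 127.5 - 1`; the repository presumably applies an equivalent transform to whole tensors, and the function names here are hypothetical.

```python
def normalize(value: int) -> float:
    """Map an 8-bit pixel value in [0, 255] to the range [-1, 1]."""
    return value / 127.5 - 1.0


def denormalize(x: float) -> int:
    """Map a model output in [-1, 1] back to an 8-bit pixel value.

    Values outside [-1, 1] are clipped first so the result stays in [0, 255].
    """
    clipped = max(-1.0, min(1.0, x))
    return int(round((clipped + 1.0) * 127.5))
```

Under this mapping, black (0) becomes -1.0 and white (255) becomes 1.0, and `denormalize` inverts `normalize` for every 8-bit value.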

Testing

No explicit test suite was found. Testing is done manually through the Gradio interface.

Data Structure

  • data/source/: Source images for examples
  • data/driving/: Driving videos for animation examples
  • assets/: Documentation and UI text (instruction.md, title.md)