CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

LIA-X is a Portrait Animator application built with Gradio that enables image animation, image editing, and video editing using deep learning models. It's deployed as a Hugging Face Space with GPU acceleration.

Architecture

Core Components

Main Application (app.py): Gradio web interface that loads the model and serves three tabs (animation, image editing, video editing)
Generator Network (networks/generator.py): Core neural network model that handles animation and editing
- Uses encoder-decoder architecture
- Implements motion encoding and style transfer
- Pre-allocates tensors for performance optimization
Gradio Tabs (gradio_tabs/): UI modules for different functionalities
- animation.py: Handles image-to-video animation
- img_edit.py: Image editing interface
- vid_edit.py: Video editing interface

Model Architecture

Encoder (networks/encoder.py): Encodes source images and motion
Decoder (networks/decoder.py): Reconstructs edited/animated outputs
Custom Ops (networks/op/): CUDA kernels for optimized operations (fused_act, upfirdn2d)

Development Commands

Running the Application

python app.py

The app launches a Gradio interface on a local server. Note: a CUDA-capable GPU is required.

Installing Dependencies

pip install -r requirements.txt

Key dependencies: PyTorch 2.5.1, torchvision, Gradio 5.42.0, einops, imageio, av

Model Loading

The model checkpoint is automatically downloaded from Hugging Face Hub:

Repository: YaohuiW/LIA-X
File: lia-x.pt
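
As an illustration of the download step, here is a minimal sketch that fetches the checkpoint with hf_hub_download from the huggingface_hub package. The helper name fetch_checkpoint is hypothetical and is not taken from app.py; the repository's own loading code may differ. How the file is then deserialized into the generator is not shown, since the checkpoint layout is not described here.

```python
REPO_ID = "YaohuiW/LIA-X"   # repository named above
FILENAME = "lia-x.pt"       # checkpoint file named above


def fetch_checkpoint(repo_id: str = REPO_ID, filename: str = FILENAME) -> str:
    """Download the checkpoint (or reuse the local cache) and return its path.

    Hypothetical helper for illustration; not the repository's actual code.
    """
    # Imported lazily so this module can be imported without the package installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=repo_id, filename=filename)


if __name__ == "__main__":
    # Requires network access on first run; later runs hit the local cache.
    print(fetch_checkpoint())
```

hf_hub_download caches files locally, so repeated launches do not re-download the checkpoint.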

Important Notes

This is a GPU-only application (uses torch.device("cuda"))
Uses the @spaces decorator for Hugging Face Spaces GPU allocation
The model operates at 512x512 resolution with motion_dim=40
Videos are processed in chunks of 16 frames
Custom CUDA kernels in networks/op/ require compilation with ninja
Git LFS is configured for large files (models, videos, images)
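
The 16-frame chunking noted above can be sketched as follows. This is an assumption about the general pattern only: iter_chunks is a hypothetical helper, and the real code presumably slices tensors rather than Python lists.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

CHUNK_SIZE = 16  # chunk size stated in the notes above


def iter_chunks(frames: Sequence[T], chunk_size: int = CHUNK_SIZE) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most chunk_size frames.

    Hypothetical helper for illustration; the last chunk may be shorter.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(frames), chunk_size):
        yield frames[start:start + chunk_size]


if __name__ == "__main__":
    frames = list(range(40))  # stand-in for 40 decoded video frames
    print([len(chunk) for chunk in iter_chunks(frames)])  # [16, 16, 8]
```

Processing a fixed number of frames at a time keeps peak GPU memory bounded regardless of video length.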

File Processing

Images: Loaded as RGB, resized to 512x512, normalized to [-1, 1]
Videos: Processed with torchvision; the original FPS is preserved
Cropping tools are supported for better results (see instruction.md)
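
The image preprocessing described above (RGB, 512x512, values in [-1, 1]) might look like the sketch below. It uses NumPy and Pillow as stand-ins, which is an assumption: the repository may well do this with torchvision transforms instead. The names load_rgb and to_model_range are hypothetical.

```python
import numpy as np

IMAGE_SIZE = 512  # resolution stated above


def load_rgb(path: str, size: int = IMAGE_SIZE) -> np.ndarray:
    """Load an image as RGB and resize it to size x size (HxWx3, uint8)."""
    # Pillow is an assumption here; imported lazily so the rest runs without it.
    from PIL import Image

    with Image.open(path) as img:
        return np.asarray(img.convert("RGB").resize((size, size)))


def to_model_range(pixels: np.ndarray) -> np.ndarray:
    """Map HxWx3 uint8 pixels in [0, 255] to a 3xHxW float32 array in [-1, 1]."""
    if pixels.dtype != np.uint8:
        raise TypeError("expected a uint8 image array")
    scaled = pixels.astype(np.float32) / 127.5 - 1.0  # 0 -> -1.0, 255 -> 1.0
    return np.transpose(scaled, (2, 0, 1))            # channels first


if __name__ == "__main__":
    dummy = np.zeros((IMAGE_SIZE, IMAGE_SIZE, 3), dtype=np.uint8)
    dummy[0, 0] = 255
    out = to_model_range(dummy)
    print(out.shape, out.min(), out.max())
```

Dividing by 127.5 and subtracting 1 is equivalent to the common two-step form of scaling to [0, 1] and then normalizing with mean 0.5 and standard deviation 0.5.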

Testing

There is no explicit test suite. Test changes manually through the Gradio interface.

Data Structure

data/source/: Source images for examples
data/driving/: Driving videos for animation examples
assets/: Documentation and UI text (instruction.md, title.md)