CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project Overview
LIA-X is a Portrait Animator application built with Gradio that enables image animation, image editing, and video editing using deep learning models. It's deployed as a Hugging Face Space with GPU acceleration.
Architecture
Core Components
- Main Application (`app.py`): Gradio web interface that loads the model and serves three main tabs
- Generator Network (`networks/generator.py`): Core neural network model that handles animation and editing
  - Uses encoder-decoder architecture
  - Implements motion encoding and style transfer
  - Pre-allocates tensors for performance optimization
- Gradio Tabs (`gradio_tabs/`): UI modules for different functionalities
  - `animation.py`: Handles image-to-video animation
  - `img_edit.py`: Image editing interface
  - `vid_edit.py`: Video editing interface
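As an illustration of how the pieces above could fit together, here is a minimal sketch of a three-tab `gr.Blocks` app. The tab labels, the `build_demo` name, and the placeholder contents are hypothetical; the functions that `gradio_tabs/animation.py`, `img_edit.py`, and `vid_edit.py` actually expose are not documented in this file.

```python
# Hypothetical tab labels; the real labels in app.py may differ.
TAB_LABELS = ("Image Animation", "Image Editing", "Video Editing")


def build_demo():
    """Build a three-tab Gradio UI (sketch; real tab contents live in gradio_tabs/)."""
    import gradio as gr  # imported lazily so this sketch imports without Gradio

    with gr.Blocks() as demo:
        for label in TAB_LABELS:
            with gr.Tab(label):
                # Placeholder: each gradio_tabs/*.py module would add its own
                # inputs, outputs, and event handlers inside its tab.
                gr.Markdown(f"{label} controls go here")
    return demo
```

Calling `build_demo().launch()` starts the local server. Importing Gradio inside the function is only a convenience so the sketch can be imported without Gradio installed.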
Model Architecture
- Encoder (`networks/encoder.py`): Encodes source images and motion
- Decoder (`networks/decoder.py`): Reconstructs edited/animated outputs
- Custom Ops (`networks/op/`): CUDA kernels for optimized operations (`fused_act`, `upfirdn2d`)
Development Commands
Running the Application
python app.py
The app launches a Gradio interface on a local server. Note: it requires a CUDA-capable GPU.
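Because the app is GPU-only and its custom ops need ninja, a quick preflight can report both problems before launching. This is a hypothetical helper, not part of the repository; it relies only on `torch.cuda.is_available()` and the standard library's `shutil.which`.

```python
import shutil
from typing import List


def cuda_available() -> bool:
    """True if PyTorch is installed and can see at least one CUDA device."""
    try:
        import torch  # lazy import so the check runs even without PyTorch
    except ImportError:
        return False
    return torch.cuda.is_available()


def ninja_available() -> bool:
    """True if the `ninja` build tool is on PATH (needed to compile networks/op/)."""
    return shutil.which("ninja") is not None


def preflight() -> List[str]:
    """Collect problems that would stop app.py; an empty list means ready to launch."""
    problems: List[str] = []
    if not cuda_available():
        problems.append("No CUDA device visible to PyTorch; the app is GPU-only.")
    if not ninja_available():
        problems.append("ninja not found on PATH; custom CUDA ops cannot be compiled.")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("preflight:", problem)
```

Reporting both checks together surfaces missing prerequisites at once, rather than one at a time partway through model loading.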
Installing Dependencies
pip install -r requirements.txt
Key dependencies: PyTorch 2.5.1, torchvision, Gradio 5.42.0, einops, imageio, av
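To confirm that an environment matches the two versions pinned above, a small stdlib-only check can compare installed distributions via `importlib.metadata`. The `check_pins` helper is hypothetical; only PyTorch and Gradio are checked because they are the only packages with versions listed here.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Mapping, Optional, Tuple

# Versions pinned in the dependency list above (pip distribution names).
PINNED: Dict[str, str] = {"torch": "2.5.1", "gradio": "5.42.0"}


def check_pins(pins: Mapping[str, str] = PINNED) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (expected, installed)} for every package that does not match."""
    mismatches: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, expected in pins.items():
        try:
            # Drop a local build tag such as "+cu121" before comparing.
            installed: Optional[str] = version(name).split("+")[0]
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches
```

An empty result means both pins are satisfied; a `None` in the tuple means the package is not installed at all.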
Model Loading
The model checkpoint is automatically downloaded from Hugging Face Hub:
- Repository: `YaohuiW/LIA-X`
- File: `lia-x.pt`
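The download can be reproduced outside the app with `huggingface_hub`. This sketch assumes the app fetches the file with `hf_hub_download` (the exact call in `app.py` is not shown here); the helper names are hypothetical, and `checkpoint_url` only spells out the Hub's standard `resolve` URL for the same file.

```python
REPO_ID = "YaohuiW/LIA-X"
FILENAME = "lia-x.pt"


def checkpoint_url(repo_id: str = REPO_ID, filename: str = FILENAME, revision: str = "main") -> str:
    """Direct 'resolve' URL for a file in a Hugging Face Hub model repo."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


def download_checkpoint(repo_id: str = REPO_ID, filename: str = FILENAME) -> str:
    """Download (or reuse the cached copy of) the checkpoint and return its local path."""
    from huggingface_hub import hf_hub_download  # lazy import

    return hf_hub_download(repo_id=repo_id, filename=filename)
```

`hf_hub_download` caches the file locally, so repeated launches reuse it. Loading it with `torch.load(path, map_location="cuda")` and mapping it onto the generator is left out because the checkpoint's layout is not described in this file.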
Important Notes
- This is a GPU-only application (uses `torch.device("cuda")`)
- Uses the `@spaces` decorator for Hugging Face Spaces GPU allocation
- The model operates at 512x512 resolution with `motion_dim=40`
- Videos are processed in chunks of 16 frames
- Custom CUDA kernels in `networks/op/` require compilation with ninja
- Git LFS is configured for large files (models, videos, images)
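The 16-frame chunking can be sketched generically as below. The helper names are hypothetical, and the motivation (bounding peak GPU memory per forward pass) is an assumption rather than something this file states.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")

CHUNK_SIZE = 16  # frames per chunk, per the notes above


def iter_chunks(frames: Sequence[T], chunk_size: int = CHUNK_SIZE) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `chunk_size` frames; the last may be shorter."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(frames), chunk_size):
        yield frames[start:start + chunk_size]


def process_in_chunks(
    frames: Sequence[T],
    fn: Callable[[Sequence[T]], Sequence[R]],
    chunk_size: int = CHUNK_SIZE,
) -> List[R]:
    """Apply `fn` chunk by chunk and concatenate the per-frame results in order."""
    outputs: List[R] = []
    for chunk in iter_chunks(frames, chunk_size):
        outputs.extend(fn(chunk))
    return outputs
```

Because the code only relies on `len()` and slicing, it also works on a video tensor of shape `(T, C, H, W)`, yielding batches of at most 16 frames; with tensors one would typically gather the per-chunk outputs and join them with `torch.cat(outputs, dim=0)` instead of `list.extend`.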
File Processing
- Images: Loaded as RGB, resized to 512x512, normalized to [-1, 1]
- Videos: Processed with torchvision, preserving the original FPS
- Supports cropping tools for better results (referenced in instruction.md)
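The [-1, 1] normalization is a per-pixel affine map. The sketch below shows it and its inverse (useful when turning model outputs back into 8-bit frames) on plain integers; the function names are hypothetical.

```python
def to_model_range(pixel: int) -> float:
    """Map an 8-bit value in [0, 255] to [-1, 1]: divide by 255, then (x - 0.5) / 0.5."""
    return (pixel / 255.0 - 0.5) / 0.5


def to_pixel(value: float) -> int:
    """Inverse for model outputs: clamp to [-1, 1], map back to [0, 255], and round."""
    value = max(-1.0, min(1.0, value))
    return round((value + 1.0) / 2.0 * 255.0)
```

In torchvision this corresponds to `T.ToTensor()` followed by `T.Normalize([0.5] * 3, [0.5] * 3)`, e.g. `T.Compose([T.Resize((512, 512)), T.ToTensor(), T.Normalize([0.5] * 3, [0.5] * 3)])` applied to `Image.open(path).convert("RGB")`; whether `app.py` builds exactly this pipeline is an assumption.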
Testing
No explicit test suite was found; testing is done manually through the Gradio interface.
Data Structure
- `data/source/`: Source images for examples
- `data/driving/`: Driving videos for animation examples
- `assets/`: Documentation and UI text (`instruction.md`, `title.md`)
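Example inputs for the animation tab could be assembled from these folders. The sketch below is hypothetical: the file extensions and the rule of pairing the i-th source image with the i-th driving video are illustrative choices, not taken from the repository.

```python
from pathlib import Path
from typing import Iterable, List, Sequence

IMAGE_EXTS = (".png", ".jpg", ".jpeg")  # assumed image types in data/source/
VIDEO_EXTS = (".mp4",)                  # assumed video type in data/driving/


def filter_sorted(names: Iterable[str], exts: Sequence[str]) -> List[str]:
    """Keep names with a matching extension (case-insensitive), sorted for stable order."""
    return sorted(n for n in names if n.lower().endswith(tuple(exts)))


def example_rows(root: Path = Path(".")) -> List[List[str]]:
    """Build [source_image, driving_video] rows, pairing the i-th image with the i-th video."""

    def names(sub: str) -> List[str]:
        folder = root / "data" / sub
        return [str(p) for p in folder.iterdir()] if folder.is_dir() else []

    sources = filter_sorted(names("source"), IMAGE_EXTS)
    drivings = filter_sorted(names("driving"), VIDEO_EXTS)
    return [[s, d] for s, d in zip(sources, drivings)]
```

Rows in this `[image, video]` shape are what `gr.Examples(examples=..., inputs=[...])` expects for a two-input interface.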