# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

LIA-X is a Portrait Animator application built with Gradio that supports image animation, image editing, and video editing using deep learning models. It is deployed as a Hugging Face Space with GPU acceleration.

## Architecture

### Core Components

1. **Main Application** (`app.py`): Gradio web interface that loads the model and serves three main tabs
2. **Generator Network** (`networks/generator.py`): core neural network that handles animation and editing
   - Uses an encoder-decoder architecture
   - Implements motion encoding and style transfer
   - Pre-allocates tensors for performance
3. **Gradio Tabs** (`gradio_tabs/`): UI modules for the different features
   - `animation.py`: image-to-video animation
   - `img_edit.py`: image editing interface
   - `vid_edit.py`: video editing interface

### Model Architecture

- **Encoder** (`networks/encoder.py`): encodes source images and motion
- **Decoder** (`networks/decoder.py`): reconstructs edited/animated outputs
- **Custom Ops** (`networks/op/`): CUDA kernels for optimized operations (`fused_act`, `upfirdn2d`)

## Development Commands

### Running the Application

```bash
python app.py
```

The app launches a Gradio interface on a local server. Note: this requires a CUDA-capable GPU.

### Installing Dependencies

```bash
pip install -r requirements.txt
```

Key dependencies: PyTorch 2.5.1, torchvision, Gradio 5.42.0, einops, imageio, av

### Model Loading

The model checkpoint is automatically downloaded from Hugging Face Hub:

- Repository: `YaohuiW/LIA-X`
- File: `lia-x.pt`
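
The download step can be sketched with `hf_hub_download` from the `huggingface_hub` library; the repository and filename come from this document, while the helper name `fetch_checkpoint` is illustrative and may not match the actual code in `app.py`:

```python
from huggingface_hub import hf_hub_download


def fetch_checkpoint() -> str:
    """Download the LIA-X checkpoint (or reuse the local cache).

    Returns the local filesystem path to the cached file.
    """
    return hf_hub_download(repo_id="YaohuiW/LIA-X", filename="lia-x.pt")
```

`hf_hub_download` caches the file locally, so repeated launches skip the download.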

## Important Notes

- This is a GPU-only application (it uses `torch.device("cuda")`)
- Uses the `@spaces` decorator for Hugging Face Spaces GPU allocation
- The model operates at 512x512 resolution with `motion_dim=40`
- Videos are processed in chunks of 16 frames
- Custom CUDA kernels in `networks/op/` require compilation with ninja
- Git LFS is configured for large files (models, videos, images)
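
The 16-frame chunking above amounts to slicing the frame sequence into consecutive runs; a minimal sketch (the helper name `chunk_frames` is illustrative, not taken from the repository):

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def chunk_frames(frames: Sequence[T], chunk_size: int = 16) -> List[Sequence[T]]:
    """Split a frame sequence into consecutive chunks of at most chunk_size frames."""
    return [frames[i:i + chunk_size] for i in range(0, len(frames), chunk_size)]
```

A 40-frame clip, for example, yields two full 16-frame chunks plus one 8-frame remainder.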

## File Processing

- Images: loaded as RGB, resized to 512x512, normalized to [-1, 1]
- Videos: processed with torchvision; the original FPS is preserved
- Cropping tools are recommended for better results (see instruction.md)
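
The [-1, 1] normalization maps uint8 pixel values through `x / 127.5 - 1`. A NumPy sketch of that step (the real pipeline applies it after resizing to 512x512; the function name is illustrative):

```python
import numpy as np


def normalize_image(img_uint8: np.ndarray) -> np.ndarray:
    """Map uint8 pixel values in [0, 255] to float32 values in [-1, 1]."""
    return img_uint8.astype(np.float32) / 127.5 - 1.0
```

Under this mapping, 0 becomes -1.0 and 255 becomes exactly 1.0.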

## Testing

No explicit test suite is included; test manually through the Gradio interface.

## Data Structure

- `data/source/`: source images for examples
- `data/driving/`: driving videos for animation examples
- `assets/`: documentation and UI text (instruction.md, title.md)