# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
This is the AI Toolkit by Ostris, packaged as a Hugging Face Space for Docker deployment. It is a comprehensive training suite for diffusion models that brings recent models to consumer-grade hardware, with both a CLI and a web UI for training LoRA models, with a particular focus on FLUX.1.
## Architecture
### Core Structure
- **Main Entry Points**:
- `run.py` - CLI interface for running training jobs with config files
- `flux_train_ui.py` - Gradio-based simple training interface
- `start.sh` - Docker entry point that launches the web UI
- **Web UI** (`ui/`): Next.js application with TypeScript
- Frontend in `src/app/` with API routes
- Background worker process for job management
- SQLite database via Prisma for job persistence
- **Core Toolkit** (`toolkit/`): Python modules for ML operations
- Model implementations in `toolkit/models/`
- Training processes in `jobs/process/`
- Configuration management and data loading utilities
- **Extensions** (`extensions_built_in/`): Modular training components
- Support for various model types (FLUX, SDXL, SD 1.5, etc.)
- Different training strategies (LoRA, fine-tuning, etc.)
### Key Configuration
- Training configs in `config/examples/` with YAML format
- Docker setup supports GPU passthrough with the NVIDIA runtime
- Environment variables for HuggingFace tokens and authentication
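A typical training config follows the shape below. The field names are illustrative, drawn from the example templates; verify them against the files in `config/examples/` before running.

```yaml
# Hypothetical sketch of a LoRA training config; confirm key names
# against config/examples/train_lora_flux_24gb.yaml before use.
job: extension
config:
  name: my_flux_lora          # placeholder job name
  process:
    - type: sd_trainer
      trigger_word: my_trigger
      network:
        type: lora
        linear: 16
        linear_alpha: 16
      model:
        name_or_path: black-forest-labs/FLUX.1-dev
        is_flux: true
        quantize: true
      train:
        steps: 2000
        batch_size: 1
        lr: 1e-4
```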
## Common Development Commands
### Setup and Installation
```bash
# Python environment setup
python3 -m venv venv
source venv/bin/activate # or .\venv\Scripts\activate on Windows
pip3 install --no-cache-dir torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/cu126
pip3 install -r requirements.txt
```
### Running Training Jobs
```bash
# CLI training with config file
python run.py config/your_config.yml
# Simple Gradio UI for FLUX training
python flux_train_ui.py
```
### Web UI Development
```bash
# Development mode (from ui/ directory)
cd ui
npm install
npm run dev
# Production build and start
npm run build_and_start
# Database updates
npm run update_db
```
### Docker Operations
```bash
# Run with docker-compose
docker-compose up
# Build custom image
docker build -f docker/Dockerfile -t ai-toolkit .
```
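GPU passthrough in Compose generally takes the following shape. This is an illustrative fragment, not the repository's actual file; the service name and image tag are placeholders.

```yaml
# Illustrative compose snippet for NVIDIA GPU passthrough;
# adapt to the repository's actual docker-compose file.
services:
  ai-toolkit:
    image: ai-toolkit
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```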
## Authentication Requirements
### HuggingFace Access
- FLUX.1-dev requires accepting license at https://huggingface.co/black-forest-labs/FLUX.1-dev
- Set `HF_TOKEN` environment variable with READ access token
- Create `.env` file in root: `HF_TOKEN=your_key_here`
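As a sanity check, you can confirm the token is visible to Python before starting a job. This minimal sketch parses a `.env`-style file by hand (no `python-dotenv` dependency assumed); the `load_env_file` helper is not part of the toolkit, it just follows the convention above.

```python
import os

def load_env_file(path=".env"):
    """Parse simple KEY=value lines from a .env file into os.environ."""
    if not os.path.exists(path):
        return
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blanks, comments, and lines without an assignment
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

if __name__ == "__main__":
    load_env_file()
    if not os.environ.get("HF_TOKEN"):
        raise SystemExit("HF_TOKEN is not set; gated models such as "
                         "FLUX.1-dev will fail to download")
```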
### UI Security
- Set `AI_TOOLKIT_AUTH` environment variable for UI authentication
- Default password is "password" if not set
## Training Configuration
### Model Support
- **FLUX.1-dev**: Requires HF token, non-commercial license
- **FLUX.1-schnell**: Apache 2.0, needs training adapter
- **SDXL, SD 1.5**: Standard Stable Diffusion models
- **Video models**: Various I2V and text-to-video architectures
### Memory Requirements
- FLUX.1 training requires a minimum of 24 GB of VRAM
- Set `low_vram: true` in the config if the training GPU also drives a display
- Supports various quantization options to reduce memory usage
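In a config file, the memory-related switches above sit under the model section, roughly as sketched here. Key names are assumptions; confirm them against the templates in `config/examples/`.

```yaml
# Illustrative memory-saving options; verify key names against
# the example configs before relying on them.
model:
  name_or_path: black-forest-labs/FLUX.1-dev
  is_flux: true
  quantize: true   # quantization to reduce VRAM usage
  low_vram: true   # offload when a display is attached to the GPU
```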
### Dataset Format
- Images: JPG, JPEG, PNG (no WebP)
- Captions: `.txt` files with same name as images
- Use `[trigger]` placeholder in captions, replaced by `trigger_word` config
- Images auto-resized and bucketed, no manual preprocessing needed
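A quick pre-flight check of a dataset folder can catch mismatched captions before a long run. The `check_dataset` helper below is not part of the toolkit; it simply applies the conventions above (a matching `.txt` per image, an optional `[trigger]` placeholder).

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}  # WebP is not supported

def check_dataset(folder):
    """Return (missing_captions, captions_without_trigger) for a dataset dir."""
    folder = Path(folder)
    missing, no_trigger = [], []
    for img in sorted(folder.iterdir()):
        if img.suffix.lower() not in IMAGE_EXTS:
            continue
        caption = img.with_suffix(".txt")
        if not caption.exists():
            missing.append(img.name)
        elif "[trigger]" not in caption.read_text():
            no_trigger.append(img.name)
    return missing, no_trigger
```

Entries in the `no_trigger` list only matter if your config sets a `trigger_word`; captions without the placeholder are otherwise valid.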
## Key Files to Understand
- `run.py:46-85` - Main training job runner and argument parsing
- `toolkit/job.py` - Job management and configuration loading
- `ui/src/app/api/jobs/route.ts` - API endpoints for job management
- `config/examples/train_lora_flux_24gb.yaml` - Standard FLUX training template
- `extensions_built_in/sd_trainer/SDTrainer.py` - Core training logic
## Development Notes
- Jobs run independently of the UI; the UI is only for job management
- Training can be stopped/resumed via checkpoints
- Output stored in `output/` directory with samples and models
- Extensions system allows custom training implementations
- Multi-GPU support via the `accelerate` library