# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

This is the AI Toolkit by Ostris, packaged as a Hugging Face Space for Docker deployment. It is a training suite for diffusion models designed to run the latest models on consumer-grade hardware, and it provides both a CLI and a web UI for training LoRA models, with a particular focus on FLUX.1.

## Architecture

### Core Structure

- **Main Entry Points:**
  - `run.py` - CLI interface for running training jobs with config files
  - `flux_train_ui.py` - Gradio-based simple training interface
  - `start.sh` - Docker entry point that launches the web UI
- **Web UI (`ui/`)**: Next.js application with TypeScript
  - Frontend in `src/app/` with API routes
  - Background worker process for job management
  - SQLite database via Prisma for job persistence
- **Core Toolkit (`toolkit/`)**: Python modules for ML operations
  - Model implementations in `toolkit/models/`
  - Training processes in `jobs/process/`
  - Configuration management and data loading utilities
- **Extensions (`extensions_built_in/`)**: Modular training components
  - Support for various model types (FLUX, SDXL, SD 1.5, etc.)
  - Different training strategies (LoRA, fine-tuning, etc.)
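
A quick way to orient yourself in this layout is to list the entry points and key directories named above from the repository root (paths are taken from the list; just a sketch for orientation):

```bash
# Entry points and key directories described above
ls run.py flux_train_ui.py start.sh
ls -d toolkit/models jobs/process extensions_built_in ui/src/app config/examples
```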

### Key Configuration

- Training configs live in `config/examples/` and use YAML format (see the workflow sketch below)
- The Docker setup supports GPU passthrough via the NVIDIA runtime
- Environment variables configure Hugging Face tokens and UI authentication
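
A typical way to put these pieces together, sketched with placeholder file names: copy one of the shipped example configs, edit it (dataset path, `trigger_word`, sample prompts, and so on), then pass it to the CLI runner.

```bash
# Start from a shipped example config (file names here are placeholders)
cp config/examples/train_lora_flux_24gb.yaml config/my_first_lora.yaml
# Edit config/my_first_lora.yaml: point it at your dataset, set trigger_word, etc.
python run.py config/my_first_lora.yaml
```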

## Common Development Commands

### Setup and Installation

```bash
# Python environment setup
python3 -m venv venv
source venv/bin/activate  # or .\venv\Scripts\activate on Windows
pip3 install --no-cache-dir torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/cu126
pip3 install -r requirements.txt
```
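
An optional sanity check after installation, to confirm the CUDA build of PyTorch can actually see the GPU:

```bash
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```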

### Running Training Jobs

```bash
# CLI training with config file
python run.py config/your_config.yml

# Simple Gradio UI for FLUX training
python flux_train_ui.py
```

### Web UI Development

```bash
# Development mode (from ui/ directory)
cd ui
npm install
npm run dev

# Production build and start
npm run build_and_start

# Database updates
npm run update_db
```
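
The exact commands behind `dev`, `build_and_start`, and `update_db` are defined in `ui/package.json`; when in doubt about what a script actually runs (for example, which database command `update_db` wraps), read them there:

```bash
grep -nE '"(dev|build_and_start|update_db)"' ui/package.json
```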

### Docker Operations

```bash
# Run with docker-compose
docker-compose up

# Build custom image
docker build -f docker/Dockerfile -t ai-toolkit .
```
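
For a manual run of the locally built image with GPU passthrough, something like the sketch below should work; the container mount path and environment variable are assumptions, and `docker-compose.yml` remains the authoritative reference for ports, volumes, and runtime settings.

```bash
# Sketch only: the volume mount target and HF_TOKEN are assumptions; check docker-compose.yml
docker run --gpus all \
  -e HF_TOKEN="$HF_TOKEN" \
  -v "$(pwd)/output:/app/ai-toolkit/output" \
  ai-toolkit
```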

## Authentication Requirements

### HuggingFace Access

- FLUX.1-dev is a gated model under a non-commercial license; an HF token with access to it is required before training (see Model Support below)
- Supply the token via an environment variable (conventionally `HF_TOKEN`); check `start.sh` and `docker-compose.yml` for the exact variable name this Space reads

### UI Security

- Set the `AI_TOOLKIT_AUTH` environment variable for UI authentication
- The default password is `password` if the variable is not set
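
A minimal environment setup before launching the UI, with placeholder values; `HF_TOKEN` is the conventional Hugging Face token variable and is an assumption here, so verify the names this Space actually reads:

```bash
export HF_TOKEN=hf_xxxxxxxxxxxxxxxx       # placeholder token for gated models such as FLUX.1-dev
export AI_TOOLKIT_AUTH=pick-a-strong-pass # UI password; falls back to "password" if unset
```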

## Training Configuration

### Model Support

- **FLUX.1-dev**: Requires HF token, non-commercial license
- **FLUX.1-schnell**: Apache 2.0, needs training adapter
- **SDXL, SD 1.5**: Standard Stable Diffusion models
- **Video models**: Various I2V and text-to-video architectures

### Memory Requirements

- FLUX.1 training requires a minimum of 24 GB VRAM
- Use `low_vram: true` in the config if the training GPU is also driving displays
- Various quantization options are supported to reduce memory usage
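
These memory switches live in the training YAML itself; the FLUX example config is the quickest reference for where `low_vram` and the quantization options sit (key names come from that file, so verify against your checkout):

```bash
grep -nE "low_vram|quantize" config/examples/train_lora_flux_24gb.yaml
```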

### Dataset Format

- Images: JPG, JPEG, PNG (WebP is not supported)
- Captions: `.txt` files with the same base name as the corresponding image
- Use the `[trigger]` placeholder in captions; it is replaced by the `trigger_word` config value
- Images are automatically resized and bucketed, so no manual preprocessing is needed
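
A sketch of a valid dataset folder, with placeholder file names: each caption shares its image's base name, and `[trigger]` inside a caption is swapped for the configured `trigger_word` at training time.

```bash
mkdir -p dataset
printf '[trigger] standing in a field at sunset\n' > dataset/photo_001.txt
# dataset/photo_001.jpg would be the matching image (JPG/JPEG/PNG only, no WebP)
ls dataset
```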

## Key Files to Understand

- `run.py:46-85` - Main training job runner and argument parsing
- `toolkit/job.py` - Job management and configuration loading
- `ui/src/app/api/jobs/route.ts` - API endpoints for job management
- `config/examples/train_lora_flux_24gb.yaml` - Standard FLUX training template
- `extensions_built_in/sd_trainer/SDTrainer.py` - Core training logic

## Development Notes

- Jobs run independently of the UI; the UI is only for management
- Training can be stopped and resumed via checkpoints
- Output is stored in the `output/` directory along with samples and models
- The extensions system allows custom training implementations
- Multi-GPU support via the `accelerate` library
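
A sketch of the stop/resume loop implied by the notes above (the config name is a placeholder, and the resume behavior should be verified on your checkout):

```bash
# Ctrl+C stops a running job; checkpoints and sample images land under output/
python run.py config/my_first_lora.yaml
ls output/
# Re-running with the same config is expected to continue from the latest checkpoint
python run.py config/my_first_lora.yaml
```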