# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

This is the AI Toolkit by Ostris, packaged as a Hugging Face Space for Docker deployment. It is a training suite for diffusion models designed to run the latest models on consumer-grade hardware, and it provides both a CLI and a web UI for training LoRA models, with a particular focus on FLUX.1.

## Architecture

### Core Structure

- **Main Entry Points:**
  - `run.py` - CLI interface for running training jobs with config files
  - `flux_train_ui.py` - Gradio-based simple training interface
  - `start.sh` - Docker entry point that launches the web UI
- **Web UI (`ui/`)**: Next.js application with TypeScript
  - Frontend in `src/app/` with API routes
  - Background worker process for job management
  - SQLite database via Prisma for job persistence
- **Core Toolkit (`toolkit/`)**: Python modules for ML operations
  - Model implementations in `toolkit/models/`
  - Training processes in `jobs/process/`
  - Configuration management and data loading utilities
- **Extensions (`extensions_built_in/`)**: Modular training components
  - Support for various model types (FLUX, SDXL, SD 1.5, etc.)
  - Different training strategies (LoRA, fine-tuning, etc.)
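
A quick way to orient yourself in this layout is to list the entry points and key directories named above from the repository root (paths are taken from the list; just a sketch for orientation):

```bash
# Entry points and key directories described above
ls run.py flux_train_ui.py start.sh
ls -d toolkit/models jobs/process extensions_built_in ui/src/app config/examples
```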

### Key Configuration

- Training configs live in `config/examples/` and use YAML format (see the workflow sketch below)
- The Docker setup supports GPU passthrough via the NVIDIA runtime
- Environment variables configure Hugging Face tokens and UI authentication
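
A typical way to put these pieces together, sketched with placeholder file names: copy one of the shipped example configs, edit it (dataset path, `trigger_word`, sample prompts, and so on), then pass it to the CLI runner.

```bash
# Start from a shipped example config (file names here are placeholders)
cp config/examples/train_lora_flux_24gb.yaml config/my_first_lora.yaml
# Edit config/my_first_lora.yaml: point it at your dataset, set trigger_word, etc.
python run.py config/my_first_lora.yaml
```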

## Common Development Commands

### Setup and Installation

```bash
# Python environment setup
python3 -m venv venv
source venv/bin/activate  # or .\venv\Scripts\activate on Windows
pip3 install --no-cache-dir torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/cu126
pip3 install -r requirements.txt
```
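
An optional sanity check after installation, to confirm the CUDA build of PyTorch can actually see the GPU:

```bash
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```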

### Running Training Jobs

```bash
# CLI training with config file
python run.py config/your_config.yml

# Simple Gradio UI for FLUX training
python flux_train_ui.py
```

### Web UI Development

```bash
# Development mode (from ui/ directory)
cd ui
npm install
npm run dev

# Production build and start
npm run build_and_start

# Database updates
npm run update_db
```
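
The exact commands behind `dev`, `build_and_start`, and `update_db` are defined in `ui/package.json`; when in doubt about what a script actually runs (for example, which database command `update_db` wraps), read them there:

```bash
grep -nE '"(dev|build_and_start|update_db)"' ui/package.json
```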

### Docker Operations

```bash
# Run with docker-compose
docker-compose up

# Build custom image
docker build -f docker/Dockerfile -t ai-toolkit .
```
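
For a manual run of the locally built image with GPU passthrough, something like the sketch below should work; the container mount path and environment variable are assumptions, and `docker-compose.yml` remains the authoritative reference for ports, volumes, and runtime settings.

```bash
# Sketch only: the volume mount target and HF_TOKEN are assumptions; check docker-compose.yml
docker run --gpus all \
  -e HF_TOKEN="$HF_TOKEN" \
  -v "$(pwd)/output:/app/ai-toolkit/output" \
  ai-toolkit
```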

## Authentication Requirements

### HuggingFace Access

- FLUX.1-dev is a gated model under a non-commercial license; an HF token with access to it is required before training (see Model Support below)
- Supply the token via an environment variable (conventionally `HF_TOKEN`); check `start.sh` and `docker-compose.yml` for the exact variable name this Space reads

### UI Security

- Set the `AI_TOOLKIT_AUTH` environment variable for UI authentication
- The default password is `password` if the variable is not set
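
A minimal environment setup before launching the UI, with placeholder values; `HF_TOKEN` is the conventional Hugging Face token variable and is an assumption here, so verify the names this Space actually reads:

```bash
export HF_TOKEN=hf_xxxxxxxxxxxxxxxx       # placeholder token for gated models such as FLUX.1-dev
export AI_TOOLKIT_AUTH=pick-a-strong-pass # UI password; falls back to "password" if unset
```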

## Training Configuration

### Model Support

- **FLUX.1-dev**: Requires HF token, non-commercial license
- **FLUX.1-schnell**: Apache 2.0, needs training adapter
- **SDXL, SD 1.5**: Standard Stable Diffusion models
- **Video models**: Various I2V and text-to-video architectures

### Memory Requirements

- FLUX.1 training requires a minimum of 24 GB VRAM
- Use `low_vram: true` in the config if the training GPU is also driving displays
- Various quantization options are supported to reduce memory usage
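
These memory switches live in the training YAML itself; the FLUX example config is the quickest reference for where `low_vram` and the quantization options sit (key names come from that file, so verify against your checkout):

```bash
grep -nE "low_vram|quantize" config/examples/train_lora_flux_24gb.yaml
```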

### Dataset Format

- Images: JPG, JPEG, PNG (WebP is not supported)
- Captions: `.txt` files with the same base name as the corresponding image
- Use the `[trigger]` placeholder in captions; it is replaced by the `trigger_word` config value
- Images are automatically resized and bucketed, so no manual preprocessing is needed
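
A sketch of a valid dataset folder, with placeholder file names: each caption shares its image's base name, and `[trigger]` inside a caption is swapped for the configured `trigger_word` at training time.

```bash
mkdir -p dataset
printf '[trigger] standing in a field at sunset\n' > dataset/photo_001.txt
# dataset/photo_001.jpg would be the matching image (JPG/JPEG/PNG only, no WebP)
ls dataset
```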

## Key Files to Understand

- `run.py:46-85` - Main training job runner and argument parsing
- `toolkit/job.py` - Job management and configuration loading
- `ui/src/app/api/jobs/route.ts` - API endpoints for job management
- `config/examples/train_lora_flux_24gb.yaml` - Standard FLUX training template
- `extensions_built_in/sd_trainer/SDTrainer.py` - Core training logic

## Development Notes

- Jobs run independently of the UI; the UI is only for management
- Training can be stopped and resumed via checkpoints
- Output is stored in the `output/` directory along with samples and models
- The extensions system allows custom training implementations
- Multi-GPU support via the `accelerate` library
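
A sketch of the stop/resume loop implied by the notes above (the config name is a placeholder, and the resume behavior should be verified on your checkout):

```bash
# Ctrl+C stops a running job; checkpoints and sample images land under output/
python run.py config/my_first_lora.yaml
ls output/
# Re-running with the same config is expected to continue from the latest checkpoint
python run.py config/my_first_lora.yaml
```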