jbilcke-hf (HF Staff) committed
Commit 7b99bfa · 1 Parent(s): 718c1e6
Files changed (2):
1. CLAUDE.md +123 -0
2. start.sh +1 -1
CLAUDE.md ADDED
@@ -0,0 +1,123 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

This is the AI Toolkit by Ostris, packaged as a Hugging Face Space for Docker deployment. It is a comprehensive training suite for diffusion models that supports the latest models on consumer-grade hardware. The toolkit includes both CLI and web UI interfaces for training LoRA models, with a particular focus on FLUX.1 models.

## Architecture

### Core Structure

- **Main Entry Points**:
  - `run.py` - CLI interface for running training jobs with config files
  - `flux_train_ui.py` - Gradio-based simple training interface
  - `start.sh` - Docker entry point that launches the web UI

- **Web UI** (`ui/`): Next.js application with TypeScript
  - Frontend in `src/app/` with API routes
  - Background worker process for job management
  - SQLite database via Prisma for job persistence

- **Core Toolkit** (`toolkit/`): Python modules for ML operations
  - Model implementations in `toolkit/models/`
  - Training processes in `jobs/process/`
  - Configuration management and data-loading utilities

- **Extensions** (`extensions_built_in/`): Modular training components
  - Support for various model types (FLUX, SDXL, SD 1.5, etc.)
  - Different training strategies (LoRA, fine-tuning, etc.)

### Key Configuration

- Training configs in `config/examples/` use YAML format
- Docker setup supports GPU passthrough via the NVIDIA runtime
- Environment variables configure Hugging Face tokens and authentication

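The YAML configs follow a job/process structure. The sketch below is illustrative only (field names and nesting are assumptions, not copied from the repository); consult `config/examples/train_lora_flux_24gb.yaml` for the authoritative template:

```yaml
# Illustrative sketch of a training config; field names are assumptions.
# See config/examples/ for the real templates.
job: extension
config:
  name: my_flux_lora
  process:
    - type: sd_trainer
      trigger_word: mytoken          # replaces [trigger] in captions
      network:
        type: lora
        linear: 16
      model:
        name_or_path: black-forest-labs/FLUX.1-dev
        low_vram: true               # use if running with displays attached
      datasets:
        - folder_path: /path/to/dataset
```
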
## Common Development Commands

### Setup and Installation

```bash
# Python environment setup
python3 -m venv venv
source venv/bin/activate  # or .\venv\Scripts\activate on Windows
pip3 install --no-cache-dir torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/cu126
pip3 install -r requirements.txt
```

### Running Training Jobs

```bash
# CLI training with config file
python run.py config/your_config.yml

# Simple Gradio UI for FLUX training
python flux_train_ui.py
```

### Web UI Development

```bash
# Development mode (from ui/ directory)
cd ui
npm install
npm run dev

# Production build and start
npm run build_and_start

# Database updates
npm run update_db
```

### Docker Operations

```bash
# Run with docker-compose
docker-compose up

# Build custom image
docker build -f docker/Dockerfile -t ai-toolkit .
```

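GPU passthrough in Compose is typically declared as a device reservation. The fragment below is a sketch of that pattern (the service name and build paths are assumptions), not the repository's actual docker-compose.yml:

```yaml
# Sketch only: GPU passthrough via the NVIDIA runtime in Compose.
# Service name and build context are assumptions.
services:
  ai-toolkit:
    build:
      context: .
      dockerfile: docker/Dockerfile
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```
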
## Authentication Requirements

### HuggingFace Access

- FLUX.1-dev requires accepting the license at https://huggingface.co/black-forest-labs/FLUX.1-dev
- Set the `HF_TOKEN` environment variable to a token with READ access
- Create a `.env` file in the repository root: `HF_TOKEN=your_key_here`

### UI Security

- Set the `AI_TOOLKIT_AUTH` environment variable for UI authentication
- The password defaults to "password" if the variable is not set
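
Putting both together, a minimal setup might look like this (both values are placeholders you must replace with your own):

```shell
# Write the Hugging Face token to .env in the repo root (placeholder value)
cat > .env <<'EOF'
HF_TOKEN=your_key_here
EOF

# Protect the web UI with a password instead of the default "password"
export AI_TOOLKIT_AUTH='choose_a_strong_password'
```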

## Training Configuration

### Model Support

- **FLUX.1-dev**: Requires an HF token; non-commercial license
- **FLUX.1-schnell**: Apache 2.0; needs a training adapter
- **SDXL, SD 1.5**: Standard Stable Diffusion models
- **Video models**: Various I2V and text-to-video architectures

### Memory Requirements

- FLUX.1 training requires a minimum of 24GB VRAM
- Use `low_vram: true` in the config if running with displays attached
- Various quantization options are supported to reduce memory usage

### Dataset Format

- Images: JPG, JPEG, PNG (no WebP)
- Captions: `.txt` files with the same basename as the images
- Use the `[trigger]` placeholder in captions; it is replaced by the `trigger_word` config value
- Images are auto-resized and bucketed; no manual preprocessing is needed

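As a concrete example, the commands below create one caption file in a dataset folder (folder and file names are arbitrary choices, not required by the toolkit):

```shell
# Each image gets a .txt caption with the same basename.
mkdir -p dataset/my_subject
# (place your images here, e.g. dataset/my_subject/photo1.jpg)

# The [trigger] placeholder is substituted with trigger_word at training time.
echo 'a photo of [trigger] standing in a park' > dataset/my_subject/photo1.txt
```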
## Key Files to Understand

- `run.py:46-85` - Main training job runner and argument parsing
- `toolkit/job.py` - Job management and configuration loading
- `ui/src/app/api/jobs/route.ts` - API endpoints for job management
- `config/examples/train_lora_flux_24gb.yaml` - Standard FLUX training template
- `extensions_built_in/sd_trainer/SDTrainer.py` - Core training logic

## Development Notes

- Jobs run independently of the UI; the UI is only for management
- Training can be stopped and resumed via checkpoints
- Output is stored in the `output/` directory with samples and models
- The extensions system allows custom training implementations
- Multi-GPU support via the accelerate library

start.sh CHANGED
@@ -2,4 +2,4 @@
 set -e # Exit the script if any statement returns a non-true return value
 
 echo "Starting AI Toolkit UI..."
-cd ui && npm run start
+cd /app/ai-toolkit/ui && npm run start