# CompI Phase 1: Text-to-Image Generation Usage Guide
This guide covers the Phase 1 implementation of CompI's text-to-image generation capabilities using Stable Diffusion.
## Quick Start
### Basic Usage
```bash
# Simple generation with interactive prompt
python run_basic_generation.py
# Generate from command line
python run_basic_generation.py "A magical forest, digital art, highly detailed"
# Or run directly from src/generators/
python src/generators/compi_phase1_text2image.py "A magical forest"
```
### Advanced Usage
```bash
# Advanced script with more options
python run_advanced_generation.py "cyberpunk city at sunset" --negative "blurry, low quality" --steps 50 --batch 3
# Interactive mode for experimentation
python run_advanced_generation.py --interactive
# Or run directly from src/generators/
python src/generators/compi_phase1_advanced.py --interactive
```
## Available Scripts
### 1. `compi_phase1_text2image.py` - Basic Implementation
**Features:**
- Simple, standalone text-to-image generation
- Automatic GPU/CPU detection
- Command line or interactive prompts
- Automatic output saving with descriptive filenames
- Comprehensive logging
**Usage:**
```bash
python compi_phase1_text2image.py [prompt]
```
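For orientation, the core of a basic script like this is an ordinary `diffusers` pipeline call. The snippet below is a minimal sketch of that pattern, not the script's exact code; file names and the example prompt are illustrative:

```python
import os
import torch
from diffusers import StableDiffusionPipeline

# Pick the GPU when available, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
)
pipe = pipe.to(device)
pipe.enable_attention_slicing()  # reduce VRAM usage at a small speed cost

prompt = "A magical forest, digital art, highly detailed"
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]

os.makedirs("outputs", exist_ok=True)
image.save("outputs/magical_forest_example.png")
```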
### 2. `compi_phase1_advanced.py` - Enhanced Implementation
**Features:**
- Batch generation (multiple images)
- Negative prompts (what to avoid)
- Customizable parameters (steps, guidance, dimensions)
- Interactive mode for experimentation
- Metadata saving (JSON files with generation parameters)
- Multiple model support
**Command Line Options:**
```bash
python compi_phase1_advanced.py [OPTIONS] [PROMPT]
Options:
--negative, -n TEXT Negative prompt (what to avoid)
--steps, -s INTEGER Number of inference steps (default: 30)
--guidance, -g FLOAT Guidance scale (default: 7.5)
--seed INTEGER Random seed for reproducibility
--batch, -b INTEGER Number of images to generate
--width, -w INTEGER Image width (default: 512)
--height INTEGER Image height (default: 512)
--model, -m TEXT Model to use (default: runwayml/stable-diffusion-v1-5)
--output, -o TEXT Output directory (default: outputs)
--interactive, -i Interactive mode
```
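Options such as `--seed`, `--batch`, and `--negative` map onto seeded generators and extra pipeline arguments. The sketch below shows one plausible way a batch loop can be written; the actual script may batch differently (for example via `num_images_per_prompt`):

```python
import os
import torch
from diffusers import StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

prompt = "abstract geometric patterns, colorful"
negative = "blurry, low quality"
base_seed, batch = 12345, 3

os.makedirs("outputs", exist_ok=True)
for i in range(batch):
    seed = base_seed + i  # a distinct, reproducible seed per image
    generator = torch.Generator(device=device).manual_seed(seed)
    image = pipe(
        prompt,
        negative_prompt=negative,
        num_inference_steps=40,
        guidance_scale=7.5,
        width=512,
        height=512,
        generator=generator,
    ).images[0]
    image.save(f"outputs/abstract_patterns_seed{seed}.png")
```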
## Example Commands
### Basic Examples
```bash
# Simple landscape
python run_basic_generation.py "serene mountain lake, golden hour, photorealistic"
# Digital art style
python run_basic_generation.py "futuristic robot, neon lights, cyberpunk style, digital art"
```
### Advanced Examples
```bash
# High-quality generation with negative prompts
python run_advanced_generation.py "beautiful portrait of a woman, oil painting style" \
--negative "blurry, distorted, low quality, bad anatomy" \
--steps 50 --guidance 8.0
# Batch generation with fixed seed
python run_advanced_generation.py "abstract geometric patterns, colorful" \
--batch 5 --seed 12345 --steps 40
# Custom dimensions for landscape
python run_advanced_generation.py "panoramic view of alien landscape" \
--width 768 --height 512 --steps 35
# Interactive experimentation
python run_advanced_generation.py --interactive
```
## Output Structure
Generated images are saved in the `outputs/` directory with descriptive filenames:
```
outputs/
├── magical_forest_digital_art_20241225_143022_seed42.png
├── magical_forest_digital_art_20241225_143022_seed42_metadata.json
├── cyberpunk_city_sunset_20241225_143156_seed1337.png
└── cyberpunk_city_sunset_20241225_143156_seed1337_metadata.json
```
### Metadata Files
Each generated image (in advanced mode) includes a JSON metadata file with:
- Original prompt and negative prompt
- Generation parameters (steps, guidance, seed)
- Image dimensions and model used
- Timestamp and batch information
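The exact field names depend on the script version, but a minimal sketch of writing such a file looks like this (all values and the output filename are illustrative):

```python
import json

metadata = {
    "prompt": "magical forest, digital art",
    "negative_prompt": "blurry, low quality",
    "steps": 30,
    "guidance_scale": 7.5,
    "seed": 42,
    "width": 512,
    "height": 512,
    "model": "runwayml/stable-diffusion-v1-5",
    "timestamp": "20241225_143022",
    "batch_index": 0,
}

with open("outputs/example_metadata.json", "w") as f:
    json.dump(metadata, f, indent=2)
```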
## Configuration Tips
### For Best Quality
- Use 30-50 inference steps
- Guidance scale 7.5-12.0
- Include style descriptors ("digital art", "oil painting", "photorealistic")
- Use negative prompts to avoid unwanted elements
### For Speed
- Use 20-25 inference steps
- Lower guidance scale (6.0-7.5)
- Stick to 512x512 resolution
### For Experimentation
- Use interactive mode
- Try different seeds with the same prompt
- Experiment with guidance scale values
- Use batch generation to explore variations
## Troubleshooting
### Common Issues
1. **CUDA out of memory**: Reduce batch size or image dimensions
2. **Slow generation**: Ensure CUDA is available and working
3. **Poor quality**: Increase steps, adjust guidance scale, improve prompts
4. **Model download fails**: Check your internet connection and try again
### Performance Optimization
- The scripts automatically enable attention slicing for memory efficiency
- GPU detection is automatic
- Models are cached after first download
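If memory is still tight, a few standard `diffusers` switches can help. Treat this as a checklist rather than the exact code in the bundled scripts, which may enable only some of these:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,  # half precision roughly halves GPU memory
).to("cuda")

pipe.enable_attention_slicing()    # slice attention computation to save VRAM
pipe.enable_vae_slicing()          # decode batched images one at a time
# pipe.enable_model_cpu_offload()  # last resort: offload idle modules to CPU
#                                  # (requires `accelerate`; use instead of .to("cuda"))
```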
## Phase 1.B: Style Conditioning & Prompt Engineering
### 3. `compi_phase1b_styled_generation.py` - Style Conditioning
**Features:**
- Interactive style and mood selection from curated lists
- Intelligent prompt engineering and combination
- Multiple variations with unique seeds
- Comprehensive logging and filename organization
**Usage:**
```bash
python run_styled_generation.py [prompt]
# Or directly: python src/generators/compi_phase1b_styled_generation.py [prompt]
```
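The "prompt engineering and combination" step is essentially string composition: the chosen style and mood fragments are appended to the subject. The helper below is a simplified sketch of that idea; the curated lists and exact wording in the script will differ:

```python
def build_prompt(subject: str, style: str = "", mood: str = "") -> str:
    """Combine a subject with optional style and mood fragments (simplified)."""
    parts = [subject]
    if style:
        parts.append(style)                  # e.g. "oil painting"
    if mood:
        parts.append(f"{mood} atmosphere")   # e.g. "mysterious atmosphere"
    return ", ".join(parts)

print(build_prompt("mountain landscape", "cyberpunk", "dramatic"))
# mountain landscape, cyberpunk, dramatic atmosphere
```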
### 4. `compi_phase1b_advanced_styling.py` - Advanced Style Control
**Features:**
- 13 predefined art styles with optimized prompts and negative prompts
- 9 mood categories with atmospheric conditioning
- Quality presets (draft/standard/high)
- Command line and interactive modes
- Comprehensive metadata saving
**Command Line Options:**
```bash
python run_advanced_styling.py [OPTIONS] [PROMPT]
# Or directly: python src/generators/compi_phase1b_advanced_styling.py [OPTIONS] [PROMPT]
Options:
--style, -s TEXT Art style (or number from list)
--mood, -m TEXT Mood/atmosphere (or number from list)
--variations, -v INT Number of variations (default: 1)
--quality, -q CHOICE Quality preset [draft/standard/high]
--negative, -n TEXT Negative prompt
--interactive, -i Interactive mode
--list-styles List available styles and exit
--list-moods List available moods and exit
```
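Conceptually, each named style maps to a prompt fragment plus a matching negative prompt, and each quality preset maps to step and guidance settings. The tables below are hypothetical placeholders to show the shape of such a configuration, not the script's real entries:

```python
# Hypothetical preset tables; the real script defines its own entries
STYLES = {
    "cyberpunk": {
        "prompt": "cyberpunk style, neon lights, highly detailed",
        "negative": "blurry, low quality",
    },
    "oil painting": {
        "prompt": "oil painting, textured brushstrokes",
        "negative": "photograph, flat colors",
    },
}

QUALITY_PRESETS = {
    "draft":    {"num_inference_steps": 20, "guidance_scale": 6.5},
    "standard": {"num_inference_steps": 30, "guidance_scale": 7.5},
    "high":     {"num_inference_steps": 50, "guidance_scale": 8.5},
}
```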
### Style Conditioning Examples
**Basic Style Selection:**
```bash
# Interactive mode with guided selection
python run_styled_generation.py
# Command line with style selection
python run_advanced_styling.py "mountain landscape" --style cyberpunk --mood dramatic
```
**Advanced Style Control:**
```bash
# High quality with multiple variations
python run_advanced_styling.py "portrait of a wizard" \
--style "oil painting" --mood "mysterious" \
--quality high --variations 3 \
--negative "blurry, distorted, amateur"
# List available options
python run_advanced_styling.py --list-styles
python run_advanced_styling.py --list-moods
```
**Available Styles:**
- digital art, oil painting, watercolor, cyberpunk
- impressionist, concept art, anime, photorealistic
- minimalist, surrealism, pixel art, steampunk, 3d render
**Available Moods:**
- dreamy, dark, peaceful, vibrant, melancholic
- mysterious, whimsical, dramatic, retro
## Phase 1.C: Interactive Web UI
### 5. `compi_phase1c_streamlit_ui.py` - Streamlit Web Interface
**Features:**
- Complete web-based interface for text-to-image generation
- Interactive style and mood selection with custom options
- Advanced settings (steps, guidance, dimensions, negative prompts)
- Real-time image generation and display
- Progress tracking and generation logs
- Automatic saving with comprehensive metadata
**Usage:**
```bash
python run_ui.py
# Or directly: streamlit run src/ui/compi_phase1c_streamlit_ui.py
```
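Under the hood this is a regular Streamlit app wrapped around the same pipeline. The sketch below shows the general pattern (cache the model, read widgets, generate on button press); it is not the actual UI code:

```python
import streamlit as st
import torch
from diffusers import StableDiffusionPipeline

@st.cache_resource  # load the model once per session, not on every rerun
def load_pipe():
    device = "cuda" if torch.cuda.is_available() else "cpu"
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        torch_dtype=torch.float16 if device == "cuda" else torch.float32,
    )
    return pipe.to(device)

prompt = st.text_input("Prompt", "a magical forest, digital art")
steps = st.slider("Steps", 10, 50, 30)

if st.button("Generate"):
    with st.spinner("Generating..."):
        image = load_pipe()(prompt, num_inference_steps=steps).images[0]
    st.image(image)
```

Saved as, say, `app.py`, a script like this is launched with `streamlit run app.py`, which is exactly what `run_ui.py` wraps for the real interface.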
### 6. `compi_phase1c_gradio_ui.py` - Gradio Web Interface
**Features:**
- Alternative web interface with Gradio framework
- Gallery view for multiple image variations
- Collapsible advanced settings
- Real-time generation logs
- Mobile-friendly responsive design
**Usage:**
```bash
python run_gradio_ui.py
# Or directly: python src/ui/compi_phase1c_gradio_ui.py
```
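The Gradio variant follows the same pattern with `gr.Interface`; a minimal, hedged sketch of that structure (again, not the project's actual UI code):

```python
import gradio as gr
import torch
from diffusers import StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

def generate(prompt, steps):
    return pipe(prompt, num_inference_steps=int(steps)).images[0]

demo = gr.Interface(
    fn=generate,
    inputs=[gr.Textbox(label="Prompt"), gr.Slider(10, 50, value=30, label="Steps")],
    outputs=gr.Image(label="Result"),
)
demo.launch()  # serves on http://localhost:7860 by default
```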
## Phase 1.D: Quality Evaluation Tools
### 7. `compi_phase1d_evaluate_quality.py` - Comprehensive Evaluation Interface
**Features:**
- Systematic image quality assessment with 5-criteria scoring system
- Interactive Streamlit web interface for detailed evaluation
- Objective metrics calculation (perceptual hashes, dimensions, file size)
- Batch evaluation capabilities for efficient processing
- Comprehensive logging and CSV export for trend analysis
- Summary analytics with performance insights and recommendations
**Usage:**
```bash
python run_evaluation.py
# Or directly: streamlit run src/generators/compi_phase1d_evaluate_quality.py
```
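The objective metrics are straightforward to reproduce with Pillow and the `imagehash` package; the subjective five-criteria scores are entered manually in the UI. A sketch of what such a metrics pass might compute:

```python
from pathlib import Path
from PIL import Image
import imagehash

def objective_metrics(path: str) -> dict:
    """Compute simple objective metrics for one generated image."""
    img = Image.open(path)
    return {
        "file": Path(path).name,
        "width": img.width,
        "height": img.height,
        "file_size_kb": round(Path(path).stat().st_size / 1024, 1),
        "phash": str(imagehash.phash(img)),  # perceptual hash, useful for duplicate detection
    }

print(objective_metrics("outputs/example.png"))
```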
### 8. `compi_phase1d_cli_evaluation.py` - Command-Line Evaluation Tools
**Features:**
- Batch evaluation and analysis from command line
- Statistical summaries and performance reports
- Filtering by style, mood, and evaluation status
- Automated scoring for large image sets
- Detailed report generation with recommendations
**Command Line Options:**
```bash
python src/generators/compi_phase1d_cli_evaluation.py [OPTIONS]
Options:
--analyze Display evaluation summary and statistics
--report Generate detailed evaluation report
--batch-score P S M Q A Batch score images (1-5 for each criterion)
--list-all List all images with evaluation status
--list-evaluated List only evaluated images
--list-unevaluated List only unevaluated images
--style TEXT Filter by style
--mood TEXT Filter by mood
--notes TEXT Notes for batch evaluation
--output FILE Output file for reports
```
## Phase 1.E: Personal Style Fine-tuning (LoRA)
### 9. `compi_phase1e_dataset_prep.py` - Dataset Preparation for LoRA Training
**Features:**
- Organize and validate personal style images for training
- Generate appropriate training captions with trigger words
- Resize and format images for optimal LoRA training
- Create train/validation splits with metadata tracking
- Support for multiple image formats and quality validation
**Usage:**
```bash
python src/generators/compi_phase1e_dataset_prep.py --input-dir my_artwork --style-name "my_art_style"
# Or via wrapper: python run_dataset_prep.py --input-dir my_artwork --style-name "my_art_style"
```
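Dataset preparation boils down to resizing each image to the training resolution and pairing it with a caption that contains the style trigger word. A simplified sketch, assuming 512x512 targets and one sidecar `.txt` caption per image; the real script adds quality validation and train/validation splitting:

```python
from pathlib import Path
from PIL import Image

def prepare(input_dir: str, output_dir: str, style_name: str, size: int = 512) -> None:
    """Resize images and write simple trigger-word captions (simplified sketch)."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    for i, src in enumerate(sorted(Path(input_dir).glob("*"))):
        if src.suffix.lower() not in {".png", ".jpg", ".jpeg", ".webp"}:
            continue
        img = Image.open(src).convert("RGB").resize((size, size), Image.LANCZOS)
        img.save(out / f"{i:04d}.png")
        # Caption containing the trigger word for this personal style
        (out / f"{i:04d}.txt").write_text(f"artwork in {style_name} style")

prepare("my_artwork", "datasets/my_art_style", "my_art_style")
```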
### 10. `compi_phase1e_lora_training.py` - LoRA Fine-tuning Engine
**Features:**
- Full LoRA (Low-Rank Adaptation) fine-tuning pipeline
- Memory-efficient training with gradient checkpointing
- Configurable LoRA parameters (rank, alpha, learning rate)
- Automatic checkpoint saving and validation monitoring
- Integration with PEFT library for optimal performance
**Command Line Options:**
```bash
python run_lora_training.py [OPTIONS] --dataset-dir DATASET_DIR
Options:
--dataset-dir DIR Required: Prepared dataset directory
--epochs INT Number of training epochs (default: 100)
--learning-rate FLOAT Learning rate (default: 1e-4)
--lora-rank INT LoRA rank (default: 4)
--lora-alpha INT LoRA alpha (default: 32)
--batch-size INT Training batch size (default: 1)
--save-steps INT Save checkpoint every N steps
--gradient-checkpointing Enable gradient checkpointing for memory efficiency
--mixed-precision Use mixed precision training
```
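With PEFT, LoRA fine-tuning amounts to wrapping the UNet's attention projections in low-rank adapters and training only those. The heavily condensed sketch below shows just the configuration step; the noise scheduling, training loop, and checkpointing are omitted, and the target module names are the commonly used attention projections, which may differ from the script's choices:

```python
import torch
from diffusers import StableDiffusionPipeline
from peft import LoraConfig, get_peft_model

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float32
)

lora_config = LoraConfig(
    r=4,            # corresponds to --lora-rank
    lora_alpha=32,  # corresponds to --lora-alpha
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],  # UNet attention projections
)
unet = get_peft_model(pipe.unet, lora_config)
unet.print_trainable_parameters()  # only the adapter weights are trainable

optimizer = torch.optim.AdamW(
    (p for p in unet.parameters() if p.requires_grad), lr=1e-4
)
# ... training loop: add noise to image latents, predict it with `unet`,
#     backpropagate the MSE loss, and save LoRA checkpoints every --save-steps ...
```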
### 11. `compi_phase1e_style_generation.py` - Personal Style Generation
**Features:**
- Generate images using trained LoRA personal styles
- Adjustable style strength and generation parameters
- Interactive and batch generation modes
- Integration with existing CompI pipeline and metadata
- Support for multiple LoRA styles and model switching
**Usage:**
```bash
python run_style_generation.py --lora-path lora_models/my_style/checkpoint-1000 "a cat in my_style"
# Or directly: python src/generators/compi_phase1e_style_generation.py --lora-path PATH PROMPT
```
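At generation time the trained adapter is loaded on top of the base pipeline. Assuming a recent `diffusers` version with `load_lora_weights` and a checkpoint in a supported format, a sketch looks like this (the checkpoint path is illustrative, and the LoRA scale is one way to modulate style strength):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Load the trained LoRA adapter (path is illustrative)
pipe.load_lora_weights("lora_models/my_style/checkpoint-1000")

image = pipe(
    "a cat in my_style",
    num_inference_steps=30,
    cross_attention_kwargs={"scale": 0.8},  # LoRA/style strength; 1.0 = full effect
).images[0]
image.save("outputs/cat_my_style.png")
```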
### 12. `compi_phase1e_style_manager.py` - LoRA Style Management
**Features:**
- Manage multiple trained LoRA styles and checkpoints
- Clean up old checkpoints and organize model storage
- Export style information and training analytics
- Style database with automatic scanning and metadata
- Batch operations for style maintenance and organization
**Command Line Options:**
```bash
python src/generators/compi_phase1e_style_manager.py [OPTIONS]
Options:
--list List all available LoRA styles
--info STYLE_NAME Show detailed information about a style
--refresh Refresh the styles database
--cleanup STYLE_NAME Clean up old checkpoints for a style
--export OUTPUT_FILE Export styles information to CSV
--delete STYLE_NAME Delete a LoRA style (requires --confirm)
```
### Web UI Examples
**Streamlit Interface:**
- Navigate to http://localhost:8501 after running
- Full-featured interface with sidebar settings
- Progress bars and status updates
- Expandable sections for details
**Gradio Interface:**
- Navigate to http://localhost:7860 after running
- Gallery-style image display
- Compact, mobile-friendly design
- Real-time generation feedback
## Next Steps
Phase 1 establishes the foundation for CompI's text-to-image capabilities. Future phases will add:
- Audio input processing
- Emotion and style conditioning
- Real-time data integration
- Multimodal fusion
- Advanced UI interfaces
## Resources
- [Stable Diffusion Documentation](https://huggingface.co/docs/diffusers)
- [Prompt Engineering Guide](https://prompthero.com/stable-diffusion-prompt-guide)
- [CompI Development Plan](development.md)