# CompI Phase 1: Text-to-Image Generation Usage Guide
This guide covers the Phase 1 implementation of CompI's text-to-image generation capabilities using Stable Diffusion.
## Quick Start
### Basic Usage
```bash
# Simple generation with interactive prompt
python run_basic_generation.py
# Generate from command line
python run_basic_generation.py "A magical forest, digital art, highly detailed"
# Or run directly from src/generators/
python src/generators/compi_phase1_text2image.py "A magical forest"
```
### Advanced Usage
```bash
# Advanced script with more options
python run_advanced_generation.py "cyberpunk city at sunset" --negative "blurry, low quality" --steps 50 --batch 3
# Interactive mode for experimentation
python run_advanced_generation.py --interactive
# Or run directly from src/generators/
python src/generators/compi_phase1_advanced.py --interactive
```
## Available Scripts
### 1. `compi_phase1_text2image.py` - Basic Implementation
**Features:**
- Simple, standalone text-to-image generation
- Automatic GPU/CPU detection
- Command line or interactive prompts
- Automatic output saving with descriptive filenames
- Comprehensive logging
**Usage:**
```bash
python compi_phase1_text2image.py [prompt]
```
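For orientation, the core of a basic script like this is an ordinary `diffusers` pipeline call. The snippet below is a minimal sketch of that pattern, not the script's exact code; file names and the example prompt are illustrative:

```python
import os
import torch
from diffusers import StableDiffusionPipeline

# Pick the GPU when available, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
)
pipe = pipe.to(device)
pipe.enable_attention_slicing()  # reduce VRAM usage at a small speed cost

prompt = "A magical forest, digital art, highly detailed"
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]

os.makedirs("outputs", exist_ok=True)
image.save("outputs/magical_forest_example.png")
```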
### 2. `compi_phase1_advanced.py` - Enhanced Implementation
**Features:**
- Batch generation (multiple images)
- Negative prompts (what to avoid)
- Customizable parameters (steps, guidance, dimensions)
- Interactive mode for experimentation
- Metadata saving (JSON files with generation parameters)
- Multiple model support
**Command Line Options:**
```bash
python compi_phase1_advanced.py [OPTIONS] [PROMPT]
Options:
--negative, -n TEXT Negative prompt (what to avoid)
--steps, -s INTEGER Number of inference steps (default: 30)
--guidance, -g FLOAT Guidance scale (default: 7.5)
--seed INTEGER Random seed for reproducibility
--batch, -b INTEGER Number of images to generate
--width, -w INTEGER Image width (default: 512)
--height INTEGER Image height (default: 512)
--model, -m TEXT Model to use (default: runwayml/stable-diffusion-v1-5)
--output, -o TEXT Output directory (default: outputs)
--interactive, -i Interactive mode
```
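Options such as `--seed`, `--batch`, and `--negative` map onto seeded generators and extra pipeline arguments. The sketch below shows one plausible way a batch loop can be written; the actual script may batch differently (for example via `num_images_per_prompt`):

```python
import os
import torch
from diffusers import StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

prompt = "abstract geometric patterns, colorful"
negative = "blurry, low quality"
base_seed, batch = 12345, 3

os.makedirs("outputs", exist_ok=True)
for i in range(batch):
    seed = base_seed + i  # a distinct, reproducible seed per image
    generator = torch.Generator(device=device).manual_seed(seed)
    image = pipe(
        prompt,
        negative_prompt=negative,
        num_inference_steps=40,
        guidance_scale=7.5,
        width=512,
        height=512,
        generator=generator,
    ).images[0]
    image.save(f"outputs/abstract_patterns_seed{seed}.png")
```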
## Example Commands
### Basic Examples
```bash
# Simple landscape
python run_basic_generation.py "serene mountain lake, golden hour, photorealistic"
# Digital art style
python run_basic_generation.py "futuristic robot, neon lights, cyberpunk style, digital art"
```
### Advanced Examples
```bash
# High-quality generation with negative prompts
python run_advanced_generation.py "beautiful portrait of a woman, oil painting style" \
--negative "blurry, distorted, low quality, bad anatomy" \
--steps 50 --guidance 8.0
# Batch generation with fixed seed
python run_advanced_generation.py "abstract geometric patterns, colorful" \
--batch 5 --seed 12345 --steps 40
# Custom dimensions for landscape
python run_advanced_generation.py "panoramic view of alien landscape" \
--width 768 --height 512 --steps 35
# Interactive experimentation
python run_advanced_generation.py --interactive
```
## Output Structure
Generated images are saved in the `outputs/` directory with descriptive filenames:
```
outputs/
├── magical_forest_digital_art_20241225_143022_seed42.png
├── magical_forest_digital_art_20241225_143022_seed42_metadata.json
├── cyberpunk_city_sunset_20241225_143156_seed1337.png
└── cyberpunk_city_sunset_20241225_143156_seed1337_metadata.json
```
### Metadata Files
Each generated image (in advanced mode) includes a JSON metadata file with:
- Original prompt and negative prompt
- Generation parameters (steps, guidance, seed)
- Image dimensions and model used
- Timestamp and batch information
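The exact field names depend on the script version, but a minimal sketch of writing such a file looks like this (all values and the output filename are illustrative):

```python
import json

metadata = {
    "prompt": "magical forest, digital art",
    "negative_prompt": "blurry, low quality",
    "steps": 30,
    "guidance_scale": 7.5,
    "seed": 42,
    "width": 512,
    "height": 512,
    "model": "runwayml/stable-diffusion-v1-5",
    "timestamp": "20241225_143022",
    "batch_index": 0,
}

with open("outputs/example_metadata.json", "w") as f:
    json.dump(metadata, f, indent=2)
```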
## Configuration Tips
### For Best Quality
- Use 30-50 inference steps
- Guidance scale 7.5-12.0
- Include style descriptors ("digital art", "oil painting", "photorealistic")
- Use negative prompts to avoid unwanted elements
### For Speed
- Use 20-25 inference steps
- Lower guidance scale (6.0-7.5)
- Stick to 512x512 resolution
### For Experimentation
- Use interactive mode
- Try different seeds with the same prompt
- Experiment with guidance scale values
- Use batch generation to explore variations
## Troubleshooting
### Common Issues
1. **CUDA out of memory**: Reduce batch size or image dimensions
2. **Slow generation**: Ensure CUDA is available and working
3. **Poor quality**: Increase steps, adjust guidance scale, improve prompts
4. **Model download fails**: Check your internet connection and try again
### Performance Optimization
- The scripts automatically enable attention slicing for memory efficiency
- GPU detection is automatic
- Models are cached after first download
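If memory is still tight, a few standard `diffusers` switches can help. Treat this as a checklist rather than the exact code in the bundled scripts, which may enable only some of these:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,  # half precision roughly halves GPU memory
).to("cuda")

pipe.enable_attention_slicing()    # slice attention computation to save VRAM
pipe.enable_vae_slicing()          # decode batched images one at a time
# pipe.enable_model_cpu_offload()  # last resort: offload idle modules to CPU
#                                  # (requires `accelerate`; use instead of .to("cuda"))
```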
## Phase 1.B: Style Conditioning & Prompt Engineering
### 3. `compi_phase1b_styled_generation.py` - Style Conditioning
**Features:**
- Interactive style and mood selection from curated lists
- Intelligent prompt engineering and combination
- Multiple variations with unique seeds
- Comprehensive logging and filename organization
**Usage:**
```bash
python run_styled_generation.py [prompt]
# Or directly: python src/generators/compi_phase1b_styled_generation.py [prompt]
```
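The "prompt engineering and combination" step is essentially string composition: the chosen style and mood fragments are appended to the subject. The helper below is a simplified sketch of that idea; the curated lists and exact wording in the script will differ:

```python
def build_prompt(subject: str, style: str = "", mood: str = "") -> str:
    """Combine a subject with optional style and mood fragments (simplified)."""
    parts = [subject]
    if style:
        parts.append(style)                  # e.g. "oil painting"
    if mood:
        parts.append(f"{mood} atmosphere")   # e.g. "mysterious atmosphere"
    return ", ".join(parts)

print(build_prompt("mountain landscape", "cyberpunk", "dramatic"))
# mountain landscape, cyberpunk, dramatic atmosphere
```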
### 4. `compi_phase1b_advanced_styling.py` - Advanced Style Control
**Features:**
- 13 predefined art styles with optimized prompts and negative prompts
- 9 mood categories with atmospheric conditioning
- Quality presets (draft/standard/high)
- Command line and interactive modes
- Comprehensive metadata saving
**Command Line Options:**
```bash
python run_advanced_styling.py [OPTIONS] [PROMPT]
# Or directly: python src/generators/compi_phase1b_advanced_styling.py [OPTIONS] [PROMPT]
Options:
--style, -s TEXT Art style (or number from list)
--mood, -m TEXT Mood/atmosphere (or number from list)
--variations, -v INT Number of variations (default: 1)
--quality, -q CHOICE Quality preset [draft/standard/high]
--negative, -n TEXT Negative prompt
--interactive, -i Interactive mode
--list-styles List available styles and exit
--list-moods List available moods and exit
```
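Conceptually, each named style maps to a prompt fragment plus a matching negative prompt, and each quality preset maps to step and guidance settings. The tables below are hypothetical placeholders to show the shape of such a configuration, not the script's real entries:

```python
# Hypothetical preset tables; the real script defines its own entries
STYLES = {
    "cyberpunk": {
        "prompt": "cyberpunk style, neon lights, highly detailed",
        "negative": "blurry, low quality",
    },
    "oil painting": {
        "prompt": "oil painting, textured brushstrokes",
        "negative": "photograph, flat colors",
    },
}

QUALITY_PRESETS = {
    "draft":    {"num_inference_steps": 20, "guidance_scale": 6.5},
    "standard": {"num_inference_steps": 30, "guidance_scale": 7.5},
    "high":     {"num_inference_steps": 50, "guidance_scale": 8.5},
}
```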
### Style Conditioning Examples
**Basic Style Selection:**
```bash
# Interactive mode with guided selection
python run_styled_generation.py
# Command line with style selection
python run_advanced_styling.py "mountain landscape" --style cyberpunk --mood dramatic
```
**Advanced Style Control:**
```bash
# High quality with multiple variations
python run_advanced_styling.py "portrait of a wizard" \
--style "oil painting" --mood "mysterious" \
--quality high --variations 3 \
--negative "blurry, distorted, amateur"
# List available options
python run_advanced_styling.py --list-styles
python run_advanced_styling.py --list-moods
```
**Available Styles:**
- digital art, oil painting, watercolor, cyberpunk
- impressionist, concept art, anime, photorealistic
- minimalist, surrealism, pixel art, steampunk, 3d render
**Available Moods:**
- dreamy, dark, peaceful, vibrant, melancholic
- mysterious, whimsical, dramatic, retro
## Phase 1.C: Interactive Web UI
### 5. `compi_phase1c_streamlit_ui.py` - Streamlit Web Interface
**Features:**
- Complete web-based interface for text-to-image generation
- Interactive style and mood selection with custom options
- Advanced settings (steps, guidance, dimensions, negative prompts)
- Real-time image generation and display
- Progress tracking and generation logs
- Automatic saving with comprehensive metadata
**Usage:**
```bash
python run_ui.py
# Or directly: streamlit run src/ui/compi_phase1c_streamlit_ui.py
```
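Under the hood this is a regular Streamlit app wrapped around the same pipeline. The sketch below shows the general pattern (cache the model, read widgets, generate on button press); it is not the actual UI code:

```python
import streamlit as st
import torch
from diffusers import StableDiffusionPipeline

@st.cache_resource  # load the model once per session, not on every rerun
def load_pipe():
    device = "cuda" if torch.cuda.is_available() else "cpu"
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        torch_dtype=torch.float16 if device == "cuda" else torch.float32,
    )
    return pipe.to(device)

prompt = st.text_input("Prompt", "a magical forest, digital art")
steps = st.slider("Steps", 10, 50, 30)

if st.button("Generate"):
    with st.spinner("Generating..."):
        image = load_pipe()(prompt, num_inference_steps=steps).images[0]
    st.image(image)
```

Saved as, say, `app.py`, a script like this is launched with `streamlit run app.py`, which is exactly what `run_ui.py` wraps for the real interface.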
### 6. `compi_phase1c_gradio_ui.py` - Gradio Web Interface
**Features:**
- Alternative web interface with Gradio framework
- Gallery view for multiple image variations
- Collapsible advanced settings
- Real-time generation logs
- Mobile-friendly responsive design
**Usage:**
```bash
python run_gradio_ui.py
# Or directly: python src/ui/compi_phase1c_gradio_ui.py
```
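The Gradio variant follows the same pattern with `gr.Interface`; a minimal, hedged sketch of that structure (again, not the project's actual UI code):

```python
import gradio as gr
import torch
from diffusers import StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

def generate(prompt, steps):
    return pipe(prompt, num_inference_steps=int(steps)).images[0]

demo = gr.Interface(
    fn=generate,
    inputs=[gr.Textbox(label="Prompt"), gr.Slider(10, 50, value=30, label="Steps")],
    outputs=gr.Image(label="Result"),
)
demo.launch()  # serves on http://localhost:7860 by default
```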
## Phase 1.D: Quality Evaluation Tools
### 7. `compi_phase1d_evaluate_quality.py` - Comprehensive Evaluation Interface
**Features:**
- Systematic image quality assessment with 5-criteria scoring system
- Interactive Streamlit web interface for detailed evaluation
- Objective metrics calculation (perceptual hashes, dimensions, file size)
- Batch evaluation capabilities for efficient processing
- Comprehensive logging and CSV export for trend analysis
- Summary analytics with performance insights and recommendations
**Usage:**
```bash
python run_evaluation.py
# Or directly: streamlit run src/generators/compi_phase1d_evaluate_quality.py
```
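The objective metrics are straightforward to reproduce with Pillow and the `imagehash` package; the subjective five-criteria scores are entered manually in the UI. A sketch of what such a metrics pass might compute:

```python
from pathlib import Path
from PIL import Image
import imagehash

def objective_metrics(path: str) -> dict:
    """Compute simple objective metrics for one generated image."""
    img = Image.open(path)
    return {
        "file": Path(path).name,
        "width": img.width,
        "height": img.height,
        "file_size_kb": round(Path(path).stat().st_size / 1024, 1),
        "phash": str(imagehash.phash(img)),  # perceptual hash, useful for duplicate detection
    }

print(objective_metrics("outputs/example.png"))
```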
### 8. `compi_phase1d_cli_evaluation.py` - Command-Line Evaluation Tools
**Features:**
- Batch evaluation and analysis from command line
- Statistical summaries and performance reports
- Filtering by style, mood, and evaluation status
- Automated scoring for large image sets
- Detailed report generation with recommendations
**Command Line Options:**
```bash
python src/generators/compi_phase1d_cli_evaluation.py [OPTIONS]
Options:
--analyze Display evaluation summary and statistics
--report Generate detailed evaluation report
--batch-score P S M Q A Batch score images (1-5 for each criterion)
--list-all List all images with evaluation status
--list-evaluated List only evaluated images
--list-unevaluated List only unevaluated images
--style TEXT Filter by style
--mood TEXT Filter by mood
--notes TEXT Notes for batch evaluation
--output FILE Output file for reports
```
## Phase 1.E: Personal Style Fine-tuning (LoRA)
### 9. `compi_phase1e_dataset_prep.py` - Dataset Preparation for LoRA Training
**Features:**
- Organize and validate personal style images for training
- Generate appropriate training captions with trigger words
- Resize and format images for optimal LoRA training
- Create train/validation splits with metadata tracking
- Support for multiple image formats and quality validation
**Usage:**
```bash
python src/generators/compi_phase1e_dataset_prep.py --input-dir my_artwork --style-name "my_art_style"
# Or via wrapper: python run_dataset_prep.py --input-dir my_artwork --style-name "my_art_style"
```
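Dataset preparation boils down to resizing each image to the training resolution and pairing it with a caption that contains the style trigger word. A simplified sketch, assuming 512x512 targets and one sidecar `.txt` caption per image; the real script adds quality validation and train/validation splitting:

```python
from pathlib import Path
from PIL import Image

def prepare(input_dir: str, output_dir: str, style_name: str, size: int = 512) -> None:
    """Resize images and write simple trigger-word captions (simplified sketch)."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    for i, src in enumerate(sorted(Path(input_dir).glob("*"))):
        if src.suffix.lower() not in {".png", ".jpg", ".jpeg", ".webp"}:
            continue
        img = Image.open(src).convert("RGB").resize((size, size), Image.LANCZOS)
        img.save(out / f"{i:04d}.png")
        # Caption containing the trigger word for this personal style
        (out / f"{i:04d}.txt").write_text(f"artwork in {style_name} style")

prepare("my_artwork", "datasets/my_art_style", "my_art_style")
```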
### 10. `compi_phase1e_lora_training.py` - LoRA Fine-tuning Engine
**Features:**
- Full LoRA (Low-Rank Adaptation) fine-tuning pipeline
- Memory-efficient training with gradient checkpointing
- Configurable LoRA parameters (rank, alpha, learning rate)
- Automatic checkpoint saving and validation monitoring
- Integration with PEFT library for optimal performance
**Command Line Options:**
```bash
python run_lora_training.py [OPTIONS] --dataset-dir DATASET_DIR
Options:
--dataset-dir DIR Required: Prepared dataset directory
--epochs INT Number of training epochs (default: 100)
--learning-rate FLOAT Learning rate (default: 1e-4)
--lora-rank INT LoRA rank (default: 4)
--lora-alpha INT LoRA alpha (default: 32)
--batch-size INT Training batch size (default: 1)
--save-steps INT Save checkpoint every N steps
--gradient-checkpointing Enable gradient checkpointing for memory efficiency
--mixed-precision Use mixed precision training
```
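With PEFT, LoRA fine-tuning amounts to wrapping the UNet's attention projections in low-rank adapters and training only those. The heavily condensed sketch below shows just the configuration step; the noise scheduling, training loop, and checkpointing are omitted, and the target module names are the commonly used attention projections, which may differ from the script's choices:

```python
import torch
from diffusers import StableDiffusionPipeline
from peft import LoraConfig, get_peft_model

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float32
)

lora_config = LoraConfig(
    r=4,            # corresponds to --lora-rank
    lora_alpha=32,  # corresponds to --lora-alpha
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],  # UNet attention projections
)
unet = get_peft_model(pipe.unet, lora_config)
unet.print_trainable_parameters()  # only the adapter weights are trainable

optimizer = torch.optim.AdamW(
    (p for p in unet.parameters() if p.requires_grad), lr=1e-4
)
# ... training loop: add noise to image latents, predict it with `unet`,
#     backpropagate the MSE loss, and save LoRA checkpoints every --save-steps ...
```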
### 11. `compi_phase1e_style_generation.py` - Personal Style Generation
**Features:**
- Generate images using trained LoRA personal styles
- Adjustable style strength and generation parameters
- Interactive and batch generation modes
- Integration with existing CompI pipeline and metadata
- Support for multiple LoRA styles and model switching
**Usage:**
```bash
python run_style_generation.py --lora-path lora_models/my_style/checkpoint-1000 "a cat in my_style"
# Or directly: python src/generators/compi_phase1e_style_generation.py --lora-path PATH PROMPT
```
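At generation time the trained adapter is loaded on top of the base pipeline. Assuming a recent `diffusers` version with `load_lora_weights` and a checkpoint in a supported format, a sketch looks like this (the checkpoint path is illustrative, and the LoRA scale is one way to modulate style strength):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Load the trained LoRA adapter (path is illustrative)
pipe.load_lora_weights("lora_models/my_style/checkpoint-1000")

image = pipe(
    "a cat in my_style",
    num_inference_steps=30,
    cross_attention_kwargs={"scale": 0.8},  # LoRA/style strength; 1.0 = full effect
).images[0]
image.save("outputs/cat_my_style.png")
```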
### 12. `compi_phase1e_style_manager.py` - LoRA Style Management
**Features:**
- Manage multiple trained LoRA styles and checkpoints
- Clean up old checkpoints and organize model storage
- Export style information and training analytics
- Style database with automatic scanning and metadata
- Batch operations for style maintenance and organization
**Command Line Options:**
```bash
python src/generators/compi_phase1e_style_manager.py [OPTIONS]
Options:
--list List all available LoRA styles
--info STYLE_NAME Show detailed information about a style
--refresh Refresh the styles database
--cleanup STYLE_NAME Clean up old checkpoints for a style
--export OUTPUT_FILE Export styles information to CSV
--delete STYLE_NAME Delete a LoRA style (requires --confirm)
```
### Web UI Examples
**Streamlit Interface:**
- Navigate to http://localhost:8501 after running
- Full-featured interface with sidebar settings
- Progress bars and status updates
- Expandable sections for details
**Gradio Interface:**
- Navigate to http://localhost:7860 after running
- Gallery-style image display
- Compact, mobile-friendly design
- Real-time generation feedback
## Next Steps
Phase 1 establishes the foundation for CompI's text-to-image capabilities. Future phases will add:
- Audio input processing
- Emotion and style conditioning
- Real-time data integration
- Multimodal fusion
- Advanced UI interfaces
## Resources
- [Stable Diffusion Documentation](https://huggingface.co/docs/diffusers)
- [Prompt Engineering Guide](https://prompthero.com/stable-diffusion-prompt-guide)
- [CompI Development Plan](development.md)