CompI Phase 1: Text-to-Image Generation Usage Guide
This guide covers the Phase 1 implementation of CompI's text-to-image generation capabilities using Stable Diffusion.
Quick Start
Basic Usage
# Simple generation with interactive prompt
python run_basic_generation.py
# Generate from command line
python run_basic_generation.py "A magical forest, digital art, highly detailed"
# Or run directly from src/generators/
python src/generators/compi_phase1_text2image.py "A magical forest"
Advanced Usage
# Advanced script with more options
python run_advanced_generation.py "cyberpunk city at sunset" --negative "blurry, low quality" --steps 50 --batch 3
# Interactive mode for experimentation
python run_advanced_generation.py --interactive
# Or run directly from src/generators/
python src/generators/compi_phase1_advanced.py --interactive
Available Scripts
1. compi_phase1_text2image.py - Basic Implementation
Features:
- Simple, standalone text-to-image generation
- Automatic GPU/CPU detection
- Command line or interactive prompts
- Automatic output saving with descriptive filenames
- Comprehensive logging
Usage:
python compi_phase1_text2image.py [prompt]
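Under the hood, the basic script is a standard Stable Diffusion pipeline from the diffusers library. A minimal sketch of the approach, assuming diffusers and torch are installed (names and defaults here are illustrative, not the script's exact code):

import torch
from pathlib import Path
from diffusers import StableDiffusionPipeline

# Use the GPU when available, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the default model referenced throughout this guide
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

# Generate one image and save it under outputs/
Path("outputs").mkdir(exist_ok=True)
prompt = "A magical forest, digital art, highly detailed"
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("outputs/magical_forest_digital_art.png")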
2. compi_phase1_advanced.py - Enhanced Implementation
Features:
- Batch generation (multiple images)
- Negative prompts (what to avoid)
- Customizable parameters (steps, guidance, dimensions)
- Interactive mode for experimentation
- Metadata saving (JSON files with generation parameters)
- Multiple model support
Command Line Options:
python compi_phase1_advanced.py [OPTIONS] [PROMPT]
Options:
--negative, -n TEXT Negative prompt (what to avoid)
--steps, -s INTEGER Number of inference steps (default: 30)
--guidance, -g FLOAT Guidance scale (default: 7.5)
--seed INTEGER Random seed for reproducibility
--batch, -b INTEGER Number of images to generate
--width, -w INTEGER Image width (default: 512)
--height INTEGER Image height (default: 512)
--model, -m TEXT Model to use (default: runwayml/stable-diffusion-v1-5)
--output, -o TEXT Output directory (default: outputs)
--interactive, -i Interactive mode
Example Commands
Basic Examples
# Simple landscape
python run_basic_generation.py "serene mountain lake, golden hour, photorealistic"
# Digital art style
python run_basic_generation.py "futuristic robot, neon lights, cyberpunk style, digital art"
Advanced Examples
# High-quality generation with negative prompts
python run_advanced_generation.py "beautiful portrait of a woman, oil painting style" \
--negative "blurry, distorted, low quality, bad anatomy" \
--steps 50 --guidance 8.0
# Batch generation with fixed seed
python run_advanced_generation.py "abstract geometric patterns, colorful" \
--batch 5 --seed 12345 --steps 40
# Custom dimensions for landscape
python run_advanced_generation.py "panoramic view of alien landscape" \
--width 768 --height 512 --steps 35
# Interactive experimentation
python run_advanced_generation.py --interactive
Output Structure
Generated images are saved in the outputs/ directory with descriptive filenames:
outputs/
├── magical_forest_digital_art_20241225_143022_seed42.png
├── magical_forest_digital_art_20241225_143022_seed42_metadata.json
├── cyberpunk_city_sunset_20241225_143156_seed1337.png
└── cyberpunk_city_sunset_20241225_143156_seed1337_metadata.json
Metadata Files
Each generated image (in advanced mode) includes a JSON metadata file with:
- Original prompt and negative prompt
- Generation parameters (steps, guidance, seed)
- Image dimensions and model used
- Timestamp and batch information
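Because the metadata is plain JSON, it can be read back later for analysis or comparison. A minimal sketch (the key names below are assumptions; check an actual _metadata.json file for the real schema):

import json
from pathlib import Path

# Walk all metadata files produced by advanced-mode runs
for meta_path in Path("outputs").glob("*_metadata.json"):
    meta = json.loads(meta_path.read_text())
    # Key names are illustrative; inspect a real file for the exact schema
    print(meta_path.name, meta.get("prompt"), meta.get("seed"))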
Configuration Tips
For Best Quality
- Use 30-50 inference steps
- Guidance scale 7.5-12.0
- Include style descriptors ("digital art", "oil painting", "photorealistic")
- Use negative prompts to avoid unwanted elements
For Speed
- Use 20-25 inference steps
- Lower guidance scale (6.0-7.5)
- Stick to 512x512 resolution
For Experimentation
- Use interactive mode
- Try different seeds with the same prompt
- Experiment with guidance scale values
- Use batch generation to explore variations
Troubleshooting
Common Issues
- CUDA out of memory: Reduce batch size or image dimensions
- Slow generation: Ensure CUDA is available and working
- Poor quality: Increase steps, adjust guidance scale, improve prompts
- Model download fails: Check internet connection, try again
Performance Optimization
- The scripts automatically enable attention slicing for memory efficiency
- GPU detection is automatic
- Models are cached after first download
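In diffusers terms, the memory optimization described above typically looks like the following sketch (not the scripts' exact code):

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

if torch.cuda.is_available():
    pipe = pipe.to("cuda")
    # Attention slicing trades a little speed for a much lower peak VRAM footprint
    pipe.enable_attention_slicing()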
Phase 1.B: Style Conditioning & Prompt Engineering
3. compi_phase1b_styled_generation.py - Style Conditioning
Features:
- Interactive style and mood selection from curated lists
- Intelligent prompt engineering and combination
- Multiple variations with unique seeds
- Comprehensive logging and filename organization
Usage:
python run_styled_generation.py [prompt]
# Or directly: python src/generators/compi_phase1b_styled_generation.py [prompt]
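Conceptually, style conditioning is prompt engineering: the chosen style and mood are appended to the base prompt before generation. A simplified sketch of that idea; the style and mood phrasings below are illustrative, not the script's curated lists:

# Illustrative fragments; the scripts ship their own curated style/mood lists
STYLES = {"cyberpunk": "cyberpunk style, neon lighting, futuristic"}
MOODS = {"dramatic": "dramatic atmosphere, high contrast"}

def build_prompt(subject: str, style: str, mood: str) -> str:
    # Combine the base subject with style and mood descriptors
    return ", ".join([subject, STYLES[style], MOODS[mood]])

print(build_prompt("mountain landscape", "cyberpunk", "dramatic"))
# mountain landscape, cyberpunk style, neon lighting, futuristic, dramatic atmosphere, high contrast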
4. compi_phase1b_advanced_styling.py - Advanced Style Control
Features:
- 13 predefined art styles with optimized prompts and negative prompts
- 9 mood categories with atmospheric conditioning
- Quality presets (draft/standard/high)
- Command line and interactive modes
- Comprehensive metadata saving
Command Line Options:
python run_advanced_styling.py [OPTIONS] [PROMPT]
# Or directly: python src/generators/compi_phase1b_advanced_styling.py [OPTIONS] [PROMPT]
Options:
--style, -s TEXT Art style (or number from list)
--mood, -m TEXT Mood/atmosphere (or number from list)
--variations, -v INT Number of variations (default: 1)
--quality, -q CHOICE Quality preset [draft/standard/high]
--negative, -n TEXT Negative prompt
--interactive, -i Interactive mode
--list-styles List available styles and exit
--list-moods List available moods and exit
Style Conditioning Examples
Basic Style Selection:
# Interactive mode with guided selection
python run_styled_generation.py
# Command line with style selection
python run_advanced_styling.py "mountain landscape" --style cyberpunk --mood dramatic
Advanced Style Control:
# High quality with multiple variations
python run_advanced_styling.py "portrait of a wizard" \
--style "oil painting" --mood "mysterious" \
--quality high --variations 3 \
--negative "blurry, distorted, amateur"
# List available options
python run_advanced_styling.py --list-styles
python run_advanced_styling.py --list-moods
Available Styles:
- digital art, oil painting, watercolor, cyberpunk
- impressionist, concept art, anime, photorealistic
- minimalist, surrealism, pixel art, steampunk, 3d render
Available Moods:
- dreamy, dark, peaceful, vibrant, melancholic
- mysterious, whimsical, dramatic, retro
Phase 1.C: Interactive Web UI
5. compi_phase1c_streamlit_ui.py - Streamlit Web Interface
Features:
- Complete web-based interface for text-to-image generation
- Interactive style and mood selection with custom options
- Advanced settings (steps, guidance, dimensions, negative prompts)
- Real-time image generation and display
- Progress tracking and generation logs
- Automatic saving with comprehensive metadata
Usage:
python run_ui.py
# Or directly: streamlit run src/ui/compi_phase1c_streamlit_ui.py
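The Streamlit app is essentially a thin web front end over the same pipeline. A stripped-down sketch of the idea (not the actual interface; widget names and defaults are assumptions):

import streamlit as st
import torch
from diffusers import StableDiffusionPipeline

@st.cache_resource  # load the model once per server session
def load_pipe():
    device = "cuda" if torch.cuda.is_available() else "cpu"
    return StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)

st.title("CompI - Text to Image")
prompt = st.text_input("Prompt", "A magical forest, digital art")
steps = st.slider("Inference steps", 10, 50, 30)

if st.button("Generate"):
    image = load_pipe()(prompt, num_inference_steps=steps).images[0]
    st.image(image, caption=prompt)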
6. compi_phase1c_gradio_ui.py - Gradio Web Interface
Features:
- Alternative web interface with Gradio framework
- Gallery view for multiple image variations
- Collapsible advanced settings
- Real-time generation logs
- Mobile-friendly responsive design
Usage:
python run_gradio_ui.py
# Or directly: python src/ui/compi_phase1c_gradio_ui.py
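A comparable Gradio sketch, showing how a gallery output can hold several variations (again illustrative, not the shipped UI):

import gradio as gr
import torch
from diffusers import StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)

def generate(prompt, variations):
    # Return a list of images so the Gallery component shows the whole batch
    return [pipe(prompt).images[0] for _ in range(int(variations))]

demo = gr.Interface(
    fn=generate,
    inputs=[gr.Textbox(label="Prompt"), gr.Slider(1, 4, value=1, step=1, label="Variations")],
    outputs=gr.Gallery(label="Results"),
)
demo.launch()  # serves on http://localhost:7860 by default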
Phase 1.D: Quality Evaluation Tools
7. compi_phase1d_evaluate_quality.py - Comprehensive Evaluation Interface
Features:
- Systematic image quality assessment with a five-criterion scoring system
- Interactive Streamlit web interface for detailed evaluation
- Objective metrics calculation (perceptual hashes, dimensions, file size)
- Batch evaluation capabilities for efficient processing
- Comprehensive logging and CSV export for trend analysis
- Summary analytics with performance insights and recommendations
Usage:
python run_evaluation.py
# Or directly: streamlit run src/generators/compi_phase1d_evaluate_quality.py
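The objective metrics mentioned above can also be reproduced outside the UI with a few lines of Python, assuming the Pillow and imagehash packages (a sketch, not the evaluation script itself):

import os
from PIL import Image
import imagehash

path = "outputs/cyberpunk_city_sunset_20241225_143156_seed1337.png"
img = Image.open(path)

print("dimensions:", img.size)                   # (width, height) in pixels
print("file size:", os.path.getsize(path), "bytes")
print("phash:", imagehash.phash(img))            # perceptual hash for near-duplicate detection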
8. compi_phase1d_cli_evaluation.py - Command-Line Evaluation Tools
Features:
- Batch evaluation and analysis from command line
- Statistical summaries and performance reports
- Filtering by style, mood, and evaluation status
- Automated scoring for large image sets
- Detailed report generation with recommendations
Command Line Options:
python src/generators/compi_phase1d_cli_evaluation.py [OPTIONS]
Options:
--analyze Display evaluation summary and statistics
--report Generate detailed evaluation report
--batch-score P S M Q A Batch score images (1-5 for each criterion)
--list-all List all images with evaluation status
--list-evaluated List only evaluated images
--list-unevaluated List only unevaluated images
--style TEXT Filter by style
--mood TEXT Filter by mood
--notes TEXT Notes for batch evaluation
--output FILE Output file for reports
Phase 1.E: Personal Style Fine-tuning (LoRA)
9. compi_phase1e_dataset_prep.py - Dataset Preparation for LoRA Training
Features:
- Organize and validate personal style images for training
- Generate appropriate training captions with trigger words
- Resize and format images for optimal LoRA training
- Create train/validation splits with metadata tracking
- Support for multiple image formats and quality validation
Usage:
python src/generators/compi_phase1e_dataset_prep.py --input-dir my_artwork --style-name "my_art_style"
# Or via wrapper: python run_dataset_prep.py --input-dir my_artwork --style-name "my_art_style"
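Conceptually, dataset preparation resizes each source image and pairs it with a caption containing the style trigger word. A simplified sketch of that idea (paths and caption format are hypothetical; the real script also handles splits, validation, and metadata):

from pathlib import Path
from PIL import Image

style_token = "my_art_style"  # trigger word embedded in every training caption
out_dir = Path("datasets/my_art_style/train")
out_dir.mkdir(parents=True, exist_ok=True)

for i, src in enumerate(sorted(Path("my_artwork").glob("*.jpg"))):
    # Resize to the 512x512 resolution Stable Diffusion v1.5 expects
    img = Image.open(src).convert("RGB").resize((512, 512))
    img.save(out_dir / f"{i:04d}.png")
    # One caption file per image, embedding the trigger word
    (out_dir / f"{i:04d}.txt").write_text(f"an artwork in {style_token} style")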
10. compi_phase1e_lora_training.py - LoRA Fine-tuning Engine
Features:
- Full LoRA (Low-Rank Adaptation) fine-tuning pipeline
- Memory-efficient training with gradient checkpointing
- Configurable LoRA parameters (rank, alpha, learning rate)
- Automatic checkpoint saving and validation monitoring
- Integration with PEFT library for optimal performance
Command Line Options:
python run_lora_training.py [OPTIONS] --dataset-dir DATASET_DIR
Options:
--dataset-dir DIR Required: Prepared dataset directory
--epochs INT Number of training epochs (default: 100)
--learning-rate FLOAT Learning rate (default: 1e-4)
--lora-rank INT LoRA rank (default: 4)
--lora-alpha INT LoRA alpha (default: 32)
--batch-size INT Training batch size (default: 1)
--save-steps INT Save checkpoint every N steps
--gradient-checkpointing Enable gradient checkpointing for memory efficiency
--mixed-precision Use mixed precision training
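With the PEFT library, the rank and alpha options above map onto a LoraConfig applied to the UNet's attention projections. A sketch under that assumption (target module names vary between model versions, and the real training loop adds optimization, checkpointing, and validation):

from diffusers import StableDiffusionPipeline
from peft import LoraConfig, get_peft_model

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# Mirror the defaults above: rank 4, alpha 32
lora_config = LoraConfig(
    r=4,
    lora_alpha=32,
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],  # UNet attention projections
)
unet = get_peft_model(pipe.unet, lora_config)
unet.print_trainable_parameters()  # only the low-rank adapters are trainable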
11. compi_phase1e_style_generation.py - Personal Style Generation
Features:
- Generate images using trained LoRA personal styles
- Adjustable style strength and generation parameters
- Interactive and batch generation modes
- Integration with existing CompI pipeline and metadata
- Support for multiple LoRA styles and model switching
Usage:
python run_style_generation.py --lora-path lora_models/my_style/checkpoint-1000 "a cat in my_style"
# Or directly: python src/generators/compi_phase1e_style_generation.py --lora-path PATH PROMPT
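Generation with a trained style amounts to attaching the LoRA weights to the base pipeline before calling it. A sketch using diffusers' load_lora_weights, assuming the checkpoint was saved in a format diffusers can load:

import torch
from diffusers import StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)

# Attach the trained personal style; the trigger word must appear in the prompt
pipe.load_lora_weights("lora_models/my_style/checkpoint-1000")
image = pipe("a cat in my_style", num_inference_steps=30).images[0]
image.save("outputs/cat_my_style.png")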
12. compi_phase1e_style_manager.py - LoRA Style Management
Features:
- Manage multiple trained LoRA styles and checkpoints
- Cleanup old checkpoints and organize model storage
- Export style information and training analytics
- Style database with automatic scanning and metadata
- Batch operations for style maintenance and organization
Command Line Options:
python src/generators/compi_phase1e_style_manager.py [OPTIONS]
Options:
--list List all available LoRA styles
--info STYLE_NAME Show detailed information about a style
--refresh Refresh the styles database
--cleanup STYLE_NAME Clean up old checkpoints for a style
--export OUTPUT_FILE Export styles information to CSV
--delete STYLE_NAME Delete a LoRA style (requires --confirm)
Web UI Examples
Streamlit Interface:
- Navigate to http://localhost:8501 after running
- Full-featured interface with sidebar settings
- Progress bars and status updates
- Expandable sections for details
Gradio Interface:
- Navigate to http://localhost:7860 after running
- Gallery-style image display
- Compact, mobile-friendly design
- Real-time generation feedback
Next Steps
Phase 1 establishes the foundation for CompI's text-to-image capabilities. Future phases will add:
- Audio input processing
- Emotion and style conditioning
- Real-time data integration
- Multimodal fusion
- Advanced UI interfaces