# CompI Phase 1: Text-to-Image Generation Usage Guide

This guide covers the Phase 1 implementation of CompI's text-to-image generation capabilities using Stable Diffusion.

## 🚀 Quick Start

### Basic Usage

```bash
# Simple generation with interactive prompt
python run_basic_generation.py

# Generate from command line
python run_basic_generation.py "A magical forest, digital art, highly detailed"

# Or run directly from src/generators/
python src/generators/compi_phase1_text2image.py "A magical forest"
```

### Advanced Usage

```bash
# Advanced script with more options
python run_advanced_generation.py "cyberpunk city at sunset" --negative "blurry, low quality" --steps 50 --batch 3

# Interactive mode for experimentation
python run_advanced_generation.py --interactive

# Or run directly from src/generators/
python src/generators/compi_phase1_advanced.py --interactive
```

## 📋 Available Scripts

### 1. `compi_phase1_text2image.py` - Basic Implementation

**Features:**

- Simple, standalone text-to-image generation
- Automatic GPU/CPU detection
- Command line or interactive prompts
- Automatic output saving with descriptive filenames
- Comprehensive logging

**Usage:**

```bash
python compi_phase1_text2image.py [prompt]
```

### 2. `compi_phase1_advanced.py` - Enhanced Implementation

**Features:**

- Batch generation (multiple images)
- Negative prompts (what to avoid)
- Customizable parameters (steps, guidance, dimensions)
- Interactive mode for experimentation
- Metadata saving (JSON files with generation parameters)
- Multiple model support

**Command Line Options:**

```bash
python compi_phase1_advanced.py [OPTIONS] [PROMPT]

Options:
  --negative, -n TEXT     Negative prompt (what to avoid)
  --steps, -s INTEGER     Number of inference steps (default: 30)
  --guidance, -g FLOAT    Guidance scale (default: 7.5)
  --seed INTEGER          Random seed for reproducibility
  --batch, -b INTEGER     Number of images to generate
  --width, -w INTEGER     Image width (default: 512)
  --height INTEGER        Image height (default: 512)
  --model, -m TEXT        Model to use (default: runwayml/stable-diffusion-v1-5)
  --output, -o TEXT       Output directory (default: outputs)
  --interactive, -i       Interactive mode
```

## 🎨 Example Commands

### Basic Examples

```bash
# Simple landscape
python run_basic_generation.py "serene mountain lake, golden hour, photorealistic"

# Digital art style
python run_basic_generation.py "futuristic robot, neon lights, cyberpunk style, digital art"
```

### Advanced Examples

```bash
# High-quality generation with negative prompts
python run_advanced_generation.py "beautiful portrait of a woman, oil painting style" \
  --negative "blurry, distorted, low quality, bad anatomy" \
  --steps 50 --guidance 8.0

# Batch generation with fixed seed
python run_advanced_generation.py "abstract geometric patterns, colorful" \
  --batch 5 --seed 12345 --steps 40

# Custom dimensions for landscape
python run_advanced_generation.py "panoramic view of alien landscape" \
  --width 768 --height 512 --steps 35

# Interactive experimentation
python run_advanced_generation.py --interactive
```

## 📁 Output Structure

Generated images are saved in the `outputs/` directory with descriptive filenames:

```
outputs/
├── magical_forest_digital_art_20241225_143022_seed42.png
├── magical_forest_digital_art_20241225_143022_seed42_metadata.json
├── cyberpunk_city_sunset_20241225_143156_seed1337.png
└── cyberpunk_city_sunset_20241225_143156_seed1337_metadata.json
```

### Metadata Files

Each generated image (in advanced mode) includes a JSON metadata file with:

- Original prompt and negative prompt
- Generation parameters (steps, guidance, seed)
- Image dimensions and model used
- Timestamp and batch information

## ⚙️ Configuration Tips

### For Best Quality

- Use 30-50 inference steps
- Guidance scale 7.5-12.0
- Include style descriptors ("digital art", "oil painting", "photorealistic")
- Use negative prompts to avoid unwanted elements

### For Speed

- Use 20-25 inference steps
- Lower guidance scale (6.0-7.5)
- Stick to 512x512 resolution

### For Experimentation

- Use interactive mode
- Try different seeds with the same prompt
- Experiment with guidance scale values
- Use batch generation to explore variations

## 🔧 Troubleshooting

### Common Issues

1. **CUDA out of memory**: Reduce batch size or image dimensions
2. **Slow generation**: Ensure CUDA is available and working
3. **Poor quality**: Increase steps, adjust guidance scale, improve prompts
4. **Model download fails**: Check internet connection, try again

### Performance Optimization

- The scripts automatically enable attention slicing for memory efficiency
- GPU detection is automatic
- Models are cached after first download

## 🎨 Phase 1.B: Style Conditioning & Prompt Engineering

### 3. `compi_phase1b_styled_generation.py` - Style Conditioning

**Features:**

- Interactive style and mood selection from curated lists
- Intelligent prompt engineering and combination
- Multiple variations with unique seeds
- Comprehensive logging and filename organization

**Usage:**

```bash
python run_styled_generation.py [prompt]

# Or directly:
python src/generators/compi_phase1b_styled_generation.py [prompt]
```

### 4. `compi_phase1b_advanced_styling.py` - Advanced Style Control

**Features:**

- 13 predefined art styles with optimized prompts and negative prompts
- 9 mood categories with atmospheric conditioning
- Quality presets (draft/standard/high)
- Command line and interactive modes
- Comprehensive metadata saving

**Command Line Options:**

```bash
python run_advanced_styling.py [OPTIONS] [PROMPT]

# Or directly:
python src/generators/compi_phase1b_advanced_styling.py [OPTIONS] [PROMPT]

Options:
  --style, -s TEXT        Art style (or number from list)
  --mood, -m TEXT         Mood/atmosphere (or number from list)
  --variations, -v INT    Number of variations (default: 1)
  --quality, -q CHOICE    Quality preset [draft/standard/high]
  --negative, -n TEXT     Negative prompt
  --interactive, -i       Interactive mode
  --list-styles           List available styles and exit
  --list-moods            List available moods and exit
```

### Style Conditioning Examples

**Basic Style Selection:**

```bash
# Interactive mode with guided selection
python run_styled_generation.py

# Command line with style selection
python run_advanced_styling.py "mountain landscape" --style cyberpunk --mood dramatic
```

**Advanced Style Control:**

```bash
# High quality with multiple variations
python run_advanced_styling.py "portrait of a wizard" \
  --style "oil painting" --mood "mysterious" \
  --quality high --variations 3 \
  --negative "blurry, distorted, amateur"

# List available options
python run_advanced_styling.py --list-styles
python run_advanced_styling.py --list-moods
```

**Available Styles:**

- digital art, oil painting, watercolor, cyberpunk
- impressionist, concept art, anime, photorealistic
- minimalist, surrealism, pixel art, steampunk, 3d render

**Available Moods:**

- dreamy, dark, peaceful, vibrant, melancholic
- mysterious, whimsical, dramatic, retro

## 🖥️ Phase 1.C: Interactive Web UI

### 5. `compi_phase1c_streamlit_ui.py` - Streamlit Web Interface

**Features:**

- Complete web-based interface for text-to-image generation
- Interactive style and mood selection with custom options
- Advanced settings (steps, guidance, dimensions, negative prompts)
- Real-time image generation and display
- Progress tracking and generation logs
- Automatic saving with comprehensive metadata

**Usage:**

```bash
python run_ui.py

# Or directly:
streamlit run src/ui/compi_phase1c_streamlit_ui.py
```

### 6. `compi_phase1c_gradio_ui.py` - Gradio Web Interface

**Features:**

- Alternative web interface with Gradio framework
- Gallery view for multiple image variations
- Collapsible advanced settings
- Real-time generation logs
- Mobile-friendly responsive design

**Usage:**

```bash
python run_gradio_ui.py

# Or directly:
python src/ui/compi_phase1c_gradio_ui.py
```

## 📊 Phase 1.D: Quality Evaluation Tools

### 7. `compi_phase1d_evaluate_quality.py` - Comprehensive Evaluation Interface

**Features:**

- Systematic image quality assessment with 5-criteria scoring system
- Interactive Streamlit web interface for detailed evaluation
- Objective metrics calculation (perceptual hashes, dimensions, file size)
- Batch evaluation capabilities for efficient processing
- Comprehensive logging and CSV export for trend analysis
- Summary analytics with performance insights and recommendations

**Usage:**

```bash
python run_evaluation.py

# Or directly:
streamlit run src/generators/compi_phase1d_evaluate_quality.py
```

### 8. `compi_phase1d_cli_evaluation.py` - Command-Line Evaluation Tools

**Features:**

- Batch evaluation and analysis from command line
- Statistical summaries and performance reports
- Filtering by style, mood, and evaluation status
- Automated scoring for large image sets
- Detailed report generation with recommendations

**Command Line Options:**

```bash
python src/generators/compi_phase1d_cli_evaluation.py [OPTIONS]

Options:
  --analyze                 Display evaluation summary and statistics
  --report                  Generate detailed evaluation report
  --batch-score P S M Q A   Batch score images (1-5 for each criteria)
  --list-all                List all images with evaluation status
  --list-evaluated          List only evaluated images
  --list-unevaluated        List only unevaluated images
  --style TEXT              Filter by style
  --mood TEXT               Filter by mood
  --notes TEXT              Notes for batch evaluation
  --output FILE             Output file for reports
```

## 🎨 Phase 1.E: Personal Style Fine-tuning (LoRA)

### 9. `compi_phase1e_dataset_prep.py` - Dataset Preparation for LoRA Training

**Features:**

- Organize and validate personal style images for training
- Generate appropriate training captions with trigger words
- Resize and format images for optimal LoRA training
- Create train/validation splits with metadata tracking
- Support for multiple image formats and quality validation

**Usage:**

```bash
python src/generators/compi_phase1e_dataset_prep.py --input-dir my_artwork --style-name "my_art_style"

# Or via wrapper:
python run_dataset_prep.py --input-dir my_artwork --style-name "my_art_style"
```

### 10. `compi_phase1e_lora_training.py` - LoRA Fine-tuning Engine

**Features:**

- Full LoRA (Low-Rank Adaptation) fine-tuning pipeline
- Memory-efficient training with gradient checkpointing
- Configurable LoRA parameters (rank, alpha, learning rate)
- Automatic checkpoint saving and validation monitoring
- Integration with PEFT library for optimal performance

**Command Line Options:**

```bash
python run_lora_training.py [OPTIONS] --dataset-dir DATASET_DIR

Options:
  --dataset-dir DIR           Required: Prepared dataset directory
  --epochs INT                Number of training epochs (default: 100)
  --learning-rate FLOAT       Learning rate (default: 1e-4)
  --lora-rank INT             LoRA rank (default: 4)
  --lora-alpha INT            LoRA alpha (default: 32)
  --batch-size INT            Training batch size (default: 1)
  --save-steps INT            Save checkpoint every N steps
  --gradient-checkpointing    Enable gradient checkpointing for memory efficiency
  --mixed-precision           Use mixed precision training
```

### 11. `compi_phase1e_style_generation.py` - Personal Style Generation

**Features:**

- Generate images using trained LoRA personal styles
- Adjustable style strength and generation parameters
- Interactive and batch generation modes
- Integration with existing CompI pipeline and metadata
- Support for multiple LoRA styles and model switching

**Usage:**

```bash
python run_style_generation.py --lora-path lora_models/my_style/checkpoint-1000 "a cat in my_style"

# Or directly:
python src/generators/compi_phase1e_style_generation.py --lora-path PATH PROMPT
```

### 12. `compi_phase1e_style_manager.py` - LoRA Style Management

**Features:**

- Manage multiple trained LoRA styles and checkpoints
- Cleanup old checkpoints and organize model storage
- Export style information and training analytics
- Style database with automatic scanning and metadata
- Batch operations for style maintenance and organization

**Command Line Options:**

```bash
python src/generators/compi_phase1e_style_manager.py [OPTIONS]

Options:
  --list                  List all available LoRA styles
  --info STYLE_NAME       Show detailed information about a style
  --refresh               Refresh the styles database
  --cleanup STYLE_NAME    Clean up old checkpoints for a style
  --export OUTPUT_FILE    Export styles information to CSV
  --delete STYLE_NAME     Delete a LoRA style (requires --confirm)
```

### Web UI Examples

**Streamlit Interface:**

- Navigate to http://localhost:8501 after running
- Full-featured interface with sidebar settings
- Progress bars and status updates
- Expandable sections for details

**Gradio Interface:**

- Navigate to http://localhost:7860 after running
- Gallery-style image display
- Compact, mobile-friendly design
- Real-time generation feedback

## 🎯 Next Steps

Phase 1 establishes the foundation for CompI's text-to-image capabilities. Future phases will add:

- Audio input processing
- Emotion and style conditioning
- Real-time data integration
- Multimodal fusion
- Advanced UI interfaces

## 📚 Resources

- [Stable Diffusion Documentation](https://huggingface.co/docs/diffusers)
- [Prompt Engineering Guide](https://prompthero.com/stable-diffusion-prompt-guide)
- [CompI Development Plan](development.md)
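
### Appendix: Output Naming Sketch

The filename pattern shown under Output Structure (prompt slug, timestamp, seed, plus a `_metadata.json` sibling) can be approximated with a few lines of standard-library Python. This is only a minimal sketch of that convention, not the code the CompI scripts actually use: the helper names `slugify`, `build_output_paths`, and `build_metadata`, the five-word slug limit, and the metadata key names are all assumptions made for illustration.

```python
import json
import re
from datetime import datetime
from pathlib import Path


def slugify(prompt: str, max_words: int = 5) -> str:
    """Lowercase the prompt and join its first few alphanumeric words with underscores.

    The word limit is an assumption; the real scripts may shorten prompts differently.
    """
    words = re.findall(r"[a-z0-9]+", prompt.lower())
    return "_".join(words[:max_words])


def build_output_paths(prompt, seed, output_dir="outputs", now=None):
    """Return (image_path, metadata_path) following the slug_timestamp_seed pattern."""
    now = now or datetime.now()
    stem = f"{slugify(prompt)}_{now:%Y%m%d_%H%M%S}_seed{seed}"
    out = Path(output_dir)
    return out / f"{stem}.png", out / f"{stem}_metadata.json"


def build_metadata(prompt, negative_prompt, steps, guidance, seed,
                   width, height, model, now=None):
    """Collect the fields the guide lists for metadata files (key names are hypothetical)."""
    now = now or datetime.now()
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "steps": steps,
        "guidance_scale": guidance,
        "seed": seed,
        "width": width,
        "height": height,
        "model": model,
        "timestamp": now.isoformat(timespec="seconds"),
    }


if __name__ == "__main__":
    when = datetime(2024, 12, 25, 14, 30, 22)
    image_path, metadata_path = build_output_paths(
        "magical forest, digital art", seed=42, now=when
    )
    print(image_path.name)  # magical_forest_digital_art_20241225_143022_seed42.png
    meta = build_metadata(
        "magical forest, digital art", "blurry, low quality",
        steps=30, guidance=7.5, seed=42, width=512, height=512,
        model="runwayml/stable-diffusion-v1-5", now=when,
    )
    print(json.dumps(meta, indent=2))
```

Passing `now` explicitly keeps the naming deterministic, which makes it easier to pair an image with its metadata file when several variations are generated in one batch.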