
βš™οΈ CompI Phase 3.E: Performance, Model Management & Reliability - Complete Guide

🎯 What Phase 3.E Delivers

Phase 3.E transforms CompI into a production-grade platform with professional performance management, intelligent reliability, and advanced model capabilities.

🤖 Model Manager

  • Dynamic Model Switching: Switch between SD 1.5 and SDXL based on requirements
  • Auto-Availability Checking: Intelligent detection of model compatibility and VRAM requirements
  • Universal LoRA Support: Load and scale LoRA weights across all models and generation modes
  • Smart Recommendations: Hardware-based model suggestions and optimization advice

⚡ Performance Controls

  • xFormers Integration: Memory-efficient attention with automatic fallback
  • Advanced Memory Optimization: Attention slicing, VAE slicing/tiling, CPU offloading
  • Precision Control: Automatic dtype selection (fp16/bf16/fp32) based on hardware
  • Batch Optimization: Memory-aware batch processing with intelligent sizing
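The precision-control logic above can be sketched as a small helper. `select_dtype` and its arguments are illustrative names for this guide, not CompI's actual API:

```python
def select_dtype(has_cuda: bool, supports_bf16: bool, force_fp32: bool = False) -> str:
    """Pick a precision based on hardware (illustrative helper, not CompI's API)."""
    if force_fp32 or not has_cuda:
        return "fp32"  # CPU paths and debugging stay in full precision
    if supports_bf16:
        return "bf16"  # Ampere+ GPUs: bfloat16 resists fp16 overflow
    return "fp16"      # older CUDA GPUs: half precision saves activation VRAM
```

With PyTorch, the `supports_bf16` input would typically come from `torch.cuda.is_bf16_supported()`.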

📊 VRAM Monitoring

  • Real-time Tracking: Live GPU memory usage monitoring and alerts
  • Usage Analytics: Memory usage patterns and optimization suggestions
  • Threshold Warnings: Automatic alerts when approaching memory limits
  • Cache Management: Intelligent GPU cache clearing and memory cleanup

πŸ›‘οΈ Reliability Engine

  • OOM-Safe Generation: Automatic retry with progressive fallback strategies
  • Intelligent Fallbacks: Reduce size → reduce steps → CPU fallback progression
  • Error Classification: Smart error detection and appropriate response strategies
  • Graceful Degradation: Maintain functionality even under resource constraints

📦 Batch Processing

  • Seed-Controlled Batches: Deterministic seed sequences for reproducible results
  • Memory-Aware Batching: Automatic batch size optimization based on available VRAM
  • Progress Tracking: Detailed progress monitoring with per-image status
  • Failure Recovery: Continue batch processing even if individual images fail
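The two batching ideas above can be sketched in a few lines; the helper names and the 20% headroom figure are illustrative assumptions, not CompI's actual implementation:

```python
def batch_seeds(base_seed: int, count: int) -> list:
    """Deterministic seed sequence: image i always gets base_seed + i,
    so any single image in a batch can be reproduced on its own."""
    return [base_seed + i for i in range(count)]

def safe_batch_size(free_vram_gb: float, per_image_gb: float, cap: int = 8) -> int:
    """Memory-aware batch size: keep ~20% VRAM headroom, always allow 1 image."""
    usable = free_vram_gb * 0.8
    return max(1, min(cap, int(usable // per_image_gb)))
```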

πŸ” Upscaler Integration

  • Latent Upscaler: Optional 2x upscaling using Stable Diffusion Latent Upscaler
  • Graceful Degradation: Clean fallback when upscaler unavailable
  • Memory Management: Intelligent memory allocation for upscaling operations
  • Quality Enhancement: Professional-grade image enhancement capabilities

🚀 Quick Start Guide

1. Launch Phase 3.E

```bash
# Method 1: Using launcher script (recommended)
python run_phase3e_performance_manager.py

# Method 2: Direct Streamlit launch
streamlit run src/ui/compi_phase3e_performance_manager.py --server.port 8505
```

2. System Requirements Check

The launcher automatically checks:

  • GPU Setup: CUDA availability and VRAM capacity
  • Dependencies: Required and optional packages
  • Model Support: SD 1.5 and SDXL availability
  • Performance Features: xFormers and upscaler support
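A dependency probe of this kind can be written with the standard library alone; the default package list below is a guess at what the launcher checks, not its actual code:

```python
import importlib.util

def check_optional_features(packages=("xformers", "diffusers", "transformers")) -> dict:
    """Report which optional packages are importable, without importing them."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}
```

Probing with `find_spec` avoids paying the import cost (or triggering CUDA initialization) just to build a status report.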

3. Access the Interface

  • URL: http://localhost:8505
  • Interface: Professional Streamlit dashboard with real-time monitoring
  • Sidebar: Live VRAM monitoring and system status

🎨 Professional Workflow

Step 1: Model Selection

  1. Choose Base Model: SD 1.5 (fast, compatible) or SDXL (high quality, more VRAM)
  2. Select Generation Mode: txt2img or img2img
  3. Check Compatibility: System automatically validates model/mode combinations
  4. Review VRAM Requirements: See memory requirements and availability status

Step 2: LoRA Integration (Optional)

  1. Enable LoRA: Toggle LoRA support
  2. Specify Path: Enter path to LoRA weights (diffusers format)
  3. Set Scale: Adjust LoRA influence (0.1-2.0)
  4. Verify Status: Check LoRA loading status and compatibility

Step 3: Performance Optimization

  1. Choose Optimization Level: Conservative, Balanced, Aggressive, or Extreme
  2. Monitor VRAM: Watch real-time memory usage in sidebar
  3. Adjust Settings: Fine-tune individual optimization features
  4. Enable Reliability: Configure OOM retry and CPU fallback options

Step 4: Generation

  1. Single Images: Generate individual images with full control
  2. Batch Processing: Create multiple images with seed sequences
  3. Monitor Progress: Track generation progress and memory usage
  4. Review Results: Analyze generation statistics and performance metrics

🔧 Advanced Features

🤖 Model Manager Deep Dive

Model Compatibility Matrix

| Capability | SD 1.5 | SDXL |
| --- | --- | --- |
| txt2img | ✅ 512x512 optimal | ✅ 1024x1024 optimal |
| img2img | ✅ all strengths | ✅ limited support |
| ControlNet | ✅ full support | ⚠️ requires special handling |
| LoRA | ✅ universal compatibility | ✅ SDXL-compatible weights only |
| 💾 VRAM | 4+ GB recommended | 8+ GB recommended |

Automatic Model Selection Logic

  • VRAM < 6GB: Recommends SD 1.5 only
  • VRAM 6-8GB: SD 1.5 preferred, SDXL with warnings
  • VRAM 8GB+: Full SDXL support with optimizations
  • CPU Mode: SD 1.5 only with aggressive optimizations
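The selection logic above amounts to a simple threshold function. The sketch below mirrors the table, with names of our own choosing rather than CompI's:

```python
from typing import Optional, Tuple

def recommend_model(vram_gb: Optional[float]) -> Tuple[str, str]:
    """Map available VRAM (None = CPU-only mode) to a model recommendation."""
    if vram_gb is None:
        return "sd15", "CPU mode: SD 1.5 only, with aggressive optimizations"
    if vram_gb < 6:
        return "sd15", "Under 6 GB: SD 1.5 only"
    if vram_gb < 8:
        return "sd15", "6-8 GB: SD 1.5 preferred; SDXL possible with warnings"
    return "sdxl", "8 GB+: full SDXL support with optimizations"
```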

⚡ Performance Optimization Levels

Conservative Mode

  • Basic attention slicing
  • Standard precision (fp16/fp32)
  • Minimal memory optimizations
  • Best for: Stable systems, first-time users

Balanced Mode (Default)

  • xFormers attention (if available)
  • Attention + VAE slicing
  • Automatic precision selection
  • Best for: Most users, good performance/stability balance

Aggressive Mode

  • All memory optimizations enabled
  • VAE tiling for large images
  • Maximum memory efficiency
  • Best for: Limited VRAM, large batch processing

Extreme Mode

  • CPU offloading enabled
  • Maximum memory savings
  • Slower but uses minimal VRAM
  • Best for: Very limited VRAM (<4GB)
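The four levels boil down to a table of feature flags. The mapping below is our reading of the level descriptions above; the key names are illustrative, not CompI's configuration schema:

```python
# Feature flags per optimization level (illustrative, inferred from the
# level descriptions; not CompI's actual configuration keys).
OPTIMIZATION_LEVELS = {
    "conservative": dict(xformers=False, attention_slicing=True,
                         vae_slicing=False, vae_tiling=False, cpu_offload=False),
    "balanced":     dict(xformers=True, attention_slicing=True,
                         vae_slicing=True, vae_tiling=False, cpu_offload=False),
    "aggressive":   dict(xformers=True, attention_slicing=True,
                         vae_slicing=True, vae_tiling=True, cpu_offload=False),
    "extreme":      dict(xformers=True, attention_slicing=True,
                         vae_slicing=True, vae_tiling=True, cpu_offload=True),
}
```

On a diffusers pipeline, these flags would map to calls such as `pipe.enable_attention_slicing()`, `pipe.enable_vae_slicing()`, `pipe.enable_vae_tiling()`, `pipe.enable_model_cpu_offload()`, and `pipe.enable_xformers_memory_efficient_attention()`.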

πŸ›‘οΈ Reliability Engine Strategies

Fallback Progression

```text
Strategy 1: Original settings (100% size, 100% steps)
Strategy 2: Reduced size      (75% size, 90% steps)
Strategy 3: Half size         (50% size, 80% steps)
Strategy 4: Minimal           (50% size, 60% steps)
Final:      CPU fallback if all GPU attempts fail
```
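The progression can be expressed as a small generator. The multiple-of-64 snapping is a common Stable Diffusion sizing convention that we assume here:

```python
FALLBACK_STEPS = [  # (size fraction, step fraction), tried in order
    (1.00, 1.00),   # Strategy 1: original settings
    (0.75, 0.90),   # Strategy 2: reduced size
    (0.50, 0.80),   # Strategy 3: half size
    (0.50, 0.60),   # Strategy 4: minimal
]

def fallback_plan(width: int, height: int, steps: int):
    """Yield progressively cheaper (width, height, steps) attempts.
    Dimensions are snapped down to multiples of 64 (a common SD convention)."""
    for size_f, step_f in FALLBACK_STEPS:
        w = max(64, int(width * size_f) // 64 * 64)
        h = max(64, int(height * size_f) // 64 * 64)
        yield w, h, max(1, int(steps * step_f))
```

A reliability loop would try each attempt in order, catching the OOM error and clearing the GPU cache between attempts, then fall back to CPU only if every GPU attempt fails.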

Error Classification

  • CUDA OOM: Triggers progressive fallback
  • Model Loading: Suggests alternative models
  • LoRA Errors: Disables LoRA and retries
  • General Errors: Logs and reports with context

📊 VRAM Monitoring System

Real-time Metrics

  • Total VRAM: Hardware capacity
  • Used VRAM: Currently allocated memory
  • Free VRAM: Available for new operations
  • Usage Percentage: Current utilization level

Smart Alerts

  • Green (0-60%): Optimal usage
  • Yellow (60-80%): Moderate usage, monitor closely
  • Red (80%+): High usage, optimization recommended
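The alert bands reduce to a single thresholding function; this sketch mirrors the colors above:

```python
def vram_alert(used_gb: float, total_gb: float) -> str:
    """Classify VRAM utilization into the green/yellow/red bands above."""
    pct = 100.0 * used_gb / total_gb
    if pct >= 80:
        return "red"     # high usage: optimization recommended
    if pct >= 60:
        return "yellow"  # moderate usage: monitor closely
    return "green"       # optimal usage
```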

Memory Management

  • Automatic Cache Clearing: Between batch generations
  • Memory Leak Detection: Identifies and resolves memory issues
  • Optimization Suggestions: Hardware-specific recommendations

📈 Performance Benchmarks

Generation Speed Comparison

```text
SD 1.5 (512x512, 20 steps):
  RTX 4090: ~15-25 seconds
  RTX 3080: ~25-35 seconds
  RTX 2080: ~45-60 seconds
  CPU: ~5-10 minutes

SDXL (1024x1024, 20 steps):
  RTX 4090: ~30-45 seconds
  RTX 3080: ~60-90 seconds
  RTX 2080: ~2-3 minutes (with optimizations)
  CPU: ~15-30 minutes
```

Memory Usage Patterns

```text
SD 1.5:
  Base: ~3.5GB VRAM
  + LoRA: ~3.7GB VRAM
  + Upscaler: ~5.5GB VRAM

SDXL:
  Base: ~6.5GB VRAM
  + LoRA: ~7.0GB VRAM
  + Upscaler: ~9.0GB VRAM
```

πŸ” Troubleshooting Guide

Common Issues & Solutions

"CUDA Out of Memory" Errors

  1. Enable OOM Auto-Retry: Automatic fallback handling
  2. Reduce Image Size: Use 512x512 instead of 1024x1024
  3. Lower Batch Size: Generate fewer images simultaneously
  4. Enable Aggressive Optimizations: Use VAE slicing/tiling
  5. Clear GPU Cache: Use sidebar "Clear GPU Cache" button

Slow Generation Speed

  1. Enable xFormers: Significant speed improvement if available
  2. Use Balanced Optimization: Good speed/quality trade-off
  3. Reduce Inference Steps: 15-20 steps often sufficient
  4. Check VRAM Usage: Ensure not hitting memory limits

Model Loading Failures

  1. Check Internet Connection: Models download on first use
  2. Verify Disk Space: Models require 2-7GB storage each
  3. Try Alternative Model: Switch between SD 1.5 and SDXL
  4. Clear Model Cache: Remove cached models and re-download

LoRA Loading Issues

  1. Verify Path: Ensure LoRA files exist at specified path
  2. Check Format: Use diffusers-compatible LoRA weights
  3. Model Compatibility: Ensure LoRA matches base model type
  4. Scale Adjustment: Try different LoRA scale values

🎯 Best Practices

πŸ“ Performance Optimization

  1. Start Conservative: Begin with balanced settings, adjust as needed
  2. Monitor VRAM: Keep usage below 80% for stability
  3. Batch Wisely: Use smaller batches on limited hardware
  4. Clear Cache Regularly: Prevent memory accumulation

🤖 Model Selection

  1. SD 1.5 for Speed: Faster generation, lower VRAM requirements
  2. SDXL for Quality: Higher resolution, better detail
  3. Match Hardware: Choose model based on available VRAM
  4. Test Compatibility: Verify model works with your use case

πŸ›‘οΈ Reliability

  1. Enable Auto-Retry: Let system handle OOM errors automatically
  2. Use Fallbacks: Allow progressive degradation for reliability
  3. Monitor Logs: Check run logs for patterns and issues
  4. Plan for Failures: Design workflows that handle generation failures

🚀 Integration with CompI Ecosystem

Universal Enhancement

Phase 3.E enhances ALL existing CompI components:

  • Ultimate Dashboard: Model switching and performance controls
  • Phase 2.A-2.E: Reliability and optimization for all multimodal phases
  • Phase 1.A-1.E: Enhanced foundation with professional features
  • Phase 3.D: Performance metrics in workflow management

Backward Compatibility

  • Graceful Degradation: Works on all hardware configurations
  • Default Settings: Optimal defaults for most users
  • Progressive Enhancement: Advanced features when available
  • Legacy Support: Maintains compatibility with existing workflows

🎉 Phase 3.E: Production-Grade CompI Complete

Phase 3.E completes CompI's evolution into a production-grade platform, combining professional performance management, intelligent reliability, and advanced model capabilities across every generation mode.

Key Benefits:

  • ✅ Professional Performance: Industry-standard optimization and monitoring
  • ✅ Intelligent Reliability: Automatic error handling and recovery
  • ✅ Advanced Model Management: Dynamic switching and LoRA integration
  • ✅ Production Ready: Suitable for commercial and professional use
  • ✅ Universal Enhancement: Improves all existing CompI features

CompI is now a complete, production-grade multimodal AI art generation platform! 🎨✨