|
# CompI Phase 3.E: Performance, Model Management & Reliability - Complete Guide
|
|
|
## **What Phase 3.E Delivers**
|
|
|
**Phase 3.E transforms CompI into a production-grade platform with professional performance management, intelligent reliability, and advanced model capabilities.** |
|
|
|
### **Model Manager**
|
- **Dynamic Model Switching**: Switch between SD 1.5 and SDXL based on requirements |
|
- **Auto-Availability Checking**: Intelligent detection of model compatibility and VRAM requirements |
|
- **Universal LoRA Support**: Load and scale LoRA weights across all models and generation modes |
|
- **Smart Recommendations**: Hardware-based model suggestions and optimization advice |
|
|
|
### **Performance Controls**
|
- **xFormers Integration**: Memory-efficient attention with automatic fallback |
|
- **Advanced Memory Optimization**: Attention slicing, VAE slicing/tiling, CPU offloading |
|
- **Precision Control**: Automatic dtype selection (fp16/bf16/fp32) based on hardware |
|
- **Batch Optimization**: Memory-aware batch processing with intelligent sizing |
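The precision rule above can be sketched as a small selector. This is an illustrative helper, not CompI's actual function; on a real machine the two flags would come from `torch.cuda.is_available()` and `torch.cuda.is_bf16_supported()`:

```python
def select_dtype(cuda_available: bool, bf16_supported: bool) -> str:
    """Pick a precision from hardware capabilities:
    fp32 on CPU, bf16 on GPUs that support it (Ampere+), else fp16."""
    if not cuda_available:
        return "fp32"   # CPU pipelines generally need full precision
    if bf16_supported:
        return "bf16"   # numerically more robust than fp16
    return "fp16"
```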
|
|
|
### **VRAM Monitoring**
|
- **Real-time Tracking**: Live GPU memory usage monitoring and alerts |
|
- **Usage Analytics**: Memory usage patterns and optimization suggestions |
|
- **Threshold Warnings**: Automatic alerts when approaching memory limits |
|
- **Cache Management**: Intelligent GPU cache clearing and memory cleanup |
|
|
|
### **Reliability Engine**
|
- **OOM-Safe Generation**: Automatic retry with progressive fallback strategies |
|
- **Intelligent Fallbacks**: Reduce size → reduce steps → CPU fallback progression
|
- **Error Classification**: Smart error detection and appropriate response strategies |
|
- **Graceful Degradation**: Maintain functionality even under resource constraints |
|
|
|
### **Batch Processing**
|
- **Seed-Controlled Batches**: Deterministic seed sequences for reproducible results |
|
- **Memory-Aware Batching**: Automatic batch size optimization based on available VRAM |
|
- **Progress Tracking**: Detailed progress monitoring with per-image status |
|
- **Failure Recovery**: Continue batch processing even if individual images fail |
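One common convention for seed-controlled batches is a sequential sequence derived from a base seed, so every image in the batch stays individually reproducible. A minimal sketch (CompI's exact scheme may differ):

```python
def batch_seeds(base_seed: int, count: int) -> list:
    """Derive a deterministic seed per image; wrap at 2**32 so seeds
    stay in the 32-bit range most UIs display."""
    return [(base_seed + i) % (2 ** 32) for i in range(count)]

# Each seed then drives its own generator, e.g.:
#   generator = torch.Generator("cuda").manual_seed(seed)
```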
|
|
|
### **Upscaler Integration**
|
- **Latent Upscaler**: Optional 2x upscaling using Stable Diffusion Latent Upscaler |
|
- **Graceful Degradation**: Clean fallback when upscaler unavailable |
|
- **Memory Management**: Intelligent memory allocation for upscaling operations |
|
- **Quality Enhancement**: Professional-grade image enhancement capabilities |
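The graceful-degradation pattern can be sketched as a wrapper around the diffusers latent upscaler. `load_upscaler` and `maybe_upscale` are illustrative names, not CompI's actual API:

```python
def load_upscaler():
    """Try to load the SD 2x latent upscaler; return None when diffusers,
    the model weights, or a suitable GPU are unavailable."""
    try:
        import torch
        from diffusers import StableDiffusionLatentUpscalePipeline
        return StableDiffusionLatentUpscalePipeline.from_pretrained(
            "stabilityai/sd-x2-latent-upscaler", torch_dtype=torch.float16
        ).to("cuda")
    except Exception:
        return None

def maybe_upscale(image, prompt, upscaler=None):
    """2x-upscale when an upscaler is available; otherwise hand the
    original image back unchanged (clean fallback)."""
    if upscaler is None:
        return image
    return upscaler(prompt=prompt, image=image).images[0]
```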
|
|
|
--- |
|
|
|
## **Quick Start Guide**
|
|
|
### **1. Launch Phase 3.E** |
|
```bash |
|
# Method 1: Using launcher script (recommended) |
|
python run_phase3e_performance_manager.py |
|
|
|
# Method 2: Direct Streamlit launch |
|
streamlit run src/ui/compi_phase3e_performance_manager.py --server.port 8505 |
|
``` |
|
|
|
### **2. System Requirements Check** |
|
The launcher automatically checks: |
|
- **GPU Setup**: CUDA availability and VRAM capacity |
|
- **Dependencies**: Required and optional packages |
|
- **Model Support**: SD 1.5 and SDXL availability |
|
- **Performance Features**: xFormers and upscaler support |
|
|
|
### **3. Access the Interface** |
|
- **URL:** `http://localhost:8505` |
|
- **Interface:** Professional Streamlit dashboard with real-time monitoring |
|
- **Sidebar:** Live VRAM monitoring and system status |
|
|
|
--- |
|
|
|
## **Professional Workflow**
|
|
|
### **Step 1: Model Selection** |
|
1. **Choose Base Model**: SD 1.5 (fast, compatible) or SDXL (high quality, more VRAM) |
|
2. **Select Generation Mode**: txt2img or img2img |
|
3. **Check Compatibility**: System automatically validates model/mode combinations |
|
4. **Review VRAM Requirements**: See memory requirements and availability status |
|
|
|
### **Step 2: LoRA Integration (Optional)** |
|
1. **Enable LoRA**: Toggle LoRA support |
|
2. **Specify Path**: Enter path to LoRA weights (diffusers format) |
|
3. **Set Scale**: Adjust LoRA influence (0.1-2.0) |
|
4. **Verify Status**: Check LoRA loading status and compatibility |
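In diffusers terms, the steps above reduce to loading the weights once and passing a scale at call time. A sketch, assuming an already-loaded diffusers pipeline; `clamp_lora_scale` and `apply_lora` are illustrative names:

```python
def clamp_lora_scale(scale: float, lo: float = 0.1, hi: float = 2.0) -> float:
    """Keep the requested LoRA influence inside the UI's 0.1-2.0 range."""
    return max(lo, min(hi, scale))

def apply_lora(pipe, lora_path: str):
    """Attach diffusers-format LoRA weights to an existing pipeline."""
    pipe.load_lora_weights(lora_path)
    return pipe

# The scale is then applied per generation call, e.g.:
#   pipe(prompt, cross_attention_kwargs={"scale": clamp_lora_scale(0.8)})
```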
|
|
|
### **Step 3: Performance Optimization** |
|
1. **Choose Optimization Level**: Conservative, Balanced, Aggressive, or Extreme |
|
2. **Monitor VRAM**: Watch real-time memory usage in sidebar |
|
3. **Adjust Settings**: Fine-tune individual optimization features |
|
4. **Enable Reliability**: Configure OOM retry and CPU fallback options |
|
|
|
### **Step 4: Generation** |
|
1. **Single Images**: Generate individual images with full control |
|
2. **Batch Processing**: Create multiple images with seed sequences |
|
3. **Monitor Progress**: Track generation progress and memory usage |
|
4. **Review Results**: Analyze generation statistics and performance metrics |
|
|
|
--- |
|
|
|
## **Advanced Features**
|
|
|
### **Model Manager Deep Dive**
|
|
|
#### **Model Compatibility Matrix** |
|
```
SD 1.5:
  ✅ txt2img (512x512 optimal)
  ✅ img2img (all strengths)
  ✅ ControlNet (full support)
  ✅ LoRA (universal compatibility)
  VRAM: 4+ GB recommended

SDXL:
  ✅ txt2img (1024x1024 optimal)
  ✅ img2img (limited support)
  ⚠️ ControlNet (requires special handling)
  ✅ LoRA (SDXL-compatible weights only)
  VRAM: 8+ GB recommended
```
|
|
|
#### **Automatic Model Selection Logic** |
|
- **VRAM < 6GB**: Recommends SD 1.5 only |
|
- **VRAM 6-8GB**: SD 1.5 preferred, SDXL with warnings |
|
- **VRAM 8GB+**: Full SDXL support with optimizations |
|
- **CPU Mode**: SD 1.5 only with aggressive optimizations |
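The selection logic above can be sketched as a pure lookup over the VRAM thresholds. A hypothetical helper for illustration; `None` stands for a machine with no CUDA device:

```python
def recommend_model(vram_gb=None):
    """Map available VRAM (GB) to a (model, note) recommendation
    mirroring the thresholds listed above."""
    if vram_gb is None:
        return ("SD 1.5", "CPU mode: aggressive optimizations required")
    if vram_gb < 6:
        return ("SD 1.5", "SDXL not recommended below 6 GB")
    if vram_gb < 8:
        return ("SD 1.5", "SDXL possible, but expect fallbacks and warnings")
    return ("SDXL", "full support with standard optimizations")
```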
|
|
|
### **Performance Optimization Levels**
|
|
|
#### **Conservative Mode** |
|
- Basic attention slicing |
|
- Standard precision (fp16/fp32) |
|
- Minimal memory optimizations |
|
- **Best for**: Stable systems, first-time users |
|
|
|
#### **Balanced Mode (Default)** |
|
- xFormers attention (if available) |
|
- Attention + VAE slicing |
|
- Automatic precision selection |
|
- **Best for**: Most users, good performance/stability balance |
|
|
|
#### **Aggressive Mode** |
|
- All memory optimizations enabled |
|
- VAE tiling for large images |
|
- Maximum memory efficiency |
|
- **Best for**: Limited VRAM, large batch processing |
|
|
|
#### **Extreme Mode** |
|
- CPU offloading enabled |
|
- Maximum memory savings |
|
- Slower but uses minimal VRAM |
|
- **Best for**: Very limited VRAM (<4GB) |
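The four levels map naturally onto diffusers' memory toggles. A sketch under the assumption that `pipe` is a standard diffusers pipeline; the level-to-toggle mapping here is illustrative, and unavailable features (e.g. missing xFormers) are skipped rather than treated as errors:

```python
# Illustrative mapping from optimization level to diffusers pipeline toggles.
OPTIMIZATIONS = {
    "conservative": ["enable_attention_slicing"],
    "balanced":     ["enable_attention_slicing", "enable_vae_slicing",
                     "enable_xformers_memory_efficient_attention"],
    "aggressive":   ["enable_attention_slicing", "enable_vae_slicing",
                     "enable_vae_tiling",
                     "enable_xformers_memory_efficient_attention"],
    "extreme":      ["enable_attention_slicing", "enable_vae_slicing",
                     "enable_vae_tiling", "enable_model_cpu_offload"],
}

def apply_optimizations(pipe, level: str) -> list:
    """Call each toggle the pipeline supports; silently skip optional
    features that are unavailable. Returns the toggles actually applied."""
    applied = []
    for name in OPTIMIZATIONS[level]:
        try:
            getattr(pipe, name)()
            applied.append(name)
        except Exception:
            pass  # optional feature unavailable - degrade gracefully
    return applied
```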
|
|
|
### **Reliability Engine Strategies**
|
|
|
#### **Fallback Progression** |
|
```
|
Strategy 1: Original settings (100% size, 100% steps) |
|
Strategy 2: Reduced size (75% size, 90% steps) |
|
Strategy 3: Half size (50% size, 80% steps) |
|
Strategy 4: Minimal (50% size, 60% steps) |
|
Final: CPU fallback if all GPU attempts fail |
|
``` |
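The progression above can be expressed as a retry schedule. A minimal sketch with hypothetical names; the caller would loop over attempts, catch CUDA out-of-memory errors, and fall back to CPU after the last entry:

```python
# (size factor, step factor) per attempt, matching the table above.
FALLBACKS = [(1.00, 1.00), (0.75, 0.90), (0.50, 0.80), (0.50, 0.60)]

def fallback_settings(width: int, height: int, steps: int, attempt: int):
    """Return (width, height, steps) for the Nth attempt (0 = original).
    Dimensions snap down to multiples of 8, as SD latents require."""
    size_f, step_f = FALLBACKS[attempt]
    snap = lambda v: max(64, int(v * size_f) // 8 * 8)
    return snap(width), snap(height), max(1, int(steps * step_f))
```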
|
|
|
#### **Error Classification** |
|
- **CUDA OOM**: Triggers progressive fallback |
|
- **Model Loading**: Suggests alternative models |
|
- **LoRA Errors**: Disables LoRA and retries |
|
- **General Errors**: Logs and reports with context |
|
|
|
### **VRAM Monitoring System**
|
|
|
#### **Real-time Metrics** |
|
- **Total VRAM**: Hardware capacity |
|
- **Used VRAM**: Currently allocated memory |
|
- **Free VRAM**: Available for new operations |
|
- **Usage Percentage**: Current utilization level |
|
|
|
#### **Smart Alerts** |
|
- **Green (0-60%)**: Optimal usage |
|
- **Yellow (60-80%)**: Moderate usage, monitor closely |
|
- **Red (80%+)**: High usage, optimization recommended |
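The alert bands reduce to a simple classifier. A sketch with an illustrative name; in PyTorch the raw numbers would come from `torch.cuda.mem_get_info()`, which returns `(free_bytes, total_bytes)`:

```python
def vram_status(used_gb: float, total_gb: float):
    """Classify utilization into the green/yellow/red bands above,
    returning (band, percent used)."""
    pct = 100.0 * used_gb / total_gb
    if pct < 60:
        return ("green", pct)
    if pct < 80:
        return ("yellow", pct)
    return ("red", pct)
```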
|
|
|
#### **Memory Management** |
|
- **Automatic Cache Clearing**: Between batch generations |
|
- **Memory Leak Detection**: Identifies and resolves memory issues |
|
- **Optimization Suggestions**: Hardware-specific recommendations |
|
|
|
--- |
|
|
|
## **Performance Benchmarks**
|
|
|
### **Generation Speed Comparison** |
|
``` |
|
SD 1.5 (512x512, 20 steps): |
|
RTX 4090: ~15-25 seconds |
|
RTX 3080: ~25-35 seconds |
|
RTX 2080: ~45-60 seconds |
|
CPU: ~5-10 minutes |
|
|
|
SDXL (1024x1024, 20 steps): |
|
RTX 4090: ~30-45 seconds |
|
RTX 3080: ~60-90 seconds |
|
RTX 2080: ~2-3 minutes (with optimizations) |
|
CPU: ~15-30 minutes |
|
``` |
|
|
|
### **Memory Usage Patterns** |
|
``` |
|
SD 1.5: |
|
Base: ~3.5GB VRAM |
|
+ LoRA: ~3.7GB VRAM |
|
+ Upscaler: ~5.5GB VRAM |
|
|
|
SDXL: |
|
Base: ~6.5GB VRAM |
|
+ LoRA: ~7.0GB VRAM |
|
+ Upscaler: ~9.0GB VRAM |
|
``` |
|
|
|
--- |
|
|
|
## **Troubleshooting Guide**
|
|
|
### **Common Issues & Solutions** |
|
|
|
#### **"CUDA Out of Memory" Errors** |
|
1. **Enable OOM Auto-Retry**: Automatic fallback handling |
|
2. **Reduce Image Size**: Use 512x512 instead of 1024x1024 |
|
3. **Lower Batch Size**: Generate fewer images simultaneously |
|
4. **Enable Aggressive Optimizations**: Use VAE slicing/tiling |
|
5. **Clear GPU Cache**: Use sidebar "Clear GPU Cache" button |
|
|
|
#### **Slow Generation Speed** |
|
1. **Enable xFormers**: Significant speed improvement if available |
|
2. **Use Balanced Optimization**: Good speed/quality trade-off |
|
3. **Reduce Inference Steps**: 15-20 steps often sufficient |
|
4. **Check VRAM Usage**: Ensure not hitting memory limits |
|
|
|
#### **Model Loading Failures** |
|
1. **Check Internet Connection**: Models download on first use |
|
2. **Verify Disk Space**: Models require 2-7GB storage each |
|
3. **Try Alternative Model**: Switch between SD 1.5 and SDXL |
|
4. **Clear Model Cache**: Remove cached models and re-download |
|
|
|
#### **LoRA Loading Issues** |
|
1. **Verify Path**: Ensure LoRA files exist at specified path |
|
2. **Check Format**: Use diffusers-compatible LoRA weights |
|
3. **Model Compatibility**: Ensure LoRA matches base model type |
|
4. **Scale Adjustment**: Try different LoRA scale values |
|
|
|
--- |
|
|
|
## **Best Practices**
|
|
|
### **Performance Optimization**
|
1. **Start Conservative**: Begin with balanced settings, adjust as needed |
|
2. **Monitor VRAM**: Keep usage below 80% for stability |
|
3. **Batch Wisely**: Use smaller batches on limited hardware |
|
4. **Clear Cache Regularly**: Prevent memory accumulation |
|
|
|
### **Model Selection**
|
1. **SD 1.5 for Speed**: Faster generation, lower VRAM requirements |
|
2. **SDXL for Quality**: Higher resolution, better detail |
|
3. **Match Hardware**: Choose model based on available VRAM |
|
4. **Test Compatibility**: Verify model works with your use case |
|
|
|
### **Reliability**
|
1. **Enable Auto-Retry**: Let system handle OOM errors automatically |
|
2. **Use Fallbacks**: Allow progressive degradation for reliability |
|
3. **Monitor Logs**: Check run logs for patterns and issues |
|
4. **Plan for Failures**: Design workflows that handle generation failures |
|
|
|
--- |
|
|
|
## **Integration with CompI Ecosystem**
|
|
|
### **Universal Enhancement** |
|
Phase 3.E enhances ALL existing CompI components: |
|
- **Ultimate Dashboard**: Model switching and performance controls |
|
- **Phase 2.A-2.E**: Reliability and optimization for all multimodal phases |
|
- **Phase 1.A-1.E**: Enhanced foundation with professional features |
|
- **Phase 3.D**: Performance metrics in workflow management |
|
|
|
### **Backward Compatibility** |
|
- **Graceful Degradation**: Works on all hardware configurations |
|
- **Default Settings**: Optimal defaults for most users |
|
- **Progressive Enhancement**: Advanced features when available |
|
- **Legacy Support**: Maintains compatibility with existing workflows |
|
|
|
--- |
|
|
|
## **Phase 3.E: Production-Grade CompI Complete**
|
|
|
**With Phase 3.E in place, CompI operates as a production-grade platform: professional performance management, intelligent reliability, and advanced model capabilities available across every generation mode.**
|
|
|
**Key Benefits:** |
|
- ✅ **Professional Performance**: Industry-standard optimization and monitoring
- ✅ **Intelligent Reliability**: Automatic error handling and recovery
- ✅ **Advanced Model Management**: Dynamic switching and LoRA integration
- ✅ **Production Ready**: Suitable for commercial and professional use
- ✅ **Universal Enhancement**: Improves all existing CompI features
|
|
|
**CompI is now a complete, production-grade multimodal AI art generation platform!**
|
|