---
title: CompI — Final Dashboard
emoji: 🎨
colorFrom: indigo
colorTo: purple
sdk: streamlit
app_file: src/ui/compi_phase3_final_dashboard.py
pinned: false
---

# CompI - Compositional Intelligence Project

A multi-modal AI system that generates creative content by combining text, images, audio, and emotional context.

> **Note:** All documentation has been consolidated under `docs/`. See `docs/README.md` for an index of guides.

## 🚀 Project Overview

CompI (Compositional Intelligence) is designed to create rich, contextually aware content by:

- Processing text prompts with emotional analysis
- Generating images using Stable Diffusion
- Creating audio compositions
- Combining multiple modalities for enhanced creative output

πŸ“ Project Structure

Project CompI/
β”œβ”€β”€ src/                    # Source code
β”‚   β”œβ”€β”€ generators/        # Image generation modules
β”‚   β”œβ”€β”€ models/            # Model implementations
β”‚   β”œβ”€β”€ utils/             # Utility functions
β”‚   β”œβ”€β”€ data/              # Data processing
β”‚   β”œβ”€β”€ ui/                # User interface components
β”‚   └── setup_env.py       # Environment setup script
β”œβ”€β”€ notebooks/             # Jupyter notebooks for experimentation
β”œβ”€β”€ data/                  # Dataset storage
β”œβ”€β”€ outputs/               # Generated content
β”œβ”€β”€ tests/                 # Unit tests
β”œβ”€β”€ run_*.py               # Convenience scripts for generators
β”œβ”€β”€ requirements.txt       # Python dependencies
└── README.md             # This file

## 🛠️ Setup Instructions

### 1. Create Virtual Environment

```bash
# Using conda (recommended for ML projects)
conda create -n compi-env python=3.10 -y
conda activate compi-env

# OR using venv
python -m venv compi-env
# Windows
compi-env\Scripts\activate
# Linux/Mac
source compi-env/bin/activate
```

### 2. Install Dependencies

For GPU users (recommended for faster generation):

```bash
# First, check your CUDA version
nvidia-smi

# Install PyTorch with CUDA support first (replace cu121 with your CUDA version)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# Then install the remaining requirements
pip install -r requirements.txt
```

For CPU-only users:

```bash
pip install -r requirements.txt
```

### 3. Test Installation

```bash
python src/test_setup.py
```
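As a rough illustration of what an installation check can look like (this stand-in is an assumption, not the contents of `src/test_setup.py`), the standard library alone can report which required packages resolve:

```python
import importlib.util

# Core packages CompI's requirements.txt is expected to provide.
# This list is an assumption; adjust it to match your environment.
REQUIRED = ["torch", "diffusers", "transformers", "streamlit", "pandas", "numpy"]

def check_packages(names):
    """Return (found, missing) package-name lists without importing anything."""
    found, missing = [], []
    for name in names:
        (found if importlib.util.find_spec(name) else missing).append(name)
    return found, missing

if __name__ == "__main__":
    found, missing = check_packages(REQUIRED)
    print("Found:", ", ".join(found) or "none")
    print("Missing:", ", ".join(missing) or "none")
```

Using `find_spec` instead of `import` keeps the check fast, since heavyweight packages like `torch` are located but not loaded.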

## 🚀 Quick Start

### Phase 1: Text-to-Image Generation

```bash
# Basic text-to-image generation
python run_basic_generation.py "A magical forest, digital art"

# Advanced generation with style conditioning
python run_advanced_styling.py "dragon in a crystal cave" --style "oil painting" --mood "dramatic"

# Interactive style selection
python run_styled_generation.py

# Quality evaluation and analysis
python run_evaluation.py

# Personal style training with LoRA
python run_lora_training.py --dataset-dir datasets/my_style

# Generate with a personal style
python run_style_generation.py --lora-path lora_models/my_style/checkpoint-1000 "artwork in my_style"
```

### Phase 2.A: Audio-to-Image Generation 🎵

```bash
# Install audio processing dependencies
pip install openai-whisper

# Streamlit UI (recommended)
streamlit run src/ui/compi_phase2a_streamlit_ui.py

# Command-line generation
python run_phase2a_audio_to_image.py --prompt "mystical forest" --audio "music.mp3"

# Interactive mode
python run_phase2a_audio_to_image.py --interactive

# Test installation
python src/test_phase2a.py

# Run examples
python examples/phase2a_audio_examples.py --example all
```
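Phase 2.A turns acoustic features into prompt vocabulary. As a hedged sketch of that idea (the `tempo_to_mood` function and its thresholds are illustrative, not CompI's actual mapping):

```python
def tempo_to_mood(tempo_bpm: float) -> str:
    """Map a tempo estimate (e.g., from librosa.beat.beat_track) to a mood phrase.

    Thresholds are illustrative, not CompI's actual values.
    """
    if tempo_bpm < 70:
        return "calm, meditative"
    if tempo_bpm < 110:
        return "flowing, steady"
    if tempo_bpm < 140:
        return "energetic, lively"
    return "frantic, intense"

# A prompt fragment could then be composed as:
prompt = f"mystical forest, {tempo_to_mood(128)} atmosphere"
print(prompt)  # mystical forest, energetic, lively atmosphere
```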

### Phase 2.B: Data/Logic-to-Image Generation 📊

```bash
# Streamlit UI (recommended)
streamlit run src/ui/compi_phase2b_streamlit_ui.py

# Command-line generation with CSV data
python run_phase2b_data_to_image.py --prompt "data visualization" --csv "data.csv"

# Mathematical formula generation
python run_phase2b_data_to_image.py --prompt "mathematical harmony" --formula "np.sin(np.linspace(0, 4*np.pi, 100))"

# Batch processing
python run_phase2b_data_to_image.py --batch-csv "data_folder/" --prompt "scientific patterns"

# Interactive mode
python run_phase2b_data_to_image.py --interactive
```
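Formula strings like the `--formula` example above presumably get evaluated and summarized before conditioning the prompt. A minimal sketch of that flow (the restricted namespace and summary wording are assumptions, not CompI's implementation, and it uses the standard library `math` instead of NumPy):

```python
import math

def eval_formula(expr: str, n: int = 100) -> list:
    """Evaluate a single-variable formula over x in [0, 4*pi).

    eval() is restricted to the math module and x; even so,
    only evaluate trusted formula strings.
    """
    xs = [4 * math.pi * i / n for i in range(n)]
    return [eval(expr, {"__builtins__": {}}, {"math": math, "x": x}) for x in xs]

def summarize(values: list) -> str:
    """Turn a numeric series into a rough descriptive phrase for a prompt."""
    lo, hi = min(values), max(values)
    mean = sum(values) / len(values)
    shape = "oscillating" if lo < mean < hi else "monotonic"
    return f"{shape} pattern ranging {lo:.2f} to {hi:.2f}"

values = eval_formula("math.sin(x)")
print(summarize(values))
```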

### Phase 2.C: Emotional/Contextual Input to Image Generation 🌀

```bash
# Streamlit UI (recommended)
streamlit run src/ui/compi_phase2c_streamlit_ui.py

# Command-line generation with a preset emotion
python run_phase2c_emotion_to_image.py --prompt "mystical forest" --emotion "mysterious"

# Custom emotion generation
python run_phase2c_emotion_to_image.py --prompt "urban landscape" --emotion "🤩" --type custom

# Descriptive emotion generation
python run_phase2c_emotion_to_image.py --prompt "mountain vista" --emotion "I feel a sense of wonder" --type text

# Batch emotion processing
python run_phase2c_emotion_to_image.py --batch-emotions "joyful,sad,mysterious" --prompt "abstract art"

# Interactive mode
python run_phase2c_emotion_to_image.py --interactive
```
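Emotion conditioning pairs naturally with the color-psychology feature listed under Core Features. A toy version of such a lookup (the palette phrases and prompt template here are illustrative assumptions, not CompI's actual tables):

```python
# Illustrative emotion -> palette lookup; CompI's real mapping differs.
EMOTION_PALETTES = {
    "joyful": "warm yellows and bright oranges",
    "sad": "muted blues and grays",
    "mysterious": "deep purples and shadowed teals",
}

def emotion_palette(emotion: str) -> str:
    """Return a palette phrase, falling back to a neutral palette."""
    return EMOTION_PALETTES.get(emotion.lower().strip(), "balanced neutral tones")

def conditioned_prompt(prompt: str, emotion: str) -> str:
    """Append mood and palette phrases to the base prompt."""
    return f"{prompt}, {emotion} mood, {emotion_palette(emotion)}"

print(conditioned_prompt("abstract art", "mysterious"))
# abstract art, mysterious mood, deep purples and shadowed teals
```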

### Phase 2.D: Real-Time Data Feeds to Image Generation 🌎

```bash
# Streamlit UI (recommended)
streamlit run src/ui/compi_phase2d_streamlit_ui.py

# Command-line generation with weather data
python run_phase2d_realtime_to_image.py --prompt "cityscape" --weather --city "Tokyo"

# News-driven generation
python run_phase2d_realtime_to_image.py --prompt "abstract art" --news --category "technology"

# Multi-source generation
python run_phase2d_realtime_to_image.py --prompt "world state" --weather --news --financial

# Temporal series generation
python run_phase2d_realtime_to_image.py --prompt "evolving world" --weather --temporal "0,30,60"

# Interactive mode
python run_phase2d_realtime_to_image.py --interactive
```
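Once a live feed has been fetched, its payload has to be translated into prompt language. A hedged sketch of that step for weather data (the dict keys mimic common weather-API fields and the wording rules are assumptions, not CompI's actual logic):

```python
def weather_fragment(report: dict) -> str:
    """Turn a parsed weather report into a short prompt fragment.

    Keys and thresholds are illustrative; adapt them to the API you use.
    """
    condition = report.get("condition", "clear").lower()
    temp_c = report.get("temp_c", 20)
    warmth = "cold" if temp_c < 5 else "mild" if temp_c < 22 else "hot"
    return f"{condition} weather, {warmth} air around {temp_c}°C"

# e.g., a report fetched for --city "Tokyo":
print(weather_fragment({"condition": "Rain", "temp_c": 14}))
# rain weather, mild air around 14°C
```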

### Phase 2.E: Style Reference/Example Image to AI Art 🖼️

```bash
# Streamlit UI (recommended)
streamlit run src/ui/compi_phase2e_streamlit_ui.py

# Command-line generation with a reference image
python run_phase2e_refimg_to_image.py --prompt "magical forest" --reference "path/to/image.jpg" --strength 0.6

# Web URL reference
python run_phase2e_refimg_to_image.py --prompt "cyberpunk city" --reference "https://example.com/artwork.jpg"

# Batch generation with multiple variations
python run_phase2e_refimg_to_image.py --prompt "fantasy landscape" --reference "image.png" --num-images 3

# Style analysis only
python run_phase2e_refimg_to_image.py --analyze-only --reference "artwork.jpg"

# Interactive mode
python run_phase2e_refimg_to_image.py --interactive
```

## 🧪 NEW: Ultimate Multimodal Dashboard (True Fusion) 🚀

A major upgrade with real processing of each input type!

```bash
# Launch the upgraded dashboard with true multimodal fusion
python run_ultimate_multimodal_dashboard.py

# Or run directly
streamlit run src/ui/compi_ultimate_multimodal_dashboard.py --server.port 8503
```

Key Improvements:

- ✅ Real Audio Analysis: Whisper transcription + librosa features
- ✅ Actual Data Processing: CSV analysis + formula evaluation
- ✅ True Emotion Analysis: TextBlob sentiment classification
- ✅ Live Real-Time Data: weather/news API integration
- ✅ Advanced References: img2img + ControlNet processing
- ✅ Intelligent Fusion: actual content processing (not static keywords)

Access at: http://localhost:8503

See `ULTIMATE_MULTIMODAL_DASHBOARD_README.md` for detailed documentation.
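Intelligent fusion presumably concatenates phrases derived from each active modality into one conditioned prompt. A minimal sketch of such a combiner (the ordering and the cap on fragments are assumptions, not CompI's actual fusion rules):

```python
def fuse_prompt(base: str, fragments: dict, max_parts: int = 4) -> str:
    """Join the base prompt with non-empty modality fragments.

    `fragments` maps modality name -> derived phrase (audio mood, data
    summary, emotion palette, live-feed description). Insertion order
    is preserved; inactive (empty) modalities are skipped.
    """
    parts = [phrase for phrase in fragments.values() if phrase][:max_parts]
    return ", ".join([base] + parts)

prompt = fuse_prompt(
    "ancient city",
    {
        "audio": "energetic, lively atmosphere",
        "data": "",  # inactive modality is skipped
        "emotion": "deep purples and shadowed teals",
        "realtime": "rain weather",
    },
)
print(prompt)
# ancient city, energetic, lively atmosphere, deep purples and shadowed teals, rain weather
```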

πŸ–ΌοΈ NEW: Phase 3.C Advanced Reference Integration πŸš€

Professional multi-reference control with hybrid generation modes!

Key Features:

  • βœ… Role-Based Reference Assignment: Select images for style vs structure
  • βœ… Live ControlNet Previews: Real-time Canny/Depth preprocessing
  • βœ… Hybrid Generation Modes: CN + IMG2IMG simultaneous processing
  • βœ… Professional Controls: Independent strength tuning for style/structure
  • βœ… Seamless Integration: Works with all CompI multimodal phases

See: PHASE3C_ADVANCED_REFERENCE_INTEGRATION.md for complete documentation.

πŸ—‚οΈ NEW: Phase 3.D Professional Workflow Manager πŸš€

Complete creative workflow platform with unified logging, presets, and export bundles!

Key Features:

  • βœ… Unified Run Logging: Auto-ingests from all CompI phases
  • βœ… Professional Gallery: Advanced filtering and search
  • βœ… Preset System: Save/load complete generation configs
  • βœ… Export Bundles: ZIP packages with metadata and reproducibility
  • βœ… Annotation System: Ratings, tags, and notes for workflow management

Launch: python run_phase3d_workflow_manager.py | Access: http://localhost:8504

See: docs/PHASE3D_WORKFLOW_MANAGER_GUIDE.md for complete documentation.
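An export bundle of the kind described above can be approximated with the standard library alone; the archive layout and metadata fields here are assumptions, not Phase 3.D's actual schema:

```python
import json
import zipfile
from pathlib import Path

def export_bundle(image_path: Path, metadata: dict, out_zip: Path) -> Path:
    """Write a ZIP containing the image plus a metadata.json for reproducibility."""
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        # Store the image under its bare filename inside the archive
        zf.write(image_path, arcname=image_path.name)
        # Serialize generation settings (prompt, seed, model, ...) alongside it
        zf.writestr("metadata.json", json.dumps(metadata, indent=2))
    return out_zip
```

A caller might pass `{"prompt": "mystical forest", "seed": 42, "model": "sd-1.5"}` as the metadata; unpacking the ZIP then restores everything needed to reproduce the run.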

βš™οΈ NEW: Phase 3.E Performance, Model Management & Reliability πŸš€

Production-grade performance optimization, model switching, and intelligent reliability!

Key Features:

  • βœ… Model Manager: Dynamic SD 1.5 ↔ SDXL switching with auto-availability checking
  • βœ… LoRA Integration: Universal LoRA loading with scale control across all models
  • βœ… Performance Controls: xFormers, attention slicing, VAE optimizations, precision control
  • βœ… VRAM Monitoring: Real-time GPU memory usage tracking and alerts
  • βœ… Reliability Engine: OOM-safe auto-retry with intelligent fallbacks
  • βœ… Batch Processing: Seed-controlled batch generation with memory management
  • βœ… Upscaler Integration: Optional 2x latent upscaling for enhanced quality

Launch: python run_phase3e_performance_manager.py | Access: http://localhost:8505

See: docs/PHASE3E_PERFORMANCE_GUIDE.md for complete documentation.

## 🧪 ULTIMATE: Phase 3 Final Dashboard - Complete Integration! 🎉

The ultimate CompI interface, integrating all Phase 3 components into one unified creative environment!

Complete Feature Integration:

- ✅ 🧩 Multimodal Fusion (3.A/3.B): real audio, data, emotion, and real-time processing
- ✅ 🖼️ Advanced References (3.C): role assignment, ControlNet, live previews
- ✅ ⚙️ Performance Management (3.E): model switching, LoRA, VRAM monitoring
- ✅ 🎛️ Intelligent Generation: hybrid modes with automatic fallback strategies
- ✅ 🖼️ Professional Gallery (3.D): filtering, rating, annotation system
- ✅ 💾 Preset Management (3.D): save/load complete configurations
- ✅ 📦 Export System (3.D): complete bundles with metadata and reproducibility

Professional Workflow:

1. Configure multimodal inputs (text, audio, data, emotion, real-time)
2. Upload and assign references (style vs. structure roles)
3. Choose a model and optimize performance (SD 1.5/SDXL, LoRA, optimizations)
4. Generate with intelligent fusion (automatic mode selection)
5. Review and annotate results (gallery with rating/tagging)
6. Save presets and export bundles (complete reproducibility)

Launch: `python run_phase3_final_dashboard.py` | Access: http://localhost:8506

See `docs/PHASE3_FINAL_DASHBOARD_GUIDE.md` for complete documentation.


## 🎯 CompI Project Status: COMPLETE ✅

CompI has achieved its goal: a comprehensive, production-ready multimodal AI art generation platform.

✅ All Phases Complete:

- ✅ Phase 1: Foundation (text-to-image, styling, evaluation, LoRA training)
- ✅ Phase 2: Multimodal integration (audio, data, emotion, real-time, references)
- ✅ Phase 3: Advanced features (fusion dashboard, advanced references, workflow management, performance optimization)

🚀 What CompI Offers:

- Complete Creative Platform: from generation to professional workflow management
- Production-Grade Reliability: robust error handling and performance optimization
- Professional Tools: industry-standard features for serious creative and commercial work
- Universal Compatibility: works across different hardware configurations
- Extensible Foundation: ready for future enhancements and integrations

CompI is ready for professional creative work! 🎨✨

## 🎯 Core Features

- Text Analysis: Emotion detection and sentiment analysis
- Image Generation: Stable Diffusion integration with advanced conditioning
- Audio Processing: Music and sound analysis with Whisper integration
- Data Processing: CSV analysis and mathematical formula evaluation
- Emotion Processing: Preset emotions, custom emotions, emoji, and contextual analysis
- Real-Time Integration: Live weather, news, and financial data feeds
- Style Reference: Upload/URL image guidance with AI-powered style analysis
- Multimodal Fusion: Combining text, audio, data, emotions, real-time feeds, and visual references
- Pattern Recognition: Automatic detection of trends, correlations, and seasonality
- Poetic Interpretation: Converting data patterns and emotions into artistic language
- Color Psychology: Emotion-based color palette generation and conditioning
- Temporal Awareness: Time-sensitive data processing and evolution tracking
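The trend part of pattern recognition can be as simple as a least-squares slope over the series; a dependency-free sketch of that idea (the threshold and labels are illustrative choices, and CompI's detectors are presumably richer):

```python
def trend(values: list) -> str:
    """Classify a series as rising, falling, or flat via a least-squares slope."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    if abs(slope) < 0.01:  # illustrative flatness threshold
        return "flat"
    return "rising" if slope > 0 else "falling"

print(trend([1, 2, 2, 3, 4]))  # rising
print(trend([5, 4, 4, 3, 2]))  # falling
```

The resulting label ("rising", "falling", "flat") can then feed the poetic-interpretation step as a prompt keyword.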

## 🔧 Tech Stack

- Deep Learning: PyTorch, Transformers, Diffusers
- Audio: librosa, soundfile
- UI: Streamlit/Gradio
- Data: pandas, numpy
- Visualization: matplotlib, seaborn

πŸ“ Usage

Coming soon - basic usage examples and API documentation.

🀝 Contributing

This is a development project. Feel free to experiment and extend functionality.

πŸ“„ License

MIT License - see LICENSE file for details.
