# CompI Phase 1: Text-to-Image Generation Usage Guide

This guide covers the Phase 1 implementation of CompI's text-to-image generation capabilities using Stable Diffusion.

## 🚀 Quick Start

### Basic Usage

```bash
# Simple generation with interactive prompt
python run_basic_generation.py

# Generate from command line
python run_basic_generation.py "A magical forest, digital art, highly detailed"

# Or run directly from src/generators/
python src/generators/compi_phase1_text2image.py "A magical forest"
```

### Advanced Usage

```bash
# Advanced script with more options
python run_advanced_generation.py "cyberpunk city at sunset" --negative "blurry, low quality" --steps 50 --batch 3

# Interactive mode for experimentation
python run_advanced_generation.py --interactive

# Or run directly from src/generators/
python src/generators/compi_phase1_advanced.py --interactive
```

## 📋 Available Scripts

### 1. `compi_phase1_text2image.py` - Basic Implementation

**Features:**

- Simple, standalone text-to-image generation
- Automatic GPU/CPU detection
- Command line or interactive prompts
- Automatic output saving with descriptive filenames
- Comprehensive logging

**Usage:**

```bash
python src/generators/compi_phase1_text2image.py [prompt]
```

### 2. `compi_phase1_advanced.py` - Enhanced Implementation

**Features:**

- Batch generation (multiple images)
- Negative prompts (what to avoid)
- Customizable parameters (steps, guidance, dimensions)
- Interactive mode for experimentation
- Metadata saving (JSON files with generation parameters)
- Multiple model support

**Command Line Options:**

```bash
python src/generators/compi_phase1_advanced.py [OPTIONS] [PROMPT]

Options:
  --negative, -n TEXT     Negative prompt (what to avoid)
  --steps, -s INTEGER     Number of inference steps (default: 30)
  --guidance, -g FLOAT    Guidance scale (default: 7.5)
  --seed INTEGER          Random seed for reproducibility
  --batch, -b INTEGER     Number of images to generate
  --width, -w INTEGER     Image width (default: 512)
  --height INTEGER        Image height (default: 512)
  --model, -m TEXT        Model to use (default: runwayml/stable-diffusion-v1-5)
  --output, -o TEXT       Output directory (default: outputs)
  --interactive, -i       Interactive mode
```
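
The option table above maps naturally onto an `argparse` parser. The following is a minimal sketch, assuming the script is built on `argparse`; `build_parser` is a hypothetical helper name, and the default of 1 for `--batch` is an assumption because the guide does not state it.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options."""
    parser = argparse.ArgumentParser(description="Text-to-image generation (sketch)")
    parser.add_argument("prompt", nargs="?", default=None, help="Text prompt")
    parser.add_argument("--negative", "-n", default="", help="Negative prompt")
    parser.add_argument("--steps", "-s", type=int, default=30)
    parser.add_argument("--guidance", "-g", type=float, default=7.5)
    parser.add_argument("--seed", type=int, default=None)
    # Assumption: one image per run unless --batch is given.
    parser.add_argument("--batch", "-b", type=int, default=1)
    parser.add_argument("--width", "-w", type=int, default=512)
    # No short flag for --height: -h is reserved for --help.
    parser.add_argument("--height", type=int, default=512)
    parser.add_argument("--model", "-m", default="runwayml/stable-diffusion-v1-5")
    parser.add_argument("--output", "-o", default="outputs")
    parser.add_argument("--interactive", "-i", action="store_true")
    return parser


args = build_parser().parse_args(
    ["cyberpunk city at sunset", "--negative", "blurry, low quality",
     "--steps", "50", "--batch", "3"]
)
print(args.prompt, args.steps, args.batch, args.guidance)
```

Options that are not passed keep the documented defaults, so `args.guidance` stays at 7.5 here.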

## 🎨 Example Commands

### Basic Examples

```bash
# Simple landscape
python run_basic_generation.py "serene mountain lake, golden hour, photorealistic"

# Digital art style
python run_basic_generation.py "futuristic robot, neon lights, cyberpunk style, digital art"
```

### Advanced Examples

```bash
# High-quality generation with negative prompts
python run_advanced_generation.py "beautiful portrait of a woman, oil painting style" \
  --negative "blurry, distorted, low quality, bad anatomy" \
  --steps 50 --guidance 8.0

# Batch generation with fixed seed
python run_advanced_generation.py "abstract geometric patterns, colorful" \
  --batch 5 --seed 12345 --steps 40

# Custom dimensions for landscape
python run_advanced_generation.py "panoramic view of alien landscape" \
  --width 768 --height 512 --steps 35

# Interactive experimentation
python run_advanced_generation.py --interactive
```

## 📁 Output Structure

Generated images are saved in the `outputs/` directory with descriptive filenames:

```
outputs/
├── magical_forest_digital_art_20241225_143022_seed42.png
├── magical_forest_digital_art_20241225_143022_seed42_metadata.json
├── cyberpunk_city_sunset_20241225_143156_seed1337.png
└── cyberpunk_city_sunset_20241225_143156_seed1337_metadata.json
```
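
The filenames above appear to combine a prompt slug, a timestamp, and the seed. The sketch below shows one way such a name could be assembled; `build_output_name` and its `max_words` cut-off are hypothetical, and the scripts' real slug rule may differ (for example in which words they drop).

```python
import re
from datetime import datetime


def build_output_name(prompt: str, seed: int, when: datetime, max_words: int = 5) -> str:
    """Build '<slug>_<YYYYMMDD>_<HHMMSS>_seed<seed>.png' (assumed pattern)."""
    # Keep only lowercase alphanumeric words so the name is filesystem-safe.
    words = re.findall(r"[a-z0-9]+", prompt.lower())
    slug = "_".join(words[:max_words])
    stamp = when.strftime("%Y%m%d_%H%M%S")
    return f"{slug}_{stamp}_seed{seed}.png"


name = build_output_name(
    "Magical forest, digital art", seed=42, when=datetime(2024, 12, 25, 14, 30, 22)
)
print(name)
```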

### Metadata Files

In advanced mode, each generated image is saved with a companion JSON metadata file containing:

- Original prompt and negative prompt
- Generation parameters (steps, guidance, seed)
- Image dimensions and model used
- Timestamp and batch information
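
As an illustration of the companion file, the sketch below writes a `_metadata.json` file next to an image path. The field names are assumptions chosen to match the list above, not the scripts' actual schema, and `save_metadata` is a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path


def save_metadata(image_path: Path, params: dict) -> Path:
    """Write '<image stem>_metadata.json' beside the image and return its path."""
    meta_path = image_path.with_name(image_path.stem + "_metadata.json")
    meta_path.write_text(json.dumps(params, indent=2), encoding="utf-8")
    return meta_path


with tempfile.TemporaryDirectory() as tmp:
    image_path = Path(tmp) / "magical_forest_digital_art_20241225_143022_seed42.png"
    meta_path = save_metadata(
        image_path,
        {
            # Hypothetical field names for illustration only.
            "prompt": "A magical forest, digital art, highly detailed",
            "negative_prompt": "blurry, low quality",
            "steps": 30,
            "guidance_scale": 7.5,
            "seed": 42,
            "width": 512,
            "height": 512,
            "model": "runwayml/stable-diffusion-v1-5",
            "timestamp": "20241225_143022",
            "batch_index": 0,
        },
    )
    loaded = json.loads(meta_path.read_text(encoding="utf-8"))
    print(meta_path.name)
```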

## ⚙️ Configuration Tips

### For Best Quality

- Use 30-50 inference steps
- Guidance scale 7.5-12.0
- Include style descriptors ("digital art", "oil painting", "photorealistic")
- Use negative prompts to avoid unwanted elements

### For Speed

- Use 20-25 inference steps
- Lower guidance scale (6.0-7.5)
- Stick to 512x512 resolution

### For Experimentation

- Use interactive mode
- Try different seeds with the same prompt
- Experiment with guidance scale values
- Use batch generation to explore variations
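
The tips above can be captured as reusable presets. This is only a sketch: the preset names and the specific values (picked from within the ranges listed above) are assumptions, and `build_command` is a hypothetical helper that assembles a command line from the documented flags.

```python
import shlex

# Hypothetical presets; values fall inside the ranges suggested above.
PRESETS = {
    "quality": {"steps": 50, "guidance": 8.0, "width": 512, "height": 512},
    "speed": {"steps": 20, "guidance": 7.0, "width": 512, "height": 512},
}


def build_command(prompt: str, preset: str) -> str:
    """Return a shell-quoted command line for run_advanced_generation.py."""
    p = PRESETS[preset]
    argv = [
        "python", "run_advanced_generation.py", prompt,
        "--steps", str(p["steps"]),
        "--guidance", str(p["guidance"]),
        "--width", str(p["width"]),
        "--height", str(p["height"]),
    ]
    return shlex.join(argv)


command = build_command("cyberpunk city at sunset", "speed")
print(command)
```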

## 🔧 Troubleshooting

### Common Issues

1. **CUDA out of memory**: Reduce the batch size (`--batch`) or the image dimensions (`--width`, `--height`)
2. **Slow generation**: Check that CUDA is available; without a GPU the scripts fall back to the CPU
3. **Poor quality**: Increase `--steps`, adjust `--guidance`, or refine the prompt
4. **Model download fails**: Check your internet connection and try again

### Performance Optimization

- The scripts automatically enable attention slicing for memory efficiency
- GPU detection is automatic
- Models are cached after first download
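
For reference, device detection and attention slicing can be expressed with PyTorch and Diffusers roughly as follows. This is a sketch, not the scripts' actual code: `pick_device` and `load_pipeline` are hypothetical helpers, and using half precision on the GPU is an assumption.

```python
def pick_device() -> str:
    """Return "cuda" when a CUDA GPU is usable, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch installed there is nothing to detect.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def load_pipeline(model_id: str = "runwayml/stable-diffusion-v1-5"):
    """Load a Stable Diffusion pipeline with attention slicing enabled."""
    import torch
    from diffusers import StableDiffusionPipeline

    device = pick_device()
    # Assumption: float16 on GPU to save memory, float32 on CPU.
    dtype = torch.float16 if device == "cuda" else torch.float32
    pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=dtype)
    pipe = pipe.to(device)
    # Computes attention in slices, trading some speed for lower peak memory.
    pipe.enable_attention_slicing()
    return pipe


device = pick_device()
print(f"Using device: {device}")
```

Calling `load_pipeline()` downloads the model on first use, so it is left out of the example run above.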

## 🎨 Phase 1.B: Style Conditioning & Prompt Engineering

### 3. `compi_phase1b_styled_generation.py` - Style Conditioning

**Features:**

- Interactive style and mood selection from curated lists
- Intelligent prompt engineering and combination
- Multiple variations with unique seeds
- Comprehensive logging and filename organization

**Usage:**

```bash
python run_styled_generation.py [prompt]
# Or directly: python src/generators/compi_phase1b_styled_generation.py [prompt]
```

### 4. `compi_phase1b_advanced_styling.py` - Advanced Style Control

**Features:**

- 13 predefined art styles with optimized prompts and negative prompts
- 9 mood categories with atmospheric conditioning
- Quality presets (draft/standard/high)
- Command line and interactive modes
- Comprehensive metadata saving

**Command Line Options:**

```bash
python run_advanced_styling.py [OPTIONS] [PROMPT]
# Or directly: python src/generators/compi_phase1b_advanced_styling.py [OPTIONS] [PROMPT]

Options:
  --style, -s TEXT        Art style (or number from list)
  --mood, -m TEXT         Mood/atmosphere (or number from list)
  --variations, -v INT    Number of variations (default: 1)
  --quality, -q CHOICE    Quality preset [draft/standard/high]
  --negative, -n TEXT     Negative prompt
  --interactive, -i       Interactive mode
  --list-styles          List available styles and exit
  --list-moods           List available moods and exit
```

### Style Conditioning Examples

**Basic Style Selection:**

```bash
# Interactive mode with guided selection
python run_styled_generation.py

# Command line with style selection
python run_advanced_styling.py "mountain landscape" --style cyberpunk --mood dramatic
```

**Advanced Style Control:**

```bash
# High quality with multiple variations
python run_advanced_styling.py "portrait of a wizard" \
  --style "oil painting" --mood "mysterious" \
  --quality high --variations 3 \
  --negative "blurry, distorted, amateur"

# List available options
python run_advanced_styling.py --list-styles
python run_advanced_styling.py --list-moods
```

**Available Styles:**

- digital art, oil painting, watercolor, cyberpunk
- impressionist, concept art, anime, photorealistic
- minimalist, surrealism, pixel art, steampunk, 3d render

**Available Moods:**

- dreamy, dark, peaceful, vibrant, melancholic
- mysterious, whimsical, dramatic, retro
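
At its simplest, style conditioning amounts to appending style and mood descriptors to the base prompt. The sketch below illustrates that idea only; `build_styled_prompt` is a hypothetical helper, and the actual scripts likely use richer per-style templates and negative prompts.

```python
def build_styled_prompt(subject: str, style: str = "", mood: str = "") -> str:
    """Join the subject with optional style and mood descriptors."""
    parts = [subject.strip(), style.strip(), mood.strip()]
    # Skip descriptors that were left empty.
    return ", ".join(part for part in parts if part)


styled = build_styled_prompt("mountain landscape", style="cyberpunk", mood="dramatic")
print(styled)
```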

## 🖥️ Phase 1.C: Interactive Web UI

### 5. `compi_phase1c_streamlit_ui.py` - Streamlit Web Interface

**Features:**

- Complete web-based interface for text-to-image generation
- Interactive style and mood selection with custom options
- Advanced settings (steps, guidance, dimensions, negative prompts)
- Real-time image generation and display
- Progress tracking and generation logs
- Automatic saving with comprehensive metadata

**Usage:**

```bash
python run_ui.py
# Or directly: streamlit run src/ui/compi_phase1c_streamlit_ui.py
```

### 6. `compi_phase1c_gradio_ui.py` - Gradio Web Interface

**Features:**

- Alternative web interface with Gradio framework
- Gallery view for multiple image variations
- Collapsible advanced settings
- Real-time generation logs
- Mobile-friendly responsive design

**Usage:**

```bash
python run_gradio_ui.py
# Or directly: python src/ui/compi_phase1c_gradio_ui.py
```

## 📊 Phase 1.D: Quality Evaluation Tools

### 7. `compi_phase1d_evaluate_quality.py` - Comprehensive Evaluation Interface

**Features:**

- Systematic image quality assessment with 5-criteria scoring system
- Interactive Streamlit web interface for detailed evaluation
- Objective metrics calculation (perceptual hashes, dimensions, file size)
- Batch evaluation capabilities for efficient processing
- Comprehensive logging and CSV export for trend analysis
- Summary analytics with performance insights and recommendations

**Usage:**

```bash
python run_evaluation.py
# Or directly: streamlit run src/generators/compi_phase1d_evaluate_quality.py
```
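
The guide does not say which perceptual hash the evaluation tool computes. As one common example of the idea, the sketch below implements an average hash over a small grayscale grid and compares hashes by Hamming distance; the function names and the tiny 2x2 grids are illustrative assumptions.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Average hash: one bit per pixel, set when the pixel is above the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


# Tiny grayscale grids standing in for downscaled images.
grid_a = [[10, 200], [220, 30]]
grid_b = [[12, 198], [225, 90]]   # similar to grid_a
grid_c = [[200, 10], [30, 220]]   # inverted layout

print(hamming_distance(average_hash(grid_a), average_hash(grid_b)))
print(hamming_distance(average_hash(grid_a), average_hash(grid_c)))
```

A small distance suggests near-duplicate images, which is useful for spotting repeated generations in a batch.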

### 8. `compi_phase1d_cli_evaluation.py` - Command-Line Evaluation Tools

**Features:**

- Batch evaluation and analysis from command line
- Statistical summaries and performance reports
- Filtering by style, mood, and evaluation status
- Automated scoring for large image sets
- Detailed report generation with recommendations

**Command Line Options:**

```bash
python src/generators/compi_phase1d_cli_evaluation.py [OPTIONS]

Options:
  --analyze                    Display evaluation summary and statistics
  --report                     Generate detailed evaluation report
  --batch-score P S M Q A      Batch score images (1-5 for each criteria)
  --list-all                   List all images with evaluation status
  --list-evaluated             List only evaluated images
  --list-unevaluated          List only unevaluated images
  --style TEXT                 Filter by style
  --mood TEXT                  Filter by mood
  --notes TEXT                 Notes for batch evaluation
  --output FILE                Output file for reports
```

## 🎨 Phase 1.E: Personal Style Fine-tuning (LoRA)

### 9. `compi_phase1e_dataset_prep.py` - Dataset Preparation for LoRA Training

**Features:**

- Organize and validate personal style images for training
- Generate appropriate training captions with trigger words
- Resize and format images for optimal LoRA training
- Create train/validation splits with metadata tracking
- Support for multiple image formats and quality validation

**Usage:**

```bash
python src/generators/compi_phase1e_dataset_prep.py --input-dir my_artwork --style-name "my_art_style"
# Or via wrapper: python run_dataset_prep.py --input-dir my_artwork --style-name "my_art_style"
```
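
Two of the steps listed above, captioning with a trigger word and splitting into train/validation sets, can be sketched as follows. The caption format, the 20% validation fraction, and both helper names are assumptions for illustration; the script's real behaviour may differ.

```python
import random


def make_caption(description: str, trigger_word: str) -> str:
    """Append the style trigger word to an image description."""
    return f"{description} in {trigger_word}"


def split_dataset(filenames: list[str], val_fraction: float = 0.2, seed: int = 0):
    """Shuffle deterministically and split into (train, validation) lists."""
    files = sorted(filenames)
    random.Random(seed).shuffle(files)
    n_val = int(len(files) * val_fraction)
    return files[n_val:], files[:n_val]


images = [f"artwork_{i:02d}.png" for i in range(10)]
train, val = split_dataset(images)
caption = make_caption("a cat", "my_style")
print(len(train), len(val), caption)
```

Fixing the shuffle seed keeps the split reproducible across runs.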

### 10. `compi_phase1e_lora_training.py` - LoRA Fine-tuning Engine

**Features:**

- Full LoRA (Low-Rank Adaptation) fine-tuning pipeline
- Memory-efficient training with gradient checkpointing
- Configurable LoRA parameters (rank, alpha, learning rate)
- Automatic checkpoint saving and validation monitoring
- Integration with PEFT library for optimal performance

**Command Line Options:**

```bash
python run_lora_training.py [OPTIONS] --dataset-dir DATASET_DIR

Options:
  --dataset-dir DIR            Required: Prepared dataset directory
  --epochs INT                 Number of training epochs (default: 100)
  --learning-rate FLOAT        Learning rate (default: 1e-4)
  --lora-rank INT              LoRA rank (default: 4)
  --lora-alpha INT             LoRA alpha (default: 32)
  --batch-size INT             Training batch size (default: 1)
  --save-steps INT             Save checkpoint every N steps
  --gradient-checkpointing     Enable gradient checkpointing for memory efficiency
  --mixed-precision            Use mixed precision training
```
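
To make `--lora-rank` and `--lora-alpha` concrete: in the standard LoRA formulation, a frozen weight matrix receives a low-rank update scaled by `alpha / rank`, and only the two small factor matrices are trained. The sketch below works through the arithmetic for the defaults above; the 768x768 layer size is an illustrative assumption, not a value taken from the training script.

```python
def lora_scaling(rank: int, alpha: float) -> float:
    """Scale applied to the low-rank update: alpha / rank."""
    return alpha / rank


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for one adapted layer: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


scale = lora_scaling(rank=4, alpha=32)
full = 768 * 768                       # parameters in the frozen weight matrix
adapter = lora_param_count(768, 768, rank=4)
print(scale, full, adapter)
```

With the default rank of 4, the adapter for such a layer holds about 1% of the parameters of the matrix it modifies.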

### 11. `compi_phase1e_style_generation.py` - Personal Style Generation

**Features:**

- Generate images using trained LoRA personal styles
- Adjustable style strength and generation parameters
- Interactive and batch generation modes
- Integration with existing CompI pipeline and metadata
- Support for multiple LoRA styles and model switching

**Usage:**

```bash
python run_style_generation.py --lora-path lora_models/my_style/checkpoint-1000 "a cat in my_style"
# Or directly: python src/generators/compi_phase1e_style_generation.py --lora-path PATH PROMPT
```

### 12. `compi_phase1e_style_manager.py` - LoRA Style Management

**Features:**

- Manage multiple trained LoRA styles and checkpoints
- Cleanup old checkpoints and organize model storage
- Export style information and training analytics
- Style database with automatic scanning and metadata
- Batch operations for style maintenance and organization

**Command Line Options:**

```bash
python src/generators/compi_phase1e_style_manager.py [OPTIONS]

Options:
  --list                       List all available LoRA styles
  --info STYLE_NAME           Show detailed information about a style
  --refresh                    Refresh the styles database
  --cleanup STYLE_NAME         Clean up old checkpoints for a style
  --export OUTPUT_FILE         Export styles information to CSV
  --delete STYLE_NAME          Delete a LoRA style (requires --confirm)
```
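
Checkpoint cleanup can be thought of as keeping the newest few `checkpoint-<step>` directories and removing the rest. The sketch below only selects names and deletes nothing; `checkpoints_to_delete` and the `keep_last` parameter are hypothetical, and the manager's actual retention policy is not documented here.

```python
import re


def checkpoints_to_delete(names: list[str], keep_last: int = 2) -> list[str]:
    """Return checkpoint directory names older than the newest `keep_last`."""
    found = []
    for name in names:
        match = re.fullmatch(r"checkpoint-(\d+)", name)
        if match:
            found.append((int(match.group(1)), name))
    found.sort()  # oldest (lowest step) first
    cutoff = max(len(found) - keep_last, 0)
    return [name for _, name in found[:cutoff]]


entries = ["checkpoint-500", "checkpoint-1000", "checkpoint-250", "final", "checkpoint-1500"]
stale = checkpoints_to_delete(entries, keep_last=2)
print(stale)
```

Entries that do not match the `checkpoint-<step>` pattern, such as `final`, are never selected.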

### Web UI Examples (Phase 1.C)

**Streamlit Interface:**

- Navigate to http://localhost:8501 after running `python run_ui.py`
- Full-featured interface with sidebar settings
- Progress bars and status updates
- Expandable sections for details

**Gradio Interface:**

- Navigate to http://localhost:7860 after running `python run_gradio_ui.py`
- Gallery-style image display
- Compact, mobile-friendly design
- Real-time generation feedback

## 🎯 Next Steps

Phase 1 establishes the foundation for CompI's text-to-image capabilities. Future phases will add:

- Audio input processing
- Emotion and style conditioning
- Real-time data integration
- Multimodal fusion
- Advanced user interfaces

## 📚 Resources

- [Hugging Face Diffusers Documentation](https://huggingface.co/docs/diffusers)
- [Prompt Engineering Guide](https://prompthero.com/stable-diffusion-prompt-guide)
- [CompI Development Plan](development.md)