---
license: openrail++
language:
- en
library_name: diffusers
pipeline_tag: text-to-image
tags:
- text-to-image
base_model:
- stabilityai/stable-diffusion-xl-base-1.0
---

# Anime Stable Diffusion Model

A custom Stable Diffusion model fine-tuned for anime-style image generation, trained on a large dataset of anime images.
This is the first concept model in the series; I am still spending time filtering and processing the larger dataset.
The model is currently undertrained: it can reflect certain concepts, but many improvements remain to be made.

## Prompt 
Prompts use Danbooru-style tags.

- Quality tags: Masterpiece, high quality, normal quality, low quality
- Aesthetic tags: Very aesthetic, aesthetic, pleasant, unpleasant
- Additional special tags: High resolution, elegant, artist:


| Rating Modifier | Rating Criterion |
| --------------- | ---------------- |
| -               | general          |
| -               | sensitive        |
| nsfw            | questionable     |
| nsfw            | explicit         |

Recommended prompt order: rating tag, quality tag, aesthetic tag, (additional tags), general tags
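
The ordering above can be sketched as a small helper. This is only an illustration, not part of the model's tooling: the function name `build_prompt`, the lowercase tag spelling, and the comma separator are assumptions, and it assumes the rating modifier from the table (`nsfw` or nothing) is what goes first in the prompt.

```python
# Hypothetical helper: assembles a prompt in the recommended tag order.
# Assumption: the rating modifier column ("nsfw" or none) is the rating tag.
RATING_MODIFIER = {
    "general": None,
    "sensitive": None,
    "questionable": "nsfw",
    "explicit": "nsfw",
}


def build_prompt(general_tags, rating="general", quality="masterpiece",
                 aesthetic="very aesthetic", additional=()):
    """Join tags as: rating, quality, aesthetic, additional, general."""
    if rating not in RATING_MODIFIER:
        raise ValueError(f"unknown rating: {rating!r}")
    parts = []
    modifier = RATING_MODIFIER[rating]
    if modifier is not None:
        parts.append(modifier)       # rating tag first, if any
    parts.append(quality)            # then the quality tag
    parts.append(aesthetic)          # then the aesthetic tag
    parts.extend(additional)         # optional special tags
    parts.extend(general_tags)       # general Danbooru tags last
    return ", ".join(parts)


print(build_prompt(["1girl", "solo"], additional=["high resolution"]))
```

For a general-rated image no rating modifier is emitted, so the prompt starts with the quality tag.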

### Dataset Specifications
- Total Images: 172k
- General Training Set: 160k images
- Aesthetic Fine-tuning Set: 12k high-quality images
- Resolution: 1024x1024

### Hardware Configuration
- GPUs: 2x NVIDIA RTX 6000 Ada
- Training Time: 16 days (general), 3 days (aesthetic fine-tuning)

### Training Configuration

| Parameter | Value | Description |
|-----------|--------|-------------|
| Resolution | 1024x1024 | Training resolution |
| Batch Size | 8x2x2 | Effective batch size |
| Learning Rate | 5e-5 | Base learning rate |
| Text Encoder LR | 1e-5 | Learning rate for text encoder |
| Epochs | 10 | Total training epochs |
| Mixed Precision | FP16 | Training precision mode |
| Optimizer | AdamW8bit | Optimizer type |
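
As a rough worked example, the batch-size entry can be turned into a step count. The card does not say what each factor in `8x2x2` stands for, so only the product is used here. The sketch also assumes that "160k" means exactly 160,000 images, that the 10 epochs apply to the general training set, and that no images are repeated or dropped by bucketing. None of this is confirmed by the card.

```python
import math

# Factors as listed in the table; their individual meaning is not stated.
batch_factors = (8, 2, 2)
effective_batch = math.prod(batch_factors)

# Assumptions: 160k = 160,000 images, 10 epochs over the general set.
general_images = 160_000
epochs = 10

steps_per_epoch = math.ceil(general_images / effective_batch)
total_steps = steps_per_epoch * epochs

print(effective_batch, steps_per_epoch, total_steps)
```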

### Advanced Settings

| Feature | Setting | Purpose |
|---------|---------|----------|
| Gradient Checkpointing | Enabled | Memory optimization |
| XFormers | Enabled | Attention optimization |
| Memory Efficient Attention | Enabled | Memory optimization |
| Bucket Resolution Steps | 128 | Dynamic resolution handling |
| Min Bucket Resolution | 512 | Minimum image size |
| Max Bucket Resolution | 4096 | Maximum image size |
| Noise Offset | 0.035 | Training stability |
| Min SNR Gamma | 5 | Signal-to-noise ratio control |
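
To illustrate what the Min SNR Gamma setting controls, here is a minimal sketch of the Min-SNR loss weight in its noise-prediction (epsilon) form, `min(SNR, gamma) / SNR`. The card does not state which trainer or prediction type was used, so treat the formula choice and the function name `min_snr_weight` as assumptions.

```python
def min_snr_weight(snr: float, gamma: float = 5.0) -> float:
    """Per-timestep loss weight for epsilon prediction: min(SNR, gamma) / SNR.

    Timesteps with SNR above gamma (low noise) are down-weighted;
    timesteps with SNR at or below gamma keep a weight of 1.
    """
    if snr <= 0:
        raise ValueError("snr must be positive")
    return min(snr, gamma) / snr


# High-SNR timesteps are scaled down, low-SNR ones are left unchanged.
for snr in (0.5, 5.0, 10.0, 100.0):
    print(snr, min_snr_weight(snr))
```

With `gamma = 5`, a timestep with SNR 10 contributes half as much to the loss as it would unweighted, which keeps the easy, low-noise timesteps from dominating training.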