---
license: openrail++
language:
- en
library_name: diffusers
pipeline_tag: text-to-image
tags:
- text-to-image
base_model:
- stabilityai/stable-diffusion-xl-base-1.0
---

# Anime Stable Diffusion Model

A custom Stable Diffusion model fine-tuned for anime-style image generation, trained on a dataset of 172k anime images.
This is the first concept model in the series; I am still filtering and processing the larger dataset.
The model is currently undertrained: it reflects some of the intended concepts, but it still needs substantial improvement.

## Prompt
Prompts use Danbooru-style tags.

- Quality tags: Masterpiece, high quality, normal quality, low quality
- Aesthetic tags: Very aesthetic, aesthetic, pleasant, unpleasant
- Additional special tags: High resolution, elegant, artist:


| Rating Modifier | Rating Criterion |
| --------------- | ---------------- |
| -               | general          |
| -               | sensitive        |
| nsfw            | questionable     |
| nsfw            | explicit         |

Recommended prompt order: rating tag, quality tag, aesthetic tag, (additional tag), general tags
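
The ordering above can be sketched as a small helper that joins tag groups in that sequence. This is only an illustration: `build_prompt` and its parameter names are hypothetical and not part of the model or of diffusers, and the tags in the usage line are examples.

```python
def build_prompt(general, rating=(), quality=(), aesthetic=(), additional=()):
    """Join tag groups in the order: rating, quality, aesthetic, additional, general.

    Hypothetical helper for illustration; every argument is an iterable of tag strings.
    """
    groups = [rating, quality, aesthetic, additional, general]
    # Flatten the groups in order, dropping empty or whitespace-only tags.
    tags = [tag.strip() for group in groups for tag in group if tag.strip()]
    return ", ".join(tags)


prompt = build_prompt(
    general=["1girl", "solo", "looking at viewer"],
    quality=["masterpiece"],
    aesthetic=["very aesthetic"],
    additional=["high resolution"],
)
print(prompt)
```

Keeping the groups as separate arguments makes it harder to place a general tag ahead of the quality or aesthetic tags by accident.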

### Dataset Specifications
- Total Images: 172k
- General Training Set: 160k images
- Aesthetic Fine-tuning Set: 12k high-quality images
- Resolution: 1024x1024

### Hardware Configuration
- GPUs: 2x NVIDIA RTX 6000 Ada
- Training Time: 16 days (general training), 3 days (aesthetic fine-tuning)

### Training Configuration

| Parameter | Value | Description |
|-----------|--------|-------------|
| Resolution | 1024x1024 | Training resolution |
| Batch Size | 8x2x2 | Effective batch size |
| Learning Rate | 5e-5 | Base learning rate |
| Text Encoder LR | 1e-5 | Learning rate for text encoder |
| Epochs | 10 | Total training epochs |
| Mixed Precision | FP16 | Training precision mode |
| Optimizer | AdamW8bit | Optimizer type |
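
As a rough sanity check, the batch size and epoch count can be turned into an approximate optimizer step count. The card does not say which factor in `8x2x2` is the per-device batch, the gradient accumulation, or the GPU count; the product is the same either way. The sketch below also assumes that the 10 epochs apply to the 160k general training set and that every image is used once per epoch, neither of which is stated above.

```python
import math

# Values taken from the tables above.
batch_factors = (8, 2, 2)      # "8x2x2"; the meaning of each factor is not stated
general_images = 160_000       # general training set size
epochs = 10                    # assumed to apply to the general training set

effective_batch = math.prod(batch_factors)
steps_per_epoch = math.ceil(general_images / effective_batch)
total_steps = steps_per_epoch * epochs

print(f"effective batch size: {effective_batch}")
print(f"steps per epoch:      {steps_per_epoch}")
print(f"total steps:          {total_steps}")
```

Aspect-ratio bucketing can leave partially filled batches, so the real step count may differ somewhat from this estimate.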

### Advanced Settings

| Feature | Setting | Purpose |
|---------|---------|----------|
| Gradient Checkpointing | Enabled | Memory optimization |
| XFormers | Enabled | Attention optimization |
| Memory Efficient Attention | Enabled | Memory optimization |
| Bucket Resolution Steps | 128 | Dynamic resolution handling |
| Min Bucket Resolution | 512 | Minimum image size |
| Max Bucket Resolution | 4096 | Maximum image size |
| Noise Offset | 0.035 | Training stability |
| Min SNR Gamma | 5 | Signal-to-noise ratio control |
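
To make the bucket settings concrete, here is a minimal sketch of one common way to enumerate aspect-ratio buckets from a step size, a minimum and maximum side length, and an area budget. It is not necessarily the algorithm the training script used. `make_buckets` and `pick_bucket` are hypothetical names, and using the 1024x1024 training resolution as the area budget is an assumption.

```python
import math


def make_buckets(max_area=1024 * 1024, min_size=512, max_size=4096, step=128):
    """Enumerate (width, height) buckets whose sides are multiples of `step`
    and whose area does not exceed `max_area`. Illustrative sketch only."""
    buckets = set()
    width = min_size
    while width <= max_size:
        # Largest height that is a multiple of `step` and fits the area budget.
        height = min(max_size, (max_area // width) // step * step)
        if height >= min_size:
            buckets.add((width, height))
            buckets.add((height, width))  # mirrored portrait/landscape bucket
        width += step
    return sorted(buckets)


def pick_bucket(width, height, buckets):
    """Pick the bucket whose aspect ratio is closest (in log space) to the image's."""
    target = math.log(width / height)
    return min(buckets, key=lambda b: abs(math.log(b[0] / b[1]) - target))


buckets = make_buckets()
for w, h in buckets:
    print(f"{w}x{h}")
print("bucket for a 1920x1080 image:", pick_bucket(1920, 1080, buckets))
```

With a 1024x1024 area budget and a 512 minimum, the longest side this sketch produces is 2048, so the 4096 maximum is never reached here. Comparing aspect ratios in log space treats a 2:1 and a 1:2 mismatch symmetrically.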