---
license: openrail++
language:
- en
library_name: diffusers
pipeline_tag: text-to-image
tags:
- text-to-image
base_model:
- stabilityai/stable-diffusion-xl-base-1.0
---

# Anime Stable Diffusion Model

A custom Stable Diffusion XL model fine-tuned for anime-style image generation, trained on a dataset of roughly 172k anime images.

This is the first proof-of-concept model in the series; I am still spending time filtering and processing the larger dataset. The model is currently undertrained: it can reflect certain concepts, but many improvements are still needed.

## Prompt

Prompts use Danbooru-style tags.

- Quality tags: Masterpiece, high quality, normal quality, low quality
- Aesthetic tags: Very aesthetic, aesthetic, pleasent, unpleasent
- Additional special tags: High resolution, elegant, artist:

| Rating Modifier | Rating Criterion |
| --------------- | ---------------- |
| - | general |
| - | sensitive |
| nsfw | questionable |
| nsfw | explicit |

Recommended prompt order: rating tag, quality tag, aesthetic tag, (additional tag), general tag
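
The recommended order can be sketched as a small helper. This is only an illustration, not tooling shipped with the model: `build_prompt` and `generate` are hypothetical names, `MODEL_ID` is a placeholder for wherever the weights are hosted, and the step count and guidance scale are assumed values rather than settings from this card. It also assumes that the rating criterion is used verbatim as the rating tag, and that putting low-end quality and aesthetic tags in the negative prompt is helpful, which is a common convention but is not stated here.

```python
from typing import Iterable, Optional

# Placeholder: replace with the actual repository id or local path of the weights.
MODEL_ID = "your-namespace/your-model-id"


def build_prompt(
    general: Iterable[str],
    rating: Optional[str] = None,
    quality: Optional[str] = "masterpiece",
    aesthetic: Optional[str] = "very aesthetic",
    additional: Iterable[str] = (),
) -> str:
    """Join tags in the order: rating, quality, aesthetic, (additional), general."""
    ordered = [rating, quality, aesthetic, *additional, *general]
    # Drop empty slots and stray whitespace, then join as comma-separated tags.
    return ", ".join(tag.strip() for tag in ordered if tag and tag.strip())


def generate(prompt: str, negative_prompt: str = "low quality, unpleasent"):
    """Render one 1024x1024 image with diffusers (needs a CUDA GPU).

    The default negative prompt reuses tags spelled as in the tag list.
    """
    # Imported lazily so build_prompt works without torch/diffusers installed.
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16
    ).to("cuda")
    result = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        width=1024,
        height=1024,
        num_inference_steps=28,  # assumed value, not from the model card
        guidance_scale=7.0,      # assumed value, not from the model card
    )
    return result.images[0]


prompt = build_prompt(
    general=["1girl", "solo", "looking at viewer"],
    rating="general",
    additional=["high resolution"],
)
print(prompt)
```

Calling `generate(prompt)` would then return a PIL image, assuming the weights are available in diffusers format at `MODEL_ID`.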

### Dataset Specifications

- Total Images: 172k
- General Training Set: 160k images
- Aesthetic Fine-tuning Set: 12k high-quality images
- Resolution: 1024x1024

### Hardware Configuration

- GPUs: 2x NVIDIA RTX 6000 Ada
- Training Time: 16 days (General), 3 days (Aesthetic fine-tuning)

### Training Configuration

| Parameter | Value | Description |
|-----------|--------|-------------|
| Resolution | 1024x1024 | Training resolution |
| Batch Size | 8x2x2 | Effective batch size |
| Learning Rate | 5e-5 | Base learning rate |
| Text Encoder LR | 1e-5 | Learning rate for text encoder |
| Epochs | 10 | Total training epochs |
| Mixed Precision | FP16 | Training precision mode |
| Optimizer | AdamW8bit | Optimizer type |
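
As a rough worked example of what these numbers imply, the sketch below multiplies out the batch-size factors and estimates the number of optimizer steps. Reading `8x2x2` as per-GPU batch size, number of GPUs, and gradient-accumulation steps is an assumption, since the card does not label the factors. Applying all 10 epochs to the 160k general training set is also an assumption, and the variable names are hypothetical.

```python
import math

# Factors from the "Batch Size" row; what each factor means is an assumption.
PER_GPU_BATCH = 8
NUM_GPUS = 2
GRAD_ACCUM_STEPS = 2

GENERAL_SET_IMAGES = 160_000  # from Dataset Specifications
EPOCHS = 10                   # from the Training Configuration table

effective_batch = PER_GPU_BATCH * NUM_GPUS * GRAD_ACCUM_STEPS
steps_per_epoch = math.ceil(GENERAL_SET_IMAGES / effective_batch)
total_steps = steps_per_epoch * EPOCHS

print(f"effective batch size: {effective_batch}")        # 32
print(f"optimizer steps per epoch: {steps_per_epoch}")   # 5000
print(f"optimizer steps in total: {total_steps}")        # 50000
```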

### Advanced Settings

| Feature | Setting | Purpose |
|---------|---------|----------|
| Gradient Checkpointing | Enabled | Memory optimization |
| XFormers | Enabled | Attention optimization |
| Memory Efficient Attention | Enabled | Memory optimization |
| Bucket Resolution Steps | 128 | Dynamic resolution handling |
| Min Bucket Resolution | 512 | Minimum image size |
| Max Bucket Resolution | 4096 | Maximum image size |
| Noise Offset | 0.035 | Training stability |
| Min SNR Gamma | 5 | Signal-to-noise ratio control |
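
The three bucket settings describe aspect-ratio bucketing: images are grouped into a fixed set of resolutions instead of all being cropped to a square. The sketch below is a minimal illustration of how such a bucket list could be enumerated from a step of 128, a minimum side of 512, and a maximum side of 4096. It assumes the pixel budget per bucket is the 1024x1024 training resolution; the card does not name the trainer or its exact algorithm, so `make_buckets` and `nearest_bucket` are hypothetical helpers, not the code actually used.

```python
from typing import List, Tuple

Bucket = Tuple[int, int]  # (width, height)


def make_buckets(
    target_area: int = 1024 * 1024,
    min_side: int = 512,
    max_side: int = 4096,
    step: int = 128,
) -> List[Bucket]:
    """Enumerate (width, height) pairs on a `step` grid within the pixel budget."""
    buckets = set()
    width = min_side
    while width <= max_side:
        # Largest height on the grid that keeps width * height <= target_area.
        height = min(max_side, (target_area // width) // step * step)
        if height >= min_side:
            buckets.add((width, height))
            buckets.add((height, width))  # mirrored orientation
        width += step
    return sorted(buckets)


def nearest_bucket(width: int, height: int, buckets: List[Bucket]) -> Bucket:
    """Pick the bucket whose aspect ratio is closest to the image's."""
    aspect = width / height
    return min(buckets, key=lambda b: abs(b[0] / b[1] - aspect))


buckets = make_buckets()
print(len(buckets), "buckets, from", buckets[0], "to", buckets[-1])
print("1920x1080 image ->", nearest_bucket(1920, 1080, buckets))
```

With these settings no bucket side comes close to 4096, because the 1024x1024 pixel budget together with the 512 minimum side caps the longest side at 2048.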
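
The Min SNR Gamma setting can be illustrated with the usual Min-SNR weighting, where each timestep's loss is scaled by min(SNR, gamma) / SNR. The sketch below is written under assumptions, not taken from the training code: it assumes epsilon-prediction and a 1000-step scaled-linear beta schedule from 0.00085 to 0.012 (the defaults commonly used with Stable Diffusion), and the function names are hypothetical.

```python
import math
from typing import List


def alphas_cumprod(
    num_steps: int = 1000,
    beta_start: float = 0.00085,  # assumed schedule start
    beta_end: float = 0.012,      # assumed schedule end
) -> List[float]:
    """Cumulative products of (1 - beta_t) for a scaled-linear beta schedule."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    out, prod = [], 1.0
    for i in range(num_steps):
        # Scaled-linear: interpolate linearly in sqrt(beta), then square.
        beta = (lo + (hi - lo) * i / (num_steps - 1)) ** 2
        prod *= 1.0 - beta
        out.append(prod)
    return out


def min_snr_weights(gamma: float = 5.0) -> List[float]:
    """Per-timestep loss weights min(SNR, gamma) / SNR (epsilon-prediction)."""
    weights = []
    for a_bar in alphas_cumprod():
        snr = a_bar / (1.0 - a_bar)  # signal-to-noise ratio at this timestep
        weights.append(min(snr, gamma) / snr)
    return weights


weights = min_snr_weights(gamma=5.0)
print(f"weight at t=0:   {weights[0]:.4f}")
print(f"weight at t=999: {weights[-1]:.4f}")
```

Under these assumptions, low-noise timesteps (SNR above 5) are down-weighted, while noisier timesteps keep a weight of 1, so easy steps do not dominate the loss.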