suzushi/miso-diffusion-xl-1.0


suzushi committed on Dec 31, 2024

Commit 121cc9f · verified · 1 Parent(s): bf8cac2

Update README.md

Files changed (1)

README.md +63 -2


@@ -1,10 +1,71 @@
 ---
-license: other
 language:
 - en
 library_name: diffusers
 pipeline_tag: text-to-image
 tags:
 - text-to-image
 ---
-Converted from [https://huggingface.co/suzushi/miso-diffusion-xl-1.0-safetensors/blob/main/miso-diffusion-xl-1.0.safetensors](https://huggingface.co/suzushi/miso-diffusion-xl-1.0-safetensors/blob/main/miso-diffusion-xl-1.0.safetensors).

 ---
+license: openrail++
 language:
 - en
 library_name: diffusers
 pipeline_tag: text-to-image
 tags:
 - text-to-image
+base_model:
+- stabilityai/stable-diffusion-xl-base-1.0
 ---
+# Anime Stable Diffusion Model
+A custom Stable Diffusion model fine-tuned for anime-style image generation, trained on a large dataset of anime images.
+This is the first proof-of-concept model in the series, released while I spend more time filtering and processing the
+larger dataset. The model is still undertrained: it reflects certain concepts, but many further
+improvements are needed.
+## Prompt
+Prompts use Danbooru-style tagging.
+- Quality tags: Masterpiece, high quality, normal quality, low quality
+- Aesthetic tags: Very aesthetic, aesthetic, pleasent, unpleasent
+- Additional special tags: High resolution, elegant, artist:
+
+| Rating Modifier | Rating Criterion |
+| --------------- | ---------------- |
+| -               | general          |
+| -               | sensitive        |
+| nsfw            | questionable     |
+| nsfw            | explicit         |
+Recommended prompt order: rating tag, quality tag, aesthetic tag, (additional tag), general tag
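
The prompt order described above can be sketched as a small helper. This is an illustrative sketch, not part of the README: `build_prompt`, `RATING_MODIFIER`, and `generate` are hypothetical names, the "-" entries in the rating table are read as "no modifier", lowercase tag spelling is an assumption, and the example general tags are made up. The `generate` function assumes the repository loads with the `diffusers` SDXL pipeline, as the `library_name` and `base_model` metadata suggest.

```python
# Rating criterion -> rating modifier, copied from the table in the README.
# Assumption: "-" in the table means that no modifier tag is added.
RATING_MODIFIER = {
    "general": "",
    "sensitive": "",
    "questionable": "nsfw",
    "explicit": "nsfw",
}


def build_prompt(rating, quality, aesthetic, general_tags, additional_tags=()):
    """Join tags as: rating, quality, aesthetic, (additional), general."""
    parts = [RATING_MODIFIER[rating], quality, aesthetic,
             *additional_tags, *general_tags]
    # Drop empty entries (e.g. the missing modifier for "general").
    return ", ".join(p for p in parts if p)


def generate(prompt, repo_id="suzushi/miso-diffusion-xl-1.0"):
    """Hypothetical helper: render one 1024x1024 image with diffusers."""
    # Imported lazily so build_prompt works without torch/diffusers installed.
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        repo_id, torch_dtype=torch.float16
    ).to("cuda")
    return pipe(prompt=prompt, width=1024, height=1024).images[0]


prompt = build_prompt(
    rating="general",
    quality="masterpiece",
    aesthetic="very aesthetic",
    general_tags=["1girl", "solo", "smile"],  # made-up example tags
    additional_tags=["high resolution"],
)
print(prompt)
```

Calling `generate(prompt)` requires a CUDA GPU and downloads the model weights, so it is left uncalled here.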
+### Dataset Specifications
+- Total Images: 172k
+- General Training Set: 160k images
+- Aesthetic Fine-tuning Set: 12k high-quality images
+- Resolution: 1024x1024
+### Hardware Configuration
+- GPUs: 2x NVIDIA RTX 6000 Ada
+- Training Time: 16 days (General), 3 days (Aesthetic fine-tuning)
+### Training Configuration
+| Parameter | Value | Description |
+|-----------|--------|-------------|
+| Resolution | 1024x1024 | Training resolution |
+| Batch Size | 8x2x2 | Effective batch size |
+| Learning Rate | 5e-5 | Base learning rate |
+| Text Encoder LR | 1e-5 | Learning rate for text encoder |
+| Epochs | 10 | Total training epochs |
+| Mixed Precision | FP16 | Training precision mode |
+| Optimizer | AdamW8bit | Optimizer type |
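
As a rough sanity check on the table above, the batch-size entry can be turned into a step count. This is a back-of-the-envelope sketch under loud assumptions: that `8x2x2` means per-device batch × number of GPUs × gradient accumulation steps, and that the 10 epochs apply to the 160k general training set. The README confirms neither reading, and `training_steps` is a hypothetical helper.

```python
import math


def training_steps(num_images, per_device_batch, num_gpus, grad_accum, epochs):
    """Return (effective batch size, optimizer steps per epoch, total steps)."""
    effective_batch = per_device_batch * num_gpus * grad_accum
    steps_per_epoch = math.ceil(num_images / effective_batch)
    return effective_batch, steps_per_epoch, steps_per_epoch * epochs


# Assumed reading of "8x2x2": 8 per device, 2 GPUs, 2 accumulation steps.
effective, per_epoch, total = training_steps(
    num_images=160_000, per_device_batch=8, num_gpus=2, grad_accum=2, epochs=10
)
print(effective, per_epoch, total)  # 32 5000 50000
```

With aspect-ratio bucketing the real step count would differ slightly, since batches are formed per bucket.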
+### Advanced Settings
+| Feature | Setting | Purpose |
+|---------|---------|----------|
+| Gradient Checkpointing | Enabled | Memory optimization |
+| XFormers | Enabled | Attention optimization |
+| Memory Efficient Attention | Enabled | Memory optimization |
+| Bucket Resolution Steps | 128 | Dynamic resolution handling |
+| Min Bucket Resolution | 512 | Minimum image size |
+| Max Bucket Resolution | 4096 | Maximum image size |
+| Noise Offset | 0.035 | Training stability |
+| Min SNR Gamma | 5 | Signal-to-noise ratio control |