suzushi/miso-diffusion-xl-1.0

	---
	license: openrail++
	language:
	- en
	library_name: diffusers
	pipeline_tag: text-to-image
	tags:
	- text-to-image
	base_model:
	- stabilityai/stable-diffusion-xl-base-1.0
	---

	# Anime Stable Diffusion Model

	A custom Stable Diffusion XL model fine-tuned for anime-style image generation on a large dataset of anime images.
	This is the first concept model of the series, released while I spend more time filtering and processing the
	larger dataset. The model is still undertrained: it reflects certain concepts, but many
	improvements are still needed.
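
As a starting point for trying the model, here is a minimal loading sketch with `diffusers`. It assumes the repository id is `suzushi/miso-diffusion-xl-1.0` and that the weights are published in diffusers format. The step count and guidance scale are illustrative placeholders, not settings recommended by this card.

```python
MODEL_ID = "suzushi/miso-diffusion-xl-1.0"  # assumption: repository id taken from the page title


def generate(prompt, negative_prompt="", steps=28, guidance_scale=7.0, device="cuda"):
    """Load the SDXL pipeline and render one 1024x1024 image.

    `steps` and `guidance_scale` are placeholder values, not tuned for this model.
    Weights are downloaded on first use.
    """
    # Imported lazily so that defining this helper does not require torch/diffusers.
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.float16)
    pipe = pipe.to(device)
    result = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        width=1024,
        height=1024,
        num_inference_steps=steps,
        guidance_scale=guidance_scale,
    )
    return result.images[0]  # a PIL image
```

For example, `generate("masterpiece, very aesthetic, 1girl, solo").save("sample.png")` would write a single image to disk, given a CUDA-capable GPU.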

	## Prompt
	Prompts use Danbooru-style tags.

	- Quality tag: Masterpiece, high quality, normal quality, low quality
	- Aesthetic tag: Very aesthetic, aesthetic, pleasant, unpleasant
	- Additional special tag: High resolution, elegant, artist:


	| Rating Modifier | Rating Criterion |
	| --------------- | ---------------- |
	| - | general |
	| - | sensitive |
	| nsfw | questionable |
	| nsfw | explicit |

	Recommended prompt order: Rating tag, quality tag, aesthetic tag, (additional tag), general tag
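
The recommended order can be sketched as a small helper. This is an illustration only: `build_prompt` and `RATING_MODIFIER` are hypothetical names, and the sketch assumes that the "Rating Modifier" column is the text that goes into the prompt, with `-` meaning that no rating tag is added.

```python
# Rating criterion -> rating modifier, following the table above.
# None stands for the "-" entries (assumed to mean "add no rating tag").
RATING_MODIFIER = {
    "general": None,
    "sensitive": None,
    "questionable": "nsfw",
    "explicit": "nsfw",
}


def build_prompt(rating, quality, aesthetic, general_tags, additional_tags=()):
    """Join tags as: rating, quality, aesthetic, (additional), general."""
    if rating not in RATING_MODIFIER:
        raise ValueError(f"unknown rating criterion: {rating!r}")

    parts = []
    modifier = RATING_MODIFIER[rating]
    if modifier is not None:
        parts.append(modifier)
    parts.append(quality)
    parts.append(aesthetic)
    parts.extend(additional_tags)
    parts.extend(general_tags)
    return ", ".join(parts)
```

Calling `build_prompt("general", "masterpiece", "very aesthetic", ["1girl", "solo"])` yields a prompt without a rating tag, while the `questionable` and `explicit` criteria prepend `nsfw`.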

	### Dataset Specifications
	- Total Images: 172k
	- General Training Set: 160k images
	- Aesthetic Fine-tuning Set: 12k high-quality images
	- Resolution: 1024x1024

	### Hardware Configuration
	- GPUs: 2x NVIDIA RTX 6000 Ada
	- Training Time: 16 days (general training), 3 days (aesthetic fine-tuning)

	### Training Configuration

	| Parameter | Value | Description |
	|-----------|--------|-------------|
	| Resolution | 1024x1024 | Training resolution |
	| Batch Size | 8x2x2 | Effective batch size |
	| Learning Rate | 5e-5 | Base learning rate |
	| Text Encoder LR | 1e-5 | Learning rate for text encoder |
	| Epochs | 10 | Total training epochs |
	| Mixed Precision | FP16 | Training precision mode |
	| Optimizer | AdamW8bit | Optimizer type |
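
To make the batch-size entry concrete, the arithmetic below works out the effective batch size and a rough optimizer step count. The reading of `8x2x2` as per-GPU batch size x number of GPUs x gradient accumulation steps is an assumption, as is applying the 10 epochs to the 160k general training set. The card does not spell out either.

```python
import math

PER_DEVICE_BATCH = 8      # assumption: first factor of "8x2x2"
NUM_GPUS = 2              # matches the two GPUs listed under Hardware Configuration
GRAD_ACCUM_STEPS = 2      # assumption: last factor of "8x2x2"
GENERAL_IMAGES = 160_000  # "160k" taken as a round number
EPOCHS = 10               # assumption: refers to the general training stage


def effective_batch_size(per_device, num_gpus, grad_accum):
    """Number of images contributing to one optimizer update."""
    return per_device * num_gpus * grad_accum


def optimizer_steps(num_images, batch_size, epochs):
    """Approximate update count, ignoring bucketing and dropped partial batches."""
    return math.ceil(num_images / batch_size) * epochs


batch = effective_batch_size(PER_DEVICE_BATCH, NUM_GPUS, GRAD_ACCUM_STEPS)
steps = optimizer_steps(GENERAL_IMAGES, batch, EPOCHS)
print(batch, steps)  # 32 50000
```

Under these assumptions, each update sees 32 images, and the general stage runs for about 50,000 updates.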

	### Advanced Settings

	| Feature | Setting | Purpose |
	|---------|---------|----------|
	| Gradient Checkpointing | Enabled | Memory optimization |
	| XFormers | Enabled | Attention optimization |
	| Memory Efficient Attention | Enabled | Memory optimization |
	| Bucket Resolution Steps | 128 | Dynamic resolution handling |
	| Min Bucket Resolution | 512 | Minimum image size |
	| Max Bucket Resolution | 4096 | Maximum image size |
	| Noise Offset | 0.035 | Training stability |
	| Min SNR Gamma | 5 | Signal-to-noise ratio control |
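
Two of the settings above can be illustrated in a few lines. The sketch below is not the training code used for this model. It assumes the usual Min-SNR weighting for noise (epsilon) prediction, `min(SNR, gamma) / SNR`, and a simplified form of aspect-ratio bucketing in which every bucket sits on a 128-pixel grid and stays within the 1024x1024 pixel budget. The function names are hypothetical.

```python
def min_snr_weight(snr, gamma=5.0):
    """Min-SNR loss weight for epsilon prediction: min(SNR, gamma) / SNR."""
    if snr <= 0:
        raise ValueError("snr must be positive")
    return min(snr, gamma) / snr


def bucket_resolutions(max_area=1024 * 1024, min_size=512, max_size=4096, step=128):
    """Enumerate (width, height) buckets on a `step` grid within `max_area` pixels."""
    buckets = set()
    for width in range(min_size, max_size + 1, step):
        # Tallest height on the grid that keeps width * height within the budget.
        height = min(max_size, max_area // width) // step * step
        if height >= min_size:
            buckets.add((width, height))
            buckets.add((height, width))  # mirrored orientation
    return sorted(buckets)


if (1024, 1024) in bucket_resolutions():
    print("square bucket available")

# Low-noise timesteps (high SNR) are down-weighted; noisy ones keep weight 1.
print(min_snr_weight(10.0), min_snr_weight(2.0))  # 0.5 1.0
```

With gamma set to 5, timesteps whose SNR exceeds 5 contribute proportionally less to the loss, which is the "signal-to-noise ratio control" the table refers to.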