---
language:
- en
license: creativeml-openrail-m
tags:
- stable-diffusion
- text-to-image
- diffusers
- pokemon
- image-generation
- art
library_name: diffusers
pipeline_tag: text-to-image
base_model: stable-diffusion-v1-5/stable-diffusion-v1-5
datasets:
- reach-vb/pokemon-blip-captions
---
# Pokemon Stable Diffusion v1.5

A fine-tuned version of Stable Diffusion v1.5 specifically trained to generate high-quality Pokémon images in various artistic styles.
## Model Details
- **Base Model**: [Stable Diffusion v1.5](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5)
- **Developed by**: RekklesAI
- **Model Type**: Latent Diffusion Model for Text-to-Image generation
- **Language(s)**: English
- **License**: [CreativeML OpenRAIL-M](https://huggingface.co/spaces/CompVis/stable-diffusion-license)
- **Training Data**: [reach-vb/pokemon-blip-captions](https://huggingface.co/datasets/reach-vb/pokemon-blip-captions)
- **Training Steps**: 15,000 steps at resolution 512x512
- **Model Architecture**: Same as Stable Diffusion v1.5 (UNet with cross-attention layers)
- **Diffusers Version**: 0.33.0.dev0
- **Scheduler**: PNDMScheduler
- **Safety Checker**: StableDiffusionSafetyChecker (can be disabled during inference)
## Model Description
This model is a fine-tuned version of Stable Diffusion v1.5 for generating Pokémon images. It can render Pokémon in a range of artistic styles, including photorealistic renders, cartoon styles, cyberpunk aesthetics, and watercolor art.
The model was fine-tuned for 15,000 steps on the [reach-vb/pokemon-blip-captions](https://huggingface.co/datasets/reach-vb/pokemon-blip-captions) dataset, allowing it to learn the distinctive features and characteristics of different Pokémon species while maintaining the generative capabilities of the base model.
### Training Details
The model was trained using the following configuration:
```bash
accelerate launch train_text_to_image.py \
  --pretrained_model_name_or_path="stable-diffusion-v1-5/stable-diffusion-v1-5" \
  --dataset_name="reach-vb/pokemon-blip-captions" \
  --caption_column="text" \
  --image_column="image" \
  --resolution=512 \
  --random_flip \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --max_train_steps=15000 \
  --learning_rate=1e-05 \
  --max_grad_norm=1 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --output_dir="sd-pokemon-model" \
  --report_to tensorboard
```
Key training parameters:
- Learning rate: 1e-5
- Optimizer: AdamW (default)
- LR scheduler: Constant (no decay)
- Batch size: 1 per device with 4 gradient accumulation steps (effective batch size of 4 per device)
- Resolution: 512x512
- Data augmentation: Random horizontal flip
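
As a rough illustration of what these settings imply, the arithmetic below derives the effective batch size and the number of images processed over the run. It is a sketch under two assumptions that the training command does not state: training ran on a single process, and each of the 15,000 steps corresponds to one optimizer update.

```python
# Values copied from the training command above.
train_batch_size = 1
gradient_accumulation_steps = 4
max_train_steps = 15_000

# Assumption: a single training process; scale this up for multi-GPU runs.
num_processes = 1

# Samples contributing to each optimizer update.
effective_batch_size = train_batch_size * gradient_accumulation_steps * num_processes

# Assumption: every training step is one optimizer update.
images_processed = effective_batch_size * max_train_steps

print(f"Effective batch size: {effective_batch_size}")
print(f"Images processed during training: {images_processed:,}")
```

Under these assumptions the effective batch size is 4 and the run processes 60,000 images, counting repeated passes over the dataset.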
## Sample Images
Below are sample images generated with this model:
### Epic Photorealistic Style

*Prompt: A majestic Charizard with iridescent scales soaring through a dramatic sunset sky, casting long shadows over an ancient volcanic landscape. Intricate details of ember particles floating around its powerful wings, reflecting the golden-crimson light. Hyper-realistic texturing, volumetric lighting, cinematic composition with rule of thirds, depth of field focusing on its determined expression, 8K resolution, photorealistic rendering with ray-traced shadows, award-winning digital art, trending on ArtStation.*

*Prompt: A majestic Rayquaza with iridescent scales soaring through a dramatic sunset sky, casting long shadows over an ancient cloud kingdom landscape. Intricate details of cosmic energy particles flowing along its serpentine body, reflecting the golden-crimson light. Hyper-realistic texturing, volumetric lighting, cinematic composition with rule of thirds, depth of field focusing on its ancient and wise expression, 8K resolution, photorealistic rendering with ray-traced shadows, award-winning digital art, trending on ArtStation.*
### Cyberpunk Style

*Prompt: A cybernetic Mewtwo floating in a neon-drenched futuristic cityscape at night. Bioluminescent purple energy coursing through transparent tubes connected to its body. Holographic interfaces surrounding it, reflecting on wet asphalt streets. Cyberpunk aesthetic with glowing technological implants, sharp contrasts between shadows and vibrant neon lights. Blade Runner inspired atmosphere, digital distortion effects, lens flares, and electric particle effects. Ultramodern sci-fi concept art with intricate mechanical details.*
### Gothic Style

*Prompt: A haunting Gengar lurking in a decrepit Victorian mansion. High contrast black and white photography style with dramatic chiaroscuro lighting. Gothic architecture with ornate details fading into deep shadows. Film noir aesthetic with grainy texture and vignette edges. Eerie atmosphere enhanced by fog tendrils and moonlight streaming through broken stained glass windows. Reminiscent of classic horror cinema, with stark silhouettes and ominous negative space. Timeless monochromatic art with haunting emotional depth.*
### Watercolor Art Style

*Prompt: A serene Gardevoir in an enchanted forest glade, surrounded by luminescent butterflies and delicate wildflowers. Soft watercolor style with gentle pastel hues, flowing brushstrokes creating an ethereal atmosphere. Dappled sunlight filtering through the canopy, creating a dreamy bokeh effect. Impressionistic details, emotional color palette with teal and lavender accents, artistic composition inspired by Studio Ghibli, whimsical fantasy illustration.*
### Kawaii/Chibi Style

*Prompt: An adorable chibi-style Eevee and its evolutions having a tea party in a candy-colored meadow. Kawaii anime style with exaggerated expressions and oversized eyes. Pastel rainbow palette with soft shading and cute decorative elements like hearts and stars. Playful composition with rounded shapes and simplified forms. Cheerful atmosphere with cartoon sparkles and emotion symbols. Inspired by children's animation, with clean outlines and flat color blocks. Whimsical and heartwarming illustration style perfect for merchandise.*

*Prompt: A delightful tea party hosted by Bulbasaur, Chikorita, and Rowlet in a blooming flower garden. Cute storybook illustration style with soft rounded shapes. Tiny teacups and miniature pastries served on lily pad tables. Pastel green and pink color scheme with dainty flower patterns. Chibi proportions with oversized heads and stubby limbs. Cheerful expressions with sparkling eyes and happy smiles. Whimsical details like butterfly waiters and ladybug guests. Heartwarming scene rendered in a children's picture book style with gentle outlines and soft textures.*
## Usage
You can use this model with the Diffusers library:
```python
import torch
from diffusers import StableDiffusionPipeline

# Use the GPU with half precision when available; fall back to float32 on CPU,
# where float16 inference is often unsupported or very slow
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

# Load the model
model_path = "path/to/PokemonStable-v1-5"  # Replace with actual path
pipe = StableDiffusionPipeline.from_pretrained(
    model_path,
    torch_dtype=dtype,
    safety_checker=None,  # Set to None to disable the safety checker
)
pipe = pipe.to(device)

# Generate image
prompt = "A cute Pikachu playing in a grassy field, high resolution, detailed"
image = pipe(
    prompt=prompt,
    num_inference_steps=50,
    guidance_scale=7.5,
).images[0]

# Save image
image.save("generated_pokemon.png")
```
### Advanced Usage with Custom Scheduler
You can also swap the default scheduler to trade off generation speed against image quality:
```python
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

model_path = "path/to/PokemonStable-v1-5"  # Replace with actual path

# Load model
pipe = StableDiffusionPipeline.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    safety_checker=None,
)

# Replace scheduler for faster inference
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config,
    algorithm_type="dpmsolver++",
    solver_order=2,
)

# Move to GPU (this example assumes a CUDA device is available)
pipe = pipe.to("cuda")

# Generate with fewer steps
prompt = "A majestic Charizard in battle stance, fire breathing, detailed scales, epic lighting"
image = pipe(
    prompt=prompt,
    num_inference_steps=25,  # Fewer steps needed with DPM-Solver++
    guidance_scale=7.5,
).images[0]
image.save("charizard_dpm_solver.png")
```
## Prompt Engineering Tips
For optimal results, consider the following prompt engineering techniques:
1. **Specify Pokémon Names**: Include specific Pokémon names like "Pikachu", "Bulbasaur", "Charizard", etc.
2. **Add Scene Descriptions**: Describe the environment, such as "in a forest", "in battle", "sleeping", etc.
3. **Include Style Descriptors**: Add style terms like "high resolution", "detailed", "cartoon style", etc.
4. **Emotional Context**: Include emotional states like "happy", "angry", "cute", etc.
5. **Artistic Techniques**: Specify art styles like "watercolor", "oil painting", "digital art", etc.
6. **Lighting and Atmosphere**: Describe lighting conditions like "sunset", "moonlight", "studio lighting", etc.
7. **Composition Guidelines**: Add terms like "rule of thirds", "dynamic pose", "close-up shot", etc.
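
The tips above can be combined programmatically. The following is a minimal sketch only: `build_prompt` and its parameter names are hypothetical helpers introduced for illustration, not part of this model or of Diffusers, and the comma-separated layout is just one reasonable way to order the components.

```python
def build_prompt(pokemon, scene="", style=(), mood="", lighting="", composition=""):
    """Join the non-empty prompt components into one comma-separated string."""
    # Subject phrase: optional mood, the Pokémon name, optional scene description.
    subject = " ".join(word for word in (mood, pokemon, scene) if word)
    # Style terms, lighting, and composition follow as separate descriptors.
    parts = [subject, *style, lighting, composition]
    return ", ".join(part for part in parts if part)


prompt = build_prompt(
    pokemon="Pikachu",
    scene="in a forest",
    style=["watercolor", "detailed"],
    mood="happy",
    lighting="sunset lighting",
    composition="close-up shot",
)
print(prompt)
# happy Pikachu in a forest, watercolor, detailed, sunset lighting, close-up shot
```

The resulting string can be passed as the `prompt` argument in the usage examples above.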
## Limitations
- The model may occasionally generate Pokémon with anatomical inaccuracies
- Text rendering within images may be illegible or distorted
- Complex compositions with multiple Pokémon may not always position them correctly
- The model performs best with English prompts
- As with all Stable Diffusion models, it inherits certain biases and limitations from the base model
- The safety checker may occasionally filter legitimate content; it can be disabled, but do so with caution
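
One common way to reduce artifacts such as distorted anatomy or stray text is a negative prompt, which `StableDiffusionPipeline` accepts through its `negative_prompt` argument. The sketch below assumes a pipeline already loaded as in the Usage section; the `generate` helper and the default negative terms are hypothetical, have not been tuned or validated for this model, and may or may not help in practice.

```python
# Hypothetical default; these terms are illustrative, not tuned for this model.
DEFAULT_NEGATIVE_PROMPT = "deformed, extra limbs, blurry, text, watermark"


def generate(pipe, prompt, negative_prompt=DEFAULT_NEGATIVE_PROMPT,
             steps=50, guidance_scale=7.5):
    """Call an already-loaded pipeline with a negative prompt; return the first image."""
    result = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        num_inference_steps=steps,
        guidance_scale=guidance_scale,
    )
    return result.images[0]
```

For example, `generate(pipe, "A cute Pikachu playing in a grassy field").save("pikachu.png")` would run the helper with the default negative terms.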
## Ethical Considerations
This model is intended for creative and artistic purposes only. Users should:
- Respect the intellectual property rights of The Pokémon Company and Nintendo
- Avoid generating harmful, offensive, or inappropriate content
- Not use generated images for commercial purposes without proper licensing
- Be transparent about AI-generated content when sharing
## License
This model is based on Stable Diffusion v1.5 and follows the [CreativeML OpenRAIL-M](https://huggingface.co/spaces/CompVis/stable-diffusion-license) license of the original model.
## Acknowledgements
Thanks to all artists and creators who have contributed to the Pokémon franchise, and to CompVis, Runway, and Stability AI for developing and releasing Stable Diffusion. Special thanks to the creators of the [reach-vb/pokemon-blip-captions](https://huggingface.co/datasets/reach-vb/pokemon-blip-captions) dataset used for training this model.
## Citation
If you use this model in your research, please cite:
```bibtex
@InProceedings{Rombach_2022_CVPR,
author = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\"orn},
title = {High-Resolution Image Synthesis With Latent Diffusion Models},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2022},
pages = {10684-10695}
}
```