---
language:
- en
license: creativeml-openrail-m
tags:
- stable-diffusion
- text-to-image
- diffusers
- pokemon
- image-generation
- art
library_name: diffusers
pipeline_tag: text-to-image
base_model: stable-diffusion-v1-5/stable-diffusion-v1-5
datasets:
- reach-vb/pokemon-blip-captions
---

# Pokemon Stable Diffusion v1.5

![Pokemon Samples Grid](sample_images/merged_pokemon_samples_compressed.jpg)

A fine-tuned version of Stable Diffusion v1.5 specifically trained to generate high-quality Pokémon images in various artistic styles.

## Model Details

- **Base Model**: [Stable Diffusion v1.5](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5)
- **Developed by**: RekklesAI
- **Model Type**: Latent Diffusion Model for Text-to-Image generation
- **Language(s)**: English
- **License**: [CreativeML OpenRAIL-M](https://huggingface.co/spaces/CompVis/stable-diffusion-license)
- **Training Data**: [reach-vb/pokemon-blip-captions](https://huggingface.co/datasets/reach-vb/pokemon-blip-captions)
- **Training Steps**: 15,000 steps at resolution 512x512
- **Model Architecture**: Same as Stable Diffusion v1.5 (UNet with cross-attention layers)
- **Diffusers Version**: 0.33.0.dev0
- **Scheduler**: PNDMScheduler
- **Safety Checker**: StableDiffusionSafetyChecker (can be disabled during inference)
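
For orientation, the 512x512 training resolution maps to a much smaller latent grid, because a latent diffusion model denoises in the VAE's latent space rather than in pixel space. Here is a minimal sketch of that arithmetic, assuming the standard Stable Diffusion v1.x values of an 8x VAE downsampling factor and 4 latent channels. Neither value is stated in this card, and `latent_shape` is a hypothetical helper, not part of Diffusers.

```python
def latent_shape(height, width, vae_scale_factor=8, latent_channels=4):
    """Return the (channels, height, width) of the latent tensor for one image.

    The defaults are assumptions based on the standard Stable Diffusion v1.x
    configuration; they are not read from this model's files.
    """
    if height % vae_scale_factor or width % vae_scale_factor:
        raise ValueError("height and width must be multiples of the VAE scale factor")
    return (latent_channels, height // vae_scale_factor, width // vae_scale_factor)


# The 512x512 training resolution corresponds to a 4x64x64 latent.
print(latent_shape(512, 512))  # (4, 64, 64)
```

If you request a different output size at inference time, keeping it a multiple of 8 avoids shape errors under these assumptions.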

## Model Description

This model is a fine-tuned version of Stable Diffusion v1.5, trained to generate high-quality Pokémon images in a range of artistic styles, including photorealistic renders, cartoon, cyberpunk, and watercolor.

The model was fine-tuned for 15,000 steps on the [reach-vb/pokemon-blip-captions](https://huggingface.co/datasets/reach-vb/pokemon-blip-captions) dataset, allowing it to learn the distinctive features and characteristics of different Pokémon species while maintaining the generative capabilities of the base model.

### Training Details

The model was trained using the following configuration:
```bash
accelerate launch train_text_to_image.py \
  --pretrained_model_name_or_path="stable-diffusion-v1-5/stable-diffusion-v1-5" \
  --dataset_name="reach-vb/pokemon-blip-captions" \
  --caption_column="text" \
  --image_column="image" \
  --resolution=512 \
  --random_flip \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --max_train_steps=15000 \
  --learning_rate=1e-05 \
  --max_grad_norm=1 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --output_dir="sd-pokemon-model" \
  --report_to tensorboard
```

Key training parameters:
- Learning rate: 1e-5
- Optimizer: AdamW (default)
- LR scheduler: Constant (no decay)
- Batch size: 1 per device with 4 gradient accumulation steps (effective batch size of 4 on a single process)
- Resolution: 512x512
- Data augmentation: Random horizontal flip
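
The parameters above imply how many training samples the run consumed. Here is a minimal sketch of that arithmetic, assuming a single training process and that `--max_train_steps` counts optimizer updates, as it does in the Diffusers `train_text_to_image.py` example script. The helper names are hypothetical. The card does not state the dataset size, so it is left as an input.

```python
def effective_batch_size(train_batch_size, gradient_accumulation_steps, num_processes=1):
    """Samples contributing to one optimizer update.

    num_processes=1 is an assumption; the card does not say how many
    devices were used.
    """
    return train_batch_size * gradient_accumulation_steps * num_processes


def samples_seen(max_train_steps, batch):
    """Total training samples consumed over the whole run."""
    return max_train_steps * batch


def approx_epochs(total_samples, dataset_size):
    """Approximate passes over the dataset; dataset_size must be supplied by the reader."""
    return total_samples / dataset_size


batch = effective_batch_size(train_batch_size=1, gradient_accumulation_steps=4)
total = samples_seen(max_train_steps=15_000, batch=batch)
print(batch)  # 4
print(total)  # 60000
```

Dividing `total` by the number of images in the dataset, via `approx_epochs`, gives a rough epoch count for the run.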

## Sample Images

Below are sample images generated with this model:

### Epic Photorealistic Style

![Epic Charizard](sample_images/epic_charizard.png)
*Prompt: A majestic Charizard with iridescent scales soaring through a dramatic sunset sky, casting long shadows over an ancient volcanic landscape. Intricate details of ember particles floating around its powerful wings, reflecting the golden-crimson light. Hyper-realistic texturing, volumetric lighting, cinematic composition with rule of thirds, depth of field focusing on its determined expression, 8K resolution, photorealistic rendering with ray-traced shadows, award-winning digital art, trending on ArtStation.*

![Epic Rayquaza](sample_images/epic_rayquaza.png)
*Prompt: A majestic Rayquaza with iridescent scales soaring through a dramatic sunset sky, casting long shadows over an ancient cloud kingdom landscape. Intricate details of cosmic energy particles flowing along its serpentine body, reflecting the golden-crimson light. Hyper-realistic texturing, volumetric lighting, cinematic composition with rule of thirds, depth of field focusing on its ancient and wise expression, 8K resolution, photorealistic rendering with ray-traced shadows, award-winning digital art, trending on ArtStation.*

### Cyberpunk Style

![Cyberpunk Mewtwo](sample_images/cyberpunk_mewtwo.png)
*Prompt: A cybernetic Mewtwo floating in a neon-drenched futuristic cityscape at night. Bioluminescent purple energy coursing through transparent tubes connected to its body. Holographic interfaces surrounding it, reflecting on wet asphalt streets. Cyberpunk aesthetic with glowing technological implants, sharp contrasts between shadows and vibrant neon lights. Blade Runner inspired atmosphere, digital distortion effects, lens flares, and electric particle effects. Ultramodern sci-fi concept art with intricate mechanical details.*

### Gothic Style

![Gothic Gengar](sample_images/gothic_gengar.png)
*Prompt: A haunting Gengar lurking in a decrepit Victorian mansion. High contrast black and white photography style with dramatic chiaroscuro lighting. Gothic architecture with ornate details fading into deep shadows. Film noir aesthetic with grainy texture and vignette edges. Eerie atmosphere enhanced by fog tendrils and moonlight streaming through broken stained glass windows. Reminiscent of classic horror cinema, with stark silhouettes and ominous negative space. Timeless monochromatic art with haunting emotional depth.*

### Watercolor Art Style

![Watercolor Gardevoir](sample_images/watercolor_gardevoir.png)
*Prompt: A serene Gardevoir in an enchanted forest glade, surrounded by luminescent butterflies and delicate wildflowers. Soft watercolor style with gentle pastel hues, flowing brushstrokes creating an ethereal atmosphere. Dappled sunlight filtering through the canopy, creating a dreamy bokeh effect. Impressionistic details, emotional color palette with teal and lavender accents, artistic composition inspired by Studio Ghibli, whimsical fantasy illustration.*

### Kawaii/Chibi Style

![Kawaii Eevee](sample_images/kawaii_eevee.png)
*Prompt: An adorable chibi-style Eevee and its evolutions having a tea party in a candy-colored meadow. Kawaii anime style with exaggerated expressions and oversized eyes. Pastel rainbow palette with soft shading and cute decorative elements like hearts and stars. Playful composition with rounded shapes and simplified forms. Cheerful atmosphere with cartoon sparkles and emotion symbols. Inspired by children's animation, with clean outlines and flat color blocks. Whimsical and heartwarming illustration style perfect for merchandise.*

![Grass Pokemon Tea](sample_images/grass_pokemon_tea.png)
*Prompt: A delightful tea party hosted by Bulbasaur, Chikorita, and Rowlet in a blooming flower garden. Cute storybook illustration style with soft rounded shapes. Tiny teacups and miniature pastries served on lily pad tables. Pastel green and pink color scheme with dainty flower patterns. Chibi proportions with oversized heads and stubby limbs. Cheerful expressions with sparkling eyes and happy smiles. Whimsical details like butterfly waiters and ladybug guests. Heartwarming scene rendered in a children's picture book style with gentle outlines and soft textures.*

## Usage

You can use this model with the Diffusers library:

```python
import torch
from diffusers import StableDiffusionPipeline

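# Load the model
model_path = "path/to/PokemonStable-v1-5"  # Replace with actual path

# Use half precision only on GPU; float16 inference is poorly supported on CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

pipe = StableDiffusionPipeline.from_pretrained(
    model_path,
    torch_dtype=dtype,
    safety_checker=None  # Set to None to disable safety checker
)

# Move the pipeline to the selected device
pipe = pipe.to(device)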

# Generate image
prompt = "A cute Pikachu playing in a grassy field, high resolution, detailed"
image = pipe(
    prompt=prompt,
    num_inference_steps=50,
    guidance_scale=7.5
).images[0]

# Save image
image.save("generated_pokemon.png")
```

### Advanced Usage with Custom Scheduler

You can also swap the default scheduler for a faster one, trading a small amount of quality for fewer inference steps:

```python
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
import torch

model_path = "path/to/PokemonStable-v1-5"  # Replace with actual path

# Load model
pipe = StableDiffusionPipeline.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    safety_checker=None
)

# Replace scheduler for faster inference
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config,
    algorithm_type="dpmsolver++",
    solver_order=2
)

# Move to GPU (this example assumes a CUDA device is available)
pipe = pipe.to("cuda")

# Generate with fewer steps
prompt = "A majestic Charizard in battle stance, fire breathing, detailed scales, epic lighting"
image = pipe(
    prompt=prompt,
    num_inference_steps=25,  # Fewer steps needed with DPM-Solver++
    guidance_scale=7.5
).images[0]

image.save("charizard_dpm_solver.png")
```

## Prompt Engineering Tips

For optimal results, consider the following prompt engineering techniques:

1. **Specify Pokémon Names**: Include specific Pokémon names like "Pikachu", "Bulbasaur", "Charizard", etc.
2. **Add Scene Descriptions**: Describe the environment, such as "in a forest", "in battle", "sleeping", etc.
3. **Include Style Descriptors**: Add style terms like "high resolution", "detailed", "cartoon style", etc.
4. **Emotional Context**: Include emotional states like "happy", "angry", "cute", etc.
5. **Artistic Techniques**: Specify art styles like "watercolor", "oil painting", "digital art", etc.
6. **Lighting and Atmosphere**: Describe lighting conditions like "sunset", "moonlight", "studio lighting", etc.
7. **Composition Guidelines**: Add terms like "rule of thirds", "dynamic pose", "close-up shot", etc.
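
The tips above can be combined mechanically. Here is a minimal sketch of one way to assemble a prompt from those components, using only the Python standard library. `PromptSpec` and `build_prompt` are hypothetical helpers written for illustration; they are not part of this model or of Diffusers, and the comma-separated layout is just one reasonable convention.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class PromptSpec:
    """Hypothetical container for the prompt components listed above."""
    pokemon: str                         # tip 1: Pokémon name (required)
    scene: str = ""                      # tip 2: e.g. "in a forest"
    emotion: str = ""                    # tip 4: e.g. "happy"
    art_style: str = ""                  # tip 5: e.g. "watercolor"
    lighting: str = ""                   # tip 6: e.g. "sunset"
    composition: str = ""                # tip 7: e.g. "close-up shot"
    style_terms: Tuple[str, ...] = ()    # tip 3: e.g. ("detailed",)


def build_prompt(spec: PromptSpec) -> str:
    """Join the non-empty components into one comma-separated prompt."""
    subject = " ".join(part for part in (spec.emotion, spec.pokemon) if part)
    article = "An" if subject[0].lower() in "aeiou" else "A"
    subject = f"{article} {subject}"
    if spec.scene:
        subject = f"{subject} {spec.scene}"
    parts = [subject, spec.art_style, spec.lighting, spec.composition, *spec.style_terms]
    return ", ".join(part for part in parts if part)


spec = PromptSpec(
    pokemon="Pikachu",
    scene="in a forest",
    emotion="happy",
    art_style="watercolor",
    lighting="sunset",
    composition="close-up shot",
    style_terms=("detailed",),
)
print(build_prompt(spec))
# A happy Pikachu in a forest, watercolor, sunset, close-up shot, detailed
```

The resulting string can be passed as the `prompt` argument in the usage examples.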

## Limitations

- The model may occasionally generate Pokémon with anatomical inaccuracies
- Text rendering within images may be illegible or distorted
- Complex compositions with multiple Pokémon may not always position them correctly
- The model performs best with English prompts
- As with all Stable Diffusion models, it inherits certain biases and limitations from the base model
- The safety checker may occasionally filter legitimate content; it can be disabled with `safety_checker=None`, but outputs are then unfiltered, so review them before sharing

## Ethical Considerations

This model is intended for creative and artistic purposes only. Users should:

- Respect the intellectual property rights of The Pokémon Company and Nintendo
- Avoid generating harmful, offensive, or inappropriate content
- Not use generated images for commercial purposes without proper licensing
- Be transparent about AI-generated content when sharing

## License

This model is based on Stable Diffusion v1.5 and follows the [CreativeML OpenRAIL-M](https://huggingface.co/spaces/CompVis/stable-diffusion-license) license of the original model.

## Acknowledgements

Thanks to all artists and creators who have contributed to the Pokémon franchise, and to the CompVis, Runway, and Stability AI teams behind Stable Diffusion. Special thanks to the creators of the [reach-vb/pokemon-blip-captions](https://huggingface.co/datasets/reach-vb/pokemon-blip-captions) dataset used for training this model.

## Citation

If you use this model in your research, please cite the latent diffusion paper that Stable Diffusion is based on:

```bibtex
@InProceedings{Rombach_2022_CVPR,
    author    = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\"orn},
    title     = {High-Resolution Image Synthesis With Latent Diffusion Models},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {10684-10695}
}
```