
Stable Diffusion 3 Inpainting Pipeline

This repository provides an implementation of an inpainting pipeline for Stable Diffusion 3.

(Example images: input image, input mask image, and inpainted output.)

Please ensure that your installed version of diffusers is >= 0.29.1.
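
If you want to verify this requirement at runtime, a minimal check looks like the following (an optional sketch; it assumes the packaging helper is available, which it is in most Python environments):

# Optional runtime check of the diffusers version requirement above.
from importlib.metadata import version
from packaging.version import Version

assert Version(version("diffusers")) >= Version("0.29.1"), "Please run: pip install -U diffusers"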

Model

Stable Diffusion 3 Medium is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features greatly improved performance in image quality, typography, complex prompt understanding, and resource-efficiency.

For more technical details, please refer to the research paper (https://arxiv.org/abs/2403.03206).

Please note: this model is released under the Stability Non-Commercial Research Community License. For a Creator License or an Enterprise License visit Stability.ai or contact us for commercial licensing details.

Model Description

  • Developed by: Stability AI
  • Model type: MMDiT text-to-image generative model
  • Model Description: This is a model that can be used to generate images based on text prompts. It is a Multimodal Diffusion Transformer (https://arxiv.org/abs/2403.03206) that uses three fixed, pretrained text encoders (OpenCLIP-ViT/G, CLIP-ViT/L, and T5-xxl).
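
If you want to confirm which text encoders are loaded, a minimal sketch (not part of the original card) is shown below; it assumes the inpainting pipeline exposes the same component names as the upstream StableDiffusion3Pipeline (text_encoder, text_encoder_2, text_encoder_3):

import torch
from pipeline_stable_diffusion_3_inpaint import StableDiffusion3InpaintPipeline

pipe = StableDiffusion3InpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    torch_dtype=torch.float16,
)
print(type(pipe.text_encoder).__name__)    # CLIP-ViT/L encoder (assumed component name)
print(type(pipe.text_encoder_2).__name__)  # OpenCLIP-ViT/G encoder (assumed component name)
print(type(pipe.text_encoder_3).__name__)  # T5-xxl encoder (assumed component name)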

Demo

Make sure you have upgraded to the latest version of diffusers (pip install -U diffusers). Then you can run:

import torch
from torchvision import transforms

from pipeline_stable_diffusion_3_inpaint import StableDiffusion3InpaintPipeline
from diffusers.utils import load_image

def preprocess_image(image):
    # Convert to RGB, center-crop to dimensions divisible by 64, and move to the GPU
    # as a batched float tensor in [0, 1].
    image = image.convert("RGB")
    image = transforms.CenterCrop((image.size[1] // 64 * 64, image.size[0] // 64 * 64))(image)
    image = transforms.ToTensor()(image)
    image = image.unsqueeze(0).to("cuda")
    return image

def preprocess_mask(mask):
    # Convert to a single-channel mask, center-crop to the same divisible-by-64 size,
    # and move to the GPU; white (1) marks the region to inpaint.
    mask = mask.convert("L")
    mask = transforms.CenterCrop((mask.size[1] // 64 * 64, mask.size[0] // 64 * 64))(mask)
    mask = transforms.ToTensor()(mask)
    mask = mask.to("cuda")
    return mask

pipe = StableDiffusion3InpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    torch_dtype=torch.float16,
).to("cuda")

prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
source_image = load_image(
    "./overture-creations-5sI6fQgYIuo.png"
)
source = preprocess_image(source_image)
mask = preprocess_mask(
    load_image(
        "./overture-creations-5sI6fQgYIuo_mask.png"
    )
)

image = pipe(
    prompt=prompt,
    image=source,
    mask_image=mask,
    height=1024,
    width=1024,
    num_inference_steps=50,
    guidance_scale=7.0,
    strength=0.6,  # how strongly the masked region is re-noised before being regenerated
).images[0]

image.save("overture-creations-5sI6fQgYIuo_output.jpg")
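
Following the standard diffusers inpainting convention, strength controls how much noise is added to the masked region before denoising; with strength=0.6 and num_inference_steps=50, roughly 30 denoising steps are actually run. If the pipeline's image processor also accepts PIL inputs, as the upstream diffusers inpainting pipelines do (this is an assumption, not documented in this card; the tensor preprocessing above is the documented path), the call can be simplified:

# A hedged sketch, assuming PIL inputs are accepted by the pipeline's image processor;
# the tensor preprocessing above remains the documented path in this card.
mask_image = load_image("./overture-creations-5sI6fQgYIuo_mask.png")
image = pipe(
    prompt=prompt,
    image=source_image,     # PIL.Image loaded earlier
    mask_image=mask_image,  # PIL.Image, white = region to repaint
    height=1024,
    width=1024,
    num_inference_steps=50,
    guidance_scale=7.0,
    strength=0.6,
).images[0]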