{ "cells": [ { "cell_type": "markdown", "id": "43976805", "metadata": {}, "source": [ "# Inpainting with ControlNet\n", "This notebook contains examples of using a new `StableDiffusionControlNetInpaintPipeline`.\n", "\n", "The two main parameters you can play with are the strength of the text guidance and of the image guidance:\n", "* Text guidance (`guidance_scale`) is set to `7.5` by default, and usually this value works quite well.\n", "* Image guidance (`controlnet_conditioning_scale`) is set to `1.0` by default. This value is a good starting point, but it can be lowered if there is a big mismatch between the text prompt and the control image (meaning that it is very hard to \"imagine\" an output image that both satisfies the text prompt and aligns with the control image).\n", "\n", "The naming of these parameters follows the related pipelines `StableDiffusionInpaintPipeline` and `StableDiffusionControlNetPipeline`, and the same convention is preserved here for consistency." ] }, { "cell_type": "code", "execution_count": null, "id": "33c2f672", "metadata": {}, "outputs": [], "source": [ "from diffusers import StableDiffusionInpaintPipeline, ControlNetModel, UniPCMultistepScheduler\n", "from src.pipeline_stable_diffusion_controlnet_inpaint import *\n", "from diffusers.utils import load_image\n", "\n", "import os\n", "import cv2\n", "from PIL import Image\n", "import numpy as np\n", "import torch\n", "from matplotlib import pyplot as plt\n", "\n", "# all results below are saved into this directory\n", "os.makedirs('output', exist_ok=True)" ] }, { "cell_type": "markdown", "id": "cb869cff", "metadata": {}, "source": [ "### Baseline: Stable Diffusion 1.5 Inpainting\n", "The Stable Diffusion 1.5 inpainting model is used as the core of ControlNet inpainting. For reference, you can also run the same examples on this core model alone:" ] }, { "cell_type": "code", "execution_count": null, "id": "f011126d", "metadata": {}, "outputs": [], "source": [ "pipe_sd = StableDiffusionInpaintPipeline.from_pretrained(\n", " \"runwayml/stable-diffusion-inpainting\",\n", " revision=\"fp16\",\n", " torch_dtype=torch.float16,\n", ")\n", "# speed up the diffusion process with a faster scheduler and memory optimization\n", "pipe_sd.scheduler = UniPCMultistepScheduler.from_config(pipe_sd.scheduler.config)\n", "# remove the following line if xformers is not installed\n", "pipe_sd.enable_xformers_memory_efficient_attention()\n", "\n", "pipe_sd.to('cuda')" ] }, { "cell_type": "markdown", "id": "a4d89ea7", "metadata": {}, "source": [ "### Task\n", "Let's start by turning this dog into a red panda using various types of guidance!\n", "\n", "All we need is an `image`, a `mask` (white marks the region to repaint), and a `text_prompt` of **\"a red panda sitting on a bench\"**" ] }, { "cell_type": "code", "execution_count": null, "id": "517add62", "metadata": {}, "outputs": [], "source": [ "# download an image\n", "image = load_image(\n", " \"https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png\"\n", ")\n", "image = np.array(image)\n", "mask_image = load_image(\n", " \"https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png\"\n", ")\n", "mask_image = np.array(mask_image)\n", "\n", "text_prompt = \"a red panda sitting on a bench\"\n", "\n", "plt.figure(figsize=(12,4))\n", "\n", "plt.subplot(1,2,1)\n", "plt.imshow(image)\n", "plt.axis('off')\n", "plt.title('Input')\n", "plt.subplot(1,2,2)\n", "# uint8-overflow trick: the product is 0 in the masked (white) region and ~the original image elsewhere\n", "plt.imshow((255-np.array(image))*(255-np.array(mask_image)))\n", "plt.axis('off')\n", "plt.title('Masked')" ] }, { "cell_type": "markdown", "id": "489f2543",
"metadata": {}, "source": [ "## Canny Edge" ] }, { "cell_type": "code", "execution_count": null, "id": "906b2654", "metadata": {}, "outputs": [], "source": [ "# get the canny edge image\n", "canny_image = cv2.Canny(image, 100, 200)\n", "canny_image = canny_image[:, :, None]\n", "canny_image = np.concatenate([canny_image, canny_image, canny_image], axis=2)\n", "\n", "image = Image.fromarray(image)\n", "mask_image = Image.fromarray(mask_image)\n", "canny_image = Image.fromarray(canny_image)\n", "\n", "canny_image" ] }, { "cell_type": "code", "execution_count": null, "id": "41d35b98", "metadata": {}, "outputs": [], "source": [ "# load ControlNet and the Stable Diffusion 1.5 inpainting model\n", "controlnet = ControlNetModel.from_pretrained(\"lllyasviel/sd-controlnet-canny\", torch_dtype=torch.float16)\n", "pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(\n", " \"runwayml/stable-diffusion-inpainting\", controlnet=controlnet, torch_dtype=torch.float16\n", ")\n", "\n", "# speed up the diffusion process with a faster scheduler and memory optimization\n", "pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)\n", "# remove the following line if xformers is not installed\n", "pipe.enable_xformers_memory_efficient_attention()" ] }, { "cell_type": "markdown", "id": "d6146702", "metadata": {}, "source": [ "### Scaling image control\n", "In this example, the `canny_image` input is actually quite hard to satisfy with our text prompt, because it contains a lot of local noise. In this special case, we lower `controlnet_conditioning_scale` to `0.5` to make the image guidance more subtle.\n", "\n", "In all other examples, the default value of `controlnet_conditioning_scale` = `1.0` works rather well!" ] }, { "cell_type": "code", "execution_count": null, "id": "e5069621", "metadata": {}, "outputs": [], "source": [ "pipe.to('cuda')\n", "\n", "# generate image\n", "generator = torch.manual_seed(0)\n", "new_image = pipe(\n", " text_prompt,\n", " num_inference_steps=20,\n", " generator=generator,\n", " image=image,\n", " control_image=canny_image,\n", " controlnet_conditioning_scale=0.5,\n", " mask_image=mask_image\n", ").images[0]\n", "\n", "new_image.save('output/canny_result.png')" ] }, { "cell_type": "code", "execution_count": null, "id": "2f9c6ff6", "metadata": {}, "outputs": [], "source": [ "plt.figure(figsize=(12,4))\n", "\n", "plt.subplot(1,4,1)\n", "plt.imshow(image)\n", "plt.axis('off')\n", "plt.title('Input')\n", "plt.subplot(1,4,2)\n", "plt.imshow((255-np.array(image))*(255-np.array(mask_image)))\n", "plt.axis('off')\n", "plt.title('Masked')\n", "plt.subplot(1,4,3)\n", "plt.imshow(canny_image)\n", "plt.axis('off')\n", "plt.title('Condition')\n", "plt.subplot(1,4,4)\n", "plt.imshow(new_image)\n", "plt.title('Output')\n", "plt.axis('off')\n", "\n", "plt.savefig('output/canny_grid.png',\n", " dpi=200,\n", " bbox_inches='tight',\n", " pad_inches=0.0\n", " )" ] }, { "cell_type": "markdown", "id": "87de0502", "metadata": {}, "source": [ "### Comparison: vanilla inpainting with Stable Diffusion 1.5" ] }, { "cell_type": "code", "execution_count": null, "id": "f2ef71fe", "metadata": {}, "outputs": [], "source": [ "# generate image\n", "generator = torch.manual_seed(0)\n", "new_image = pipe_sd(\n", " text_prompt,\n", " num_inference_steps=20,\n", " generator=generator,\n", " image=image,\n", " mask_image=mask_image\n", ").images[0]\n", "\n", "new_image.save('output/baseline_result.png')" ] }, { "cell_type": "code", "execution_count": null, "id": "e09513c8", "metadata": {}, "outputs": [], "source": [
"plt.figure(figsize=(12,4))\n", "\n", "plt.subplot(1,3,1)\n", "plt.imshow(image)\n", "plt.axis('off')\n", "plt.title('Input')\n", "plt.subplot(1,3,2)\n", "plt.imshow((255-np.array(image))*(255-np.array(mask_image)))\n", "plt.axis('off')\n", "plt.title('Masked')\n", "plt.subplot(1,3,3)\n", "plt.imshow(new_image)\n", "plt.title('Output')\n", "plt.axis('off')\n", "\n", "plt.savefig('output/baseline_grid.png',\n", " dpi=200,\n", " bbox_inches='tight',\n", " pad_inches=0.0\n", " )" ] }, { "cell_type": "markdown", "id": "0569600e", "metadata": {}, "source": [ "## Challenging Examples 🐕➡️🍔\n", "Let's see how tuning the `controlnet_conditioning_scale` works out for a more challenging example of turning the dog into a cheeseburger!\n", "\n", "In this case, we **demand a large semantic leap** and that requires a more subtle guide from the control image!" ] }, { "cell_type": "code", "execution_count": null, "id": "69a352a4", "metadata": {}, "outputs": [], "source": [ "difficult_text_prompt=\"a big cheeseburger sitting on a bench\"" ] }, { "cell_type": "code", "execution_count": null, "id": "0803c982", "metadata": {}, "outputs": [], "source": [ "# First - StableDiffusion1.5 baseline (no ControlNet)\n", "\n", "# generate image\n", "generator = torch.manual_seed(0)\n", "new_image = pipe_sd(\n", " difficult_text_prompt,\n", " num_inference_steps=20,\n", " generator=generator,\n", " image=image,\n", " mask_image=mask_image\n", ").images[0]\n", "\n", "sd_output=new_image\n", "sd_output" ] }, { "cell_type": "code", "execution_count": null, "id": "319b867e", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "89dbb557", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "d6d74fdd", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "bdaa2483", "metadata": {}, "source": [ "## HED" ] }, { "cell_type": "code", "execution_count": null, "id": "1c5f1ead", "metadata": {}, "outputs": [], "source": [ "from controlnet_aux import HEDdetector\n", "\n", "hed = HEDdetector.from_pretrained('lllyasviel/ControlNet')\n", "\n", "hed_image = hed(image)" ] }, { "cell_type": "code", "execution_count": null, "id": "192a9881", "metadata": {}, "outputs": [], "source": [ "controlnet = ControlNetModel.from_pretrained(\n", " \"fusing/stable-diffusion-v1-5-controlnet-hed\", torch_dtype=torch.float16\n", ")\n", "pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(\n", " \"runwayml/stable-diffusion-inpainting\", controlnet=controlnet, torch_dtype=torch.float16\n", " )\n", "\n", "pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)\n", "\n", "# Remove if you do not have xformers installed\n", "# see https://huggingface.co/docs/diffusers/v0.13.0/en/optimization/xformers#installing-xformers\n", "# for installation instructions\n", "pipe.enable_xformers_memory_efficient_attention()" ] }, { "cell_type": "code", "execution_count": null, "id": "aa054f4e", "metadata": {}, "outputs": [], "source": [ "pipe.to('cuda')\n", "\n", "# generate image\n", "generator = torch.manual_seed(0)\n", "new_image = pipe(\n", " text_prompt,\n", " num_inference_steps=20,\n", " generator=generator,\n", " image=image,\n", " control_image=hed_image,\n", " mask_image=mask_image\n", ").images[0]\n", "\n", "new_image.save('output/hed_result.png')" ] }, { "cell_type": "code", "execution_count": null, "id": "cc33ddfa", "metadata": {}, "outputs": [], "source": [ "plt.figure(figsize=(12,4))\n", "\n", 
"plt.subplot(1,4,1)\n", "plt.imshow(image)\n", "plt.axis('off')\n", "plt.title('Input')\n", "plt.subplot(1,4,2)\n", "plt.imshow((255-np.array(image))*(255-np.array(mask_image)))\n", "plt.axis('off')\n", "plt.title('Masked')\n", "plt.subplot(1,4,3)\n", "plt.imshow(hed_image)\n", "plt.axis('off')\n", "plt.title('Condition')\n", "plt.subplot(1,4,4)\n", "plt.imshow(new_image)\n", "plt.title('Output')\n", "plt.axis('off')\n", "\n", "plt.savefig('output/hed_grid.png',\n", " dpi=200,\n", " bbox_inches='tight',\n", " pad_inches=0.0\n", " )" ] }, { "cell_type": "markdown", "id": "1be22a64", "metadata": {}, "source": [ "### Scribble" ] }, { "cell_type": "code", "execution_count": null, "id": "e4b376bb", "metadata": {}, "outputs": [], "source": [ "from controlnet_aux import HEDdetector\n", "\n", "hed = HEDdetector.from_pretrained('lllyasviel/ControlNet')\n", "\n", "scribble_image = hed(image,scribble=True)" ] }, { "cell_type": "code", "execution_count": null, "id": "b0c63b8f", "metadata": {}, "outputs": [], "source": [ "controlnet = ControlNetModel.from_pretrained(\n", " \"fusing/stable-diffusion-v1-5-controlnet-scribble\", torch_dtype=torch.float16\n", ")\n", "pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(\n", " \"runwayml/stable-diffusion-inpainting\", controlnet=controlnet, torch_dtype=torch.float16\n", " )\n", "\n", "pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)\n", "\n", "# Remove if you do not have xformers installed\n", "# see https://huggingface.co/docs/diffusers/v0.13.0/en/optimization/xformers#installing-xformers\n", "# for installation instructions\n", "pipe.enable_xformers_memory_efficient_attention()\n", "\n", "#pipe.enable_model_cpu_offload()" ] }, { "cell_type": "code", "execution_count": null, "id": "f30189e0", "metadata": {}, "outputs": [], "source": [ "pipe.to('cuda')\n", "\n", "# generate image\n", "generator = torch.manual_seed(0)\n", "new_image = pipe(\n", " text_prompt,\n", " num_inference_steps=20,\n", " generator=generator,\n", " image=image,\n", " control_image=scribble_image,\n", " mask_image=mask_image\n", ").images[0]\n", "\n", "new_image.save('output/scribble_result.png')" ] }, { "cell_type": "code", "execution_count": null, "id": "8de59fe6", "metadata": {}, "outputs": [], "source": [ "plt.figure(figsize=(12,4))\n", "\n", "plt.subplot(1,4,1)\n", "plt.imshow(image)\n", "plt.axis('off')\n", "plt.title('Input')\n", "plt.subplot(1,4,2)\n", "plt.imshow((255-np.array(image))*(255-np.array(mask_image)))\n", "plt.axis('off')\n", "plt.title('Masked')\n", "plt.subplot(1,4,3)\n", "plt.imshow(scribble_image)\n", "plt.axis('off')\n", "plt.title('Condition')\n", "plt.subplot(1,4,4)\n", "plt.imshow(new_image)\n", "plt.title('Output')\n", "plt.axis('off')\n", "\n", "plt.savefig('output/scribble_grid.png',\n", " dpi=200,\n", " bbox_inches='tight',\n", " pad_inches=0.0\n", " )" ] }, { "cell_type": "markdown", "id": "e30c6ce2", "metadata": {}, "source": [ "### Depth" ] }, { "cell_type": "code", "execution_count": null, "id": "f681c4d6", "metadata": {}, "outputs": [], "source": [ "from transformers import pipeline\n", "\n", "depth_estimator = pipeline('depth-estimation')\n", "\n", "depth_image = depth_estimator(image)['depth']\n", "depth_image = np.array(depth_image)\n", "depth_image = depth_image[:, :, None]\n", "depth_image = np.concatenate(3*[depth_image], axis=2)\n", "depth_image = Image.fromarray(depth_image)" ] }, { "cell_type": "code", "execution_count": null, "id": "3f8fdcf5", "metadata": {}, "outputs": [], "source": [ "controlnet = 
ControlNetModel.from_pretrained(\n", " \"fusing/stable-diffusion-v1-5-controlnet-depth\", torch_dtype=torch.float16\n", ")\n", "pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(\n", " \"runwayml/stable-diffusion-inpainting\", controlnet=controlnet, torch_dtype=torch.float16\n", ")\n", "\n", "pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)\n", "\n", "# Remove if you do not have xformers installed\n", "# see https://huggingface.co/docs/diffusers/v0.13.0/en/optimization/xformers#installing-xformers\n", "# for installation instructions\n", "pipe.enable_xformers_memory_efficient_attention()" ] }, { "cell_type": "code", "execution_count": null, "id": "58ab718d", "metadata": {}, "outputs": [], "source": [ "pipe.to('cuda')\n", "\n", "# generate image\n", "generator = torch.manual_seed(0)\n", "new_image = pipe(\n", " text_prompt,\n", " num_inference_steps=20,\n", " generator=generator,\n", " image=image,\n", " control_image=depth_image,\n", " mask_image=mask_image\n", ").images[0]\n", "\n", "new_image.save('output/depth_result.png')" ] }, { "cell_type": "code", "execution_count": null, "id": "82ac435e", "metadata": {}, "outputs": [], "source": [ "plt.figure(figsize=(12,4))\n", "\n", "plt.subplot(1,4,1)\n", "plt.imshow(image)\n", "plt.axis('off')\n", "plt.title('Input')\n", "plt.subplot(1,4,2)\n", "plt.imshow((255-np.array(image))*(255-np.array(mask_image)))\n", "plt.axis('off')\n", "plt.title('Masked')\n", "plt.subplot(1,4,3)\n", "plt.imshow(depth_image)\n", "plt.axis('off')\n", "plt.title('Condition')\n", "plt.subplot(1,4,4)\n", "plt.imshow(new_image)\n", "plt.title('Output')\n", "plt.axis('off')\n", "\n", "plt.savefig('output/depth_grid.png',\n", " dpi=200,\n", " bbox_inches='tight',\n", " pad_inches=0.0\n", " )" ] }, { "cell_type": "markdown", "id": "93db13cb", "metadata": {}, "source": [ "## Normal Map" ] }, { "cell_type": "code", "execution_count": null, "id": "08ffd6da", "metadata": {}, "outputs": [], "source": [ "depth_estimator = pipeline(\"depth-estimation\", model=\"Intel/dpt-hybrid-midas\")\n", "\n", "normal_image = depth_estimator(image)['predicted_depth'][0]\n", "\n", "normal_image = normal_image.numpy()\n", "\n", "image_depth = normal_image.copy()\n", "image_depth -= np.min(image_depth)\n", "image_depth /= np.max(image_depth)\n", "\n", "bg_threshold = 0.4\n", "\n", "# depth gradients act as the x and y components of the surface normals;\n", "# pixels below the background threshold are zeroed out\n", "x = cv2.Sobel(normal_image, cv2.CV_32F, 1, 0, ksize=3)\n", "x[image_depth < bg_threshold] = 0\n", "\n", "y = cv2.Sobel(normal_image, cv2.CV_32F, 0, 1, ksize=3)\n", "y[image_depth < bg_threshold] = 0\n", "\n", "z = np.ones_like(x) * np.pi * 2.0\n", "\n", "# normalize to unit vectors and map to RGB\n", "normal_image = np.stack([x, y, z], axis=2)\n", "normal_image /= np.sum(normal_image ** 2.0, axis=2, keepdims=True) ** 0.5\n", "normal_image = (normal_image * 127.5 + 127.5).clip(0, 255).astype(np.uint8)\n", "normal_image = Image.fromarray(normal_image).resize((512,512))" ] }, { "cell_type": "code", "execution_count": null, "id": "c41bd52b", "metadata": {}, "outputs": [], "source": [ "controlnet = ControlNetModel.from_pretrained(\n", " \"fusing/stable-diffusion-v1-5-controlnet-normal\", torch_dtype=torch.float16\n", ")\n", "pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(\n", " \"runwayml/stable-diffusion-inpainting\", controlnet=controlnet, torch_dtype=torch.float16\n", ")\n", "\n", "pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)\n", "\n", "# Remove if you do not have xformers installed\n", "# see 
https://huggingface.co/docs/diffusers/v0.13.0/en/optimization/xformers#installing-xformers\n", "# for installation instructions\n", "pipe.enable_xformers_memory_efficient_attention()" ] }, { "cell_type": "code", "execution_count": null, "id": "c8b5a39e", "metadata": {}, "outputs": [], "source": [ "pipe.to('cuda')\n", "\n", "# generate image\n", "generator = torch.manual_seed(0)\n", "new_image = pipe(\n", " text_prompt,\n", " num_inference_steps=20,\n", " generator=generator,\n", " image=image,\n", " control_image=normal_image,\n", " mask_image=mask_image\n", ").images[0]\n", "\n", "new_image.save('output/normal_result.png')" ] }, { "cell_type": "code", "execution_count": null, "id": "2737d23f", "metadata": {}, "outputs": [], "source": [ "plt.figure(figsize=(12,4))\n", "\n", "plt.subplot(1,4,1)\n", "plt.imshow(image)\n", "plt.axis('off')\n", "plt.title('Input')\n", "plt.subplot(1,4,2)\n", "plt.imshow((255-np.array(image))*(255-np.array(mask_image)))\n", "plt.axis('off')\n", "plt.title('Masked')\n", "plt.subplot(1,4,3)\n", "plt.imshow(normal_image)\n", "plt.axis('off')\n", "plt.title('Condition')\n", "plt.subplot(1,4,4)\n", "plt.imshow(new_image)\n", "plt.title('Output')\n", "plt.axis('off')\n", "\n", "plt.savefig('output/normal_grid.png',\n", " dpi=200,\n", " bbox_inches='tight',\n", " pad_inches=0.0\n", " )" ] }, { "cell_type": "markdown", "id": "04683be6", "metadata": {}, "source": [ "## More control input types\n", "For the remaining control input types, we will use different images, since the photo of the dog on the bench is not an appropriate input for them!\n", "\n", "Let's start with a room photo..." ] }, { "cell_type": "markdown", "id": "2d5c7d55", "metadata": {}, "source": [ "## M-LSD" ] }, { "cell_type": "code", "execution_count": null, "id": "9d2e3a7b", "metadata": {}, "outputs": [], "source": [ "from controlnet_aux import MLSDdetector\n", "\n", "mlsd = MLSDdetector.from_pretrained('lllyasviel/ControlNet')\n", "\n", "room_image = load_image(\"https://huggingface.co/lllyasviel/sd-controlnet-mlsd/resolve/main/images/room.png\")\n", "\n", "mlsd_image = mlsd(room_image).resize(room_image.size)" ] }, { "cell_type": "code", "execution_count": null, "id": "45629903", "metadata": {}, "outputs": [], "source": [ "# rectangular mask over the region to repaint\n", "room_mask = np.zeros_like(np.array(room_image))\n", "room_mask[120:420,220:,:] = 255\n", "room_mask = Image.fromarray(room_mask)\n", "\n", "room_mask = room_mask.resize((512,512))\n", "mlsd_image = mlsd_image.resize((512,512))\n", "room_image = room_image.resize((512,512))" ] }, { "cell_type": "code", "execution_count": null, "id": "e491ab22", "metadata": {}, "outputs": [], "source": [ "controlnet = ControlNetModel.from_pretrained(\n", " \"fusing/stable-diffusion-v1-5-controlnet-mlsd\", torch_dtype=torch.float16\n", ")\n", "pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(\n", " \"runwayml/stable-diffusion-inpainting\", controlnet=controlnet, torch_dtype=torch.float16\n", ")\n", "\n", "pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)\n", "\n", "# Remove if you do not have xformers installed\n", "# see https://huggingface.co/docs/diffusers/v0.13.0/en/optimization/xformers#installing-xformers\n", "# for installation instructions\n", "pipe.enable_xformers_memory_efficient_attention()" ] }, { "cell_type": "code", "execution_count": null, "id": "b414f354", "metadata": {}, "outputs": [], "source": [ "pipe.to('cuda')\n", "\n", "# generate image\n", "generator = torch.manual_seed(0)\n", "new_image = pipe(\n", " \"an image of a room 
with a city skyline view\",\n", " num_inference_steps=20,\n", " generator=generator,\n", " image=room_image,\n", " control_image=mlsd_image,\n", " mask_image=room_mask\n", ").images[0]\n", "\n", "new_image.save('output/mlsd_result.png')" ] }, { "cell_type": "code", "execution_count": null, "id": "326145e1", "metadata": {}, "outputs": [], "source": [ "plt.figure(figsize=(12,4))\n", "\n", "plt.subplot(1,4,1)\n", "plt.imshow(room_image)\n", "plt.axis('off')\n", "plt.title('Input')\n", "plt.subplot(1,4,2)\n", "plt.imshow((255-np.array(room_image))*(255-np.array(room_mask)))\n", "plt.axis('off')\n", "plt.title('Masked')\n", "plt.subplot(1,4,3)\n", "plt.imshow(mlsd_image)\n", "plt.axis('off')\n", "plt.title('Condition')\n", "plt.subplot(1,4,4)\n", "plt.imshow(new_image)\n", "plt.title('Output')\n", "plt.axis('off')\n", "\n", "plt.savefig('output/mlsd_grid.png',\n", " dpi=200,\n", " bbox_inches='tight',\n", " pad_inches=0.0\n", " )" ] }, { "cell_type": "markdown", "id": "1f68f30b", "metadata": {}, "source": [ "## OpenPose" ] }, { "cell_type": "code", "execution_count": null, "id": "bbf9b00b", "metadata": {}, "outputs": [], "source": [ "controlnet = ControlNetModel.from_pretrained(\n", " \"fusing/stable-diffusion-v1-5-controlnet-openpose\", torch_dtype=torch.float16\n", ")\n", "pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(\n", " \"runwayml/stable-diffusion-inpainting\", controlnet=controlnet, torch_dtype=torch.float16\n", ")\n", "\n", "pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)\n", "\n", "# Remove if you do not have xformers installed\n", "# see https://huggingface.co/docs/diffusers/v0.13.0/en/optimization/xformers#installing-xformers\n", "# for installation instructions\n", "pipe.enable_xformers_memory_efficient_attention()" ] }, { "cell_type": "code", "execution_count": null, "id": "e819d17c", "metadata": {}, "outputs": [], "source": [ "from controlnet_aux import OpenposeDetector\n", "\n", "openpose = OpenposeDetector.from_pretrained('lllyasviel/ControlNet')\n", "\n", "pose_real_image = load_image(\"https://huggingface.co/lllyasviel/sd-controlnet-openpose/resolve/main/images/pose.png\")\n", "\n", "pose_image = openpose(pose_real_image)\n", "pose_real_image = pose_real_image.resize(pose_image.size)\n", "\n", "# rectangular mask over the lower part of the image\n", "pose_mask = np.zeros_like(np.array(pose_image))\n", "pose_mask[250:700,:,:] = 255\n", "pose_mask = Image.fromarray(pose_mask)" ] }, { "cell_type": "code", "execution_count": null, "id": "2b6faf93", "metadata": {}, "outputs": [], "source": [ "pipe.to('cuda')\n", "\n", "# generate image\n", "generator = torch.manual_seed(0)\n", "new_image = pipe(\n", " \"a man in a knight armor\",\n", " num_inference_steps=20,\n", " generator=generator,\n", " image=pose_real_image,\n", " control_image=pose_image,\n", " mask_image=pose_mask\n", ").images[0]\n", "\n", "new_image.save('output/openpose_result.png')" ] }, { "cell_type": "code", "execution_count": null, "id": "a665a931", "metadata": {}, "outputs": [], "source": [ "plt.figure(figsize=(12,4))\n", "\n", "plt.subplot(1,4,1)\n", "plt.imshow(pose_real_image)\n", "plt.axis('off')\n", "plt.title('Input')\n", "plt.subplot(1,4,2)\n", "plt.imshow((255-np.array(pose_real_image))*(255-np.array(pose_mask)))\n", "plt.axis('off')\n", "plt.title('Masked')\n", "plt.subplot(1,4,3)\n", "plt.imshow(pose_image)\n", "plt.axis('off')\n", "plt.title('Condition')\n", "plt.subplot(1,4,4)\n", "plt.imshow(new_image)\n", "plt.title('Output')\n", "plt.axis('off')\n", "\n", "plt.savefig('output/openpose_grid.png',\n", " 
dpi=200,\n", " bbox_inches='tight',\n", " pad_inches=0.0\n", " )" ] }, { "cell_type": "markdown", "id": "b982380d", "metadata": {}, "source": [ "## Segmentation Mask" ] }, { "cell_type": "code", "execution_count": null, "id": "f667b04a", "metadata": {}, "outputs": [], "source": [ "controlnet = ControlNetModel.from_pretrained(\n", " \"fusing/stable-diffusion-v1-5-controlnet-seg\", torch_dtype=torch.float16\n", ")\n", "pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(\n", " \"runwayml/stable-diffusion-inpainting\", controlnet=controlnet, torch_dtype=torch.float16\n", ")\n", "\n", "pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)\n", "\n", "# Remove if you do not have xformers installed\n", "# see https://huggingface.co/docs/diffusers/v0.13.0/en/optimization/xformers#installing-xformers\n", "# for installation instructions\n", "pipe.enable_xformers_memory_efficient_attention()" ] }, { "cell_type": "code", "execution_count": null, "id": "cb27c72b", "metadata": {}, "outputs": [], "source": [ "house_real_image = load_image(\"https://huggingface.co/lllyasviel/sd-controlnet-seg/resolve/main/images/house.png\")\n", "seg_image = load_image(\"https://huggingface.co/lllyasviel/sd-controlnet-seg/resolve/main/images/house_seg.png\")\n", "\n", "# PIL .size is (width, height), while numpy arrays are indexed (height, width, channels)\n", "house_mask = np.zeros((seg_image.size[1], seg_image.size[0], 3), dtype='uint8')\n", "house_mask[50:400,-350:,:] = 255\n", "house_mask = Image.fromarray(house_mask)" ] }, { "cell_type": "code", "execution_count": null, "id": "1f81e50b", "metadata": {}, "outputs": [], "source": [ "pipe.to('cuda')\n", "\n", "# generate image\n", "generator = torch.manual_seed(0)\n", "new_image = pipe(\n", " \"a pink eerie scary house\",\n", " num_inference_steps=20,\n", " generator=generator,\n", " image=house_real_image,\n", " control_image=seg_image,\n", " mask_image=house_mask\n", ").images[0]\n", "\n", "new_image.save('output/seg_result.png')" ] }, { "cell_type": "code", "execution_count": null, "id": "37c0d695", "metadata": {}, "outputs": [], "source": [ "plt.figure(figsize=(12,4))\n", "\n", "plt.subplot(1,4,1)\n", "plt.imshow(house_real_image)\n", "plt.axis('off')\n", "plt.title('Input')\n", "plt.subplot(1,4,2)\n", "plt.imshow((255-np.array(house_real_image))*(255-np.array(house_mask)))\n", "plt.axis('off')\n", "plt.title('Masked')\n", "plt.subplot(1,4,3)\n", "plt.imshow(seg_image)\n", "plt.axis('off')\n", "plt.title('Condition')\n", "plt.subplot(1,4,4)\n", "plt.imshow(new_image)\n", "plt.title('Output')\n", "plt.axis('off')\n", "\n", "plt.savefig('output/seg_grid.png',\n", " dpi=200,\n", " bbox_inches='tight',\n", " pad_inches=0.0\n", " )" ] } ], "metadata": { "language_info": { "name": "python", "pygments_lexer": "ipython3" } }, "nbformat": 4, "nbformat_minor": 5 }