Improve model card: Add pipeline tag, library name, and usage example (#1)
Co-authored-by: Niels Rogge <[email protected]>
README.md
---
datasets:
- michaelyuanqwq/roboseg
license: mit
pipeline_tag: image-to-image
library_name: diffusers
---

<h1> RoboEngine: Plug-and-Play Robot Data Augmentation with Semantic Robot Segmentation and Background Generation </h1>

The BG-Diffusion checkpoints of "RoboEngine: Plug-and-Play Robot Data Augmentation with Semantic Robot Segmentation and Background Generation".

Please check out https://github.com/michaelyuanqwq/roboengine for more details.

## Usage

This model is a ControlNet model compatible with the `diffusers` library, designed for background generation in robot scenes. It works by taking a conditioning image (e.g., the robot on a black background, produced with a segmentation mask) and generating a new background from a text prompt.

```python
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline
from PIL import Image, ImageDraw
import torch

# Load the ControlNet model
controlnet = ControlNetModel.from_pretrained(
    "michaelyuanqwq/roboengine-bg-diffusion", torch_dtype=torch.float16
)

# Load a base Stable Diffusion XL pipeline (this ControlNet is designed for SDXL)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnet, torch_dtype=torch.float16
)
pipe.to("cuda")

# Prepare your input image and segmentation mask.
# In a real application, these would come from your dataset or a segmentation model;
# the 'michaelyuanqwq/roboseg' dataset provides examples.
# For demonstration, create a dummy input image and a dummy robot mask
# (white for robot, black for background).
input_image = Image.new("RGB", (768, 768), color="red")  # placeholder for your robot image
mask = Image.new("L", (768, 768), color="black")         # placeholder for your robot mask
draw = ImageDraw.Draw(mask)
draw.ellipse((100, 100, 668, 668), fill="white")  # white ellipse as a dummy robot mask

# Create the conditioning image for ControlNet: the robot on a black background.
# ControlNet preserves the masked robot region and generates new content for the
# black (background) region.
black_bg = Image.new("RGB", input_image.size, (0, 0, 0))
control_image = Image.composite(input_image, black_bg, mask)

# Define your text prompt for the new background
prompt = "A robot arm working in a futuristic lab with neon lights, high detail, photorealistic"
negative_prompt = "blurry, low quality, bad anatomy, deformed"

# Generate the image
generator = torch.Generator(device="cuda").manual_seed(42)  # for reproducible results
output_image = pipe(
    prompt=prompt,
    image=control_image,  # the conditioning image derived from the mask
    negative_prompt=negative_prompt,
    num_inference_steps=25,
    generator=generator,
    guidance_scale=7.5,
).images[0]

# Save the generated image
output_image.save("generated_robot_scene.png")
print("Generated image saved as generated_robot_scene.png")
```
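
For data augmentation you typically want the robot pixels themselves left untouched. As a minimal sketch (this post-processing step is an assumption, not part of the original card), you can composite the original robot back over the generated background using the same mask:

```python
# Hypothetical post-processing: paste the original robot pixels back over the
# generated background, so only the background is replaced.
background = output_image.resize(input_image.size)  # SDXL outputs 1024x1024 by default
augmented = Image.composite(input_image, background, mask)
augmented.save("augmented_robot_scene.png")
```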

### BibTex
```