---
datasets:
- michaelyuanqwq/roboseg
license: mit
pipeline_tag: image-to-image
library_name: diffusers
---
# RoboEngine: Plug-and-Play Robot Data Augmentation with Semantic Robot Segmentation and Background Generation
Chengbo Yuan*, Suraj Joshi*, Shaoting Zhu*, Hang Su, Hang Zhao, Yang Gao.
[Project Website] [Arxiv] [BibTex]
The BG-Diffusion checkpoints of "RoboEngine: Plug-and-Play Robot Data Augmentation with Semantic Robot Segmentation and Background Generation".
Please check out https://github.com/michaelyuanqwq/roboengine for more details.
## Usage
This is a ControlNet model compatible with the diffusers library, designed for background generation in robot scenes. It takes as conditioning an image in which the robot is kept and everything else is blacked out (produced with a segmentation mask), and generates a new background from a text prompt.
```python
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline
from PIL import Image, ImageDraw
import torch

# Load the ControlNet model
controlnet = ControlNetModel.from_pretrained(
    "michaelyuanqwq/roboengine-bg-diffusion", torch_dtype=torch.float16
)

# Load a base Stable Diffusion XL pipeline (this ControlNet is designed for SDXL)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnet, torch_dtype=torch.float16
)
pipe.to("cuda")

# Prepare your input image and segmentation mask.
# In a real application these would come from your dataset or a segmentation model;
# the 'michaelyuanqwq/roboseg' dataset provides examples.
# For demonstration, create a dummy input image and mask (white = robot, black = background).
input_image = Image.new("RGB", (768, 768), color="red")  # Placeholder for your actual robot image
mask = Image.new("L", (768, 768), color="black")  # Placeholder for your actual robot mask
draw = ImageDraw.Draw(mask)
draw.ellipse((100, 100, 668, 668), fill="white")  # Draw a white ellipse as a dummy robot mask

# Create the conditioning image for ControlNet: the robot on a black background.
# Where the mask is white (robot), keep the input pixels; where it is black (background),
# fill with black so the model generates new content there.
control_image = Image.composite(
    input_image, Image.new("RGB", input_image.size, (0, 0, 0)), mask.convert("1")
)

# Define your text prompt for the new background
prompt = "A robot arm working in a futuristic lab with neon lights, high detail, photorealistic"
negative_prompt = "blurry, low quality, bad anatomy, deformed"

# Generate the image
generator = torch.Generator(device="cuda").manual_seed(42)  # For reproducible results
output_image = pipe(
    prompt=prompt,
    image=control_image,  # The conditioning image derived from the mask
    negative_prompt=negative_prompt,
    num_inference_steps=25,
    generator=generator,
    guidance_scale=7.5,
).images[0]

# Save the generated image
output_image.save("generated_robot_scene.png")
print("Generated image saved as generated_robot_scene.png")
```
## BibTex
```bibtex
@article{yuan2025roboengine,
  title={RoboEngine: Plug-and-Play Robot Data Augmentation with Semantic Robot Segmentation and Background Generation},
  author={Yuan, Chengbo and Joshi, Suraj and Zhu, Shaoting and Su, Hang and Zhao, Hang and Gao, Yang},
  journal={arXiv preprint arXiv:2503.18738},
  year={2025}
}
```