---
datasets:
- michaelyuanqwq/roboseg
license: mit
pipeline_tag: image-to-image
library_name: diffusers
---
# RoboEngine: Plug-and-Play Robot Data Augmentation with Semantic Robot Segmentation and Background Generation
Chengbo Yuan*, Suraj Joshi*, Shaoting Zhu*, Hang Su, Hang Zhao, Yang Gao.
[Project Website] [Arxiv] [BibTex]
The BG-Diffusion checkpoints of "RoboEngine: Plug-and-Play Robot Data Augmentation with Semantic Robot Segmentation and Background Generation".
Please check out https://github.com/michaelyuanqwq/roboengine for more details.
## Usage
This is a ControlNet model compatible with the diffusers library, designed for background generation in robot scenes. It takes as conditioning an image in which the robot is kept and everything else is blacked out (produced with a segmentation mask), and generates a new background from a text prompt.
```python
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline
from PIL import Image, ImageDraw
import torch

# Load the ControlNet model
controlnet = ControlNetModel.from_pretrained(
    "michaelyuanqwq/roboengine-bg-diffusion", torch_dtype=torch.float16
)

# Load a base Stable Diffusion XL pipeline (this ControlNet is designed for SDXL)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnet, torch_dtype=torch.float16
)
pipe.to("cuda")

# Prepare your input image and segmentation mask.
# In a real application these would come from your dataset or a segmentation model;
# the 'michaelyuanqwq/roboseg' dataset provides examples.
# For demonstration, create a dummy input image and mask (white = robot, black = background).
input_image = Image.new("RGB", (768, 768), color="red")  # Placeholder for your actual robot image
mask = Image.new("L", (768, 768), color="black")  # Placeholder for your actual robot mask
draw = ImageDraw.Draw(mask)
draw.ellipse((100, 100, 668, 668), fill="white")  # Draw a white ellipse as a dummy robot mask

# Create the conditioning image for ControlNet: the robot on a black background.
# Where the mask is white (robot), keep the input pixels; where it is black (background),
# fill with black so the model generates new content there.
control_image = Image.composite(
    input_image, Image.new("RGB", input_image.size, (0, 0, 0)), mask.convert("1")
)

# Define your text prompt for the new background
prompt = "A robot arm working in a futuristic lab with neon lights, high detail, photorealistic"
negative_prompt = "blurry, low quality, bad anatomy, deformed"

# Generate the image
generator = torch.Generator(device="cuda").manual_seed(42)  # For reproducible results
output_image = pipe(
    prompt=prompt,
    image=control_image,  # The conditioning image derived from the mask
    negative_prompt=negative_prompt,
    num_inference_steps=25,
    generator=generator,
    guidance_scale=7.5,
).images[0]

# Save the generated image
output_image.save("generated_robot_scene.png")
print("Generated image saved as generated_robot_scene.png")
```
## BibTex
```bibtex
@article{yuan2025roboengine,
  title={RoboEngine: Plug-and-Play Robot Data Augmentation with Semantic Robot Segmentation and Background Generation},
  author={Yuan, Chengbo and Joshi, Suraj and Zhu, Shaoting and Su, Hang and Zhao, Hang and Gao, Yang},
  journal={arXiv preprint arXiv:2503.18738},
  year={2025}
}
```