arxiv:2508.21066

OneReward: Unified Mask-Guided Image Generation via Multi-Task Human Preference Learning

Published on Aug 28 · Submitted by XionghuiWang on Aug 29
Authors: …, Jie Wu, …

Abstract

AI-generated summary: A unified reinforcement learning framework using a single vision-language model enhances generative capabilities across multiple tasks without task-specific fine-tuning.

In this paper, we introduce OneReward, a unified reinforcement learning framework that enhances a model's generative capabilities across multiple tasks and under different evaluation criteria using only one reward model. OneReward employs a single vision-language model (VLM) as a generative reward model that, for a given task and evaluation criterion, distinguishes the winner from the loser. This lets it be applied effectively to multi-task generation models, particularly in settings with varied data and diverse task objectives. We apply OneReward to mask-guided image generation, which can be further divided into several sub-tasks, such as image fill, image extend, object removal, and text rendering; each uses a binary mask to specify the edit area. Although these domain-specific tasks share the same conditioning paradigm, they differ significantly in their underlying data distributions and evaluation metrics. Existing methods often rely on task-specific supervised fine-tuning (SFT), which limits generalization and training efficiency. Building on OneReward, we develop Seedream 3.0 Fill, a mask-guided generation model trained via multi-task reinforcement learning directly on a pre-trained base model, eliminating the need for task-specific SFT. Experimental results show that our unified editing model consistently outperforms both commercial and open-source competitors, such as Ideogram, Adobe Photoshop, and FLUX Fill [Pro], across multiple evaluation dimensions. Code and model are available at: https://one-reward.github.io
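The abstract describes the reward model only at the level of winner/loser judgments for a given task and criterion. The following is a minimal Python sketch of one way a single pairwise judge could be turned into scalar rewards for a mixed-task batch. It assumes the judge exposes logits for a "Yes"/"No" answer and that per-criterion rewards are simply averaged. The `Judge` interface, the helper names, the criterion names, and the averaging rule are all hypothetical illustrations, not the paper's implementation; the toy judge stands in for the VLM.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical judge interface: given a task, an evaluation criterion, and two
# images, return logits for the answers "Yes" (candidate wins) and "No".
# In OneReward a single VLM plays this role; a real judge would format the task
# and criterion into its prompt. Here any callable with this signature works.
Judge = Callable[[str, str, object, object], Tuple[float, float]]


def stable_sigmoid(x: float) -> float:
    """Logistic function that avoids overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def preference_reward(judge: Judge, task: str, criterion: str,
                      candidate: object, reference: object) -> float:
    """P("Yes") from a two-way softmax, i.e. sigmoid(logit_yes - logit_no)."""
    logit_yes, logit_no = judge(task, criterion, candidate, reference)
    return stable_sigmoid(logit_yes - logit_no)


def multi_task_rewards(judge: Judge,
                       batch: List[Tuple[str, object, object]],
                       criteria: Dict[str, List[str]]) -> List[float]:
    """Score a mixed-task batch with one judge.

    Each sample is (task, candidate, reference). It is scored on every
    criterion listed for its task, and the per-criterion rewards are averaged
    (the averaging rule is an assumption for this sketch).
    """
    rewards = []
    for task, candidate, reference in batch:
        scores = [preference_reward(judge, task, c, candidate, reference)
                  for c in criteria[task]]
        rewards.append(sum(scores) / len(scores))
    return rewards


if __name__ == "__main__":
    # Toy stand-in for the VLM: each "image" is a dict of made-up per-criterion
    # quality scores, and the Yes-logit is the candidate's margin over the reference.
    def toy_judge(task, criterion, candidate, reference):
        return candidate[criterion] - reference[criterion], 0.0

    criteria = {  # hypothetical task -> criterion mapping
        "image_fill": ["consistency", "aesthetics"],
        "object_removal": ["removal_quality"],
    }
    batch = [
        ("image_fill", {"consistency": 1.0, "aesthetics": -1.0},
                       {"consistency": 0.0, "aesthetics": -1.0}),
        ("object_removal", {"removal_quality": 0.0}, {"removal_quality": 0.0}),
    ]
    print(multi_task_rewards(toy_judge, batch, criteria))
```

In this sketch the same judge serves every task; only the task and criterion inputs change, which mirrors the "one reward model" idea. Because the reward is relative to a reference image, a sample with no quality margin over its reference (the second toy sample) receives a reward of 0.5.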

Community

Paper author and submitter:

Project URL: https://one-reward.github.io/
Contains the RL methodology and the Seedream 3.0 Fill technical report.

