Update README.md

README.md CHANGED

@@ -4,19 +4,6 @@ license: apache-2.0
 
 <h1 align="center">Seg2Any: Open-set Segmentation-Mask-to-Image Generation with Precise Shape and Semantic Control</h1>
 
-<div align='center'>
-<a href="https://github.com/0xLDF" target="_blank">Danfeng Li</a><sup>1*</sup>,
-<a href="https://huizhang0812.github.io/" target="_blank">Hui Zhang</a><sup>1*</sup>,
-<a href="https://www.linkedin.com/in/sheng-wang-4620863a/" target="_blank">Sheng Wang</a><sup>2</sup>,
-<a href="https://scholar.google.com/citations?user=qkaJhBMAAAAJ&hl=zh-CN" target="_blank">Jiacheng Li</a><sup>2</sup>,
-<a href="https://zxwu.azurewebsites.net/" target="_blank">Zuxuan Wu</a><sup>1†</sup>
-</div>
-
-<div align='center'>
-<br><sup>1</sup>Fudan University <sup>2</sup>HiThink Research
-<br><small><sup>*</sup>Equal Contribution. <sup>†</sup>Corresponding author.</small>
-</div>
-<br>
 
 <div align="center">
 <!-- <a href='LICENSE'><img src='https://img.shields.io/badge/license-MIT-yellow'></a> -->
@@ -27,16 +14,11 @@ license: apache-2.0
 <a href="https://huggingface.co/datasets/0xLDF/SACap-eval"><img src="https://img.shields.io/badge/🤗_HuggingFace-Benchmark-ffbd45.svg" alt="HuggingFace"></a>
 
 </div>
-<br>
 
-
-<img src="assets/demo.png" width="90%" height="90%">
-</p>
+We release model weights trained on three distinct datasets: ADE20K, COCO-Stuff, and SACap-1M. The SACap-1M version is the most popular, offering fine-grained regional text prompts.
 
-
+<br>
 
 <p align="center">
-<img src="assets/
+<img src="assets/demo.png" width="90%" height="90%">
 </p>
-
-(a) An overview of the Seg2Any framework. Seg2Any, built on the **FLUX.1-dev** foundation model, first converts segmentation masks into an Entity Contour Map and then encodes it into condition tokens via the frozen VAE. Negligible tokens are filtered out for efficiency. The resulting text, image, and condition tokens are concatenated into a unified sequence for MM-Attention. Our framework applies LoRA to all branches, achieving S2I generation with minimal extra parameters. (b) Attention Masks in MM-Attention, including the Semantic Alignment Attention Mask and the Attribute Isolation Attention Mask.
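The Entity Contour Map step described in the caption above can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the repository's code: `seg_to_contour_map` is a hypothetical helper that marks every pixel whose 4-connected neighbourhood crosses an entity boundary, producing the kind of binary contour image that would then be encoded by the frozen VAE.

```python
import numpy as np

def seg_to_contour_map(seg: np.ndarray) -> np.ndarray:
    """Mark boundary pixels: a pixel lies on a contour if any of its
    4-connected neighbours belongs to a different entity id."""
    contour = np.zeros_like(seg, dtype=bool)
    # compare each pixel with its horizontal neighbours
    contour[:, :-1] |= seg[:, :-1] != seg[:, 1:]
    contour[:, 1:]  |= seg[:, 1:]  != seg[:, :-1]
    # compare each pixel with its vertical neighbours
    contour[:-1, :] |= seg[:-1, :] != seg[1:, :]
    contour[1:, :]  |= seg[1:, :]  != seg[:-1, :]
    return contour.astype(np.uint8)

# toy 4x4 mask with two entities side by side
seg = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
])
print(seg_to_contour_map(seg))
# → rows of [0 1 1 0]: only the pixels straddling the 0/1 boundary are marked
```

In the real pipeline the contour image would be rendered at full resolution before VAE encoding; the toy grid is only to show the boundary logic.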
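The per-entity attention masks mentioned in the caption can be illustrated with a toy construction. This is a simplified sketch of the idea, not the paper's exact formulation: assume every token in the unified sequence carries an entity id (with -1 for global tokens such as the base prompt), and attention is permitted only within an entity or to and from global tokens, which captures the spirit of the Attribute Isolation Attention Mask.

```python
import numpy as np

def entity_isolation_mask(entity_ids: np.ndarray) -> np.ndarray:
    """Toy attention mask: token i may attend to token j iff they share
    an entity id, or either token is global (id == -1).
    A simplified stand-in for Seg2Any's attribute-isolation masking."""
    same_entity = entity_ids[:, None] == entity_ids[None, :]
    is_global = (entity_ids[:, None] == -1) | (entity_ids[None, :] == -1)
    return same_entity | is_global

# two global prompt tokens, then tokens belonging to entities 0 and 1
ids = np.array([-1, -1, 0, 0, 1, 1])
mask = entity_isolation_mask(ids)
assert not mask[2, 4]  # entity-0 tokens cannot attend to entity-1 tokens
assert mask[4, 0]      # every token may attend to the global prompt tokens
```

In practice such a boolean mask would be passed to the MM-Attention computation (e.g. as an additive -inf bias on disallowed pairs) so that regional attributes do not leak across entities.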