Add paper abstract and BibTeX citation

#2
opened by nielsr (HF Staff)
Files changed (1)
  1. README.md +16 -7
README.md CHANGED
@@ -1,17 +1,16 @@
 ---
+base_model:
+- black-forest-labs/FLUX.1-Fill-dev
+language:
+- en
 license: other
 license_name: community-license-agreement
 license_link: LICENSE
-language:
-- en
-base_model:
-- black-forest-labs/FLUX.1-Fill-dev
 pipeline_tag: image-to-image
 tags:
 - image customization
 ---
 
-
 <div align="center">
 <a href="https://github.com/TencentARC/IC-Custom">
 <img src='https://github.com/TencentARC/IC-Custom/blob/main/assets/IC-Custom-logo.png?raw=true' width='120px'>
@@ -42,6 +41,9 @@ tags:
 </a>
 </div>
 
+### Abstract
+Image customization, a crucial technique for industrial media production, aims to generate content that is consistent with reference images. However, current approaches conventionally separate image customization into position-aware and position-free customization paradigms and lack a universal framework for diverse customization, limiting their applications across various scenarios. To overcome these limitations, we propose IC-Custom, a unified framework that seamlessly integrates position-aware and position-free image customization through in-context learning. IC-Custom concatenates reference images with target images to a polyptych, leveraging DiT's multi-modal attention mechanism for fine-grained token-level interactions. We introduce the In-context Multi-Modal Attention (ICMA) mechanism with learnable task-oriented register tokens and boundary-aware positional embeddings to enable the model to correctly handle different task types and distinguish various inputs in polyptych configurations. To bridge the data gap, we carefully curated a high-quality dataset of 12k identity-consistent samples with 8k from real-world sources and 4k from high-quality synthetic data, avoiding the overly glossy and over-saturated synthetic appearance. IC-Custom supports various industrial applications, including try-on, accessory placement, furniture arrangement, and creative IP customization. Extensive evaluations on our proposed ProductBench and the publicly available DreamBench demonstrate that IC-Custom significantly outperforms community workflows, closed-source models, and state-of-the-art open-source approaches. IC-Custom achieves approximately 73% higher human preference across identity consistency, harmonicity, and text alignment metrics, while training only 0.4% of the original model parameters.
+
 <p align="center">
 IC-Custom is designed for diverse image customization scenarios, including:
 </p>
@@ -52,7 +54,14 @@ tags:
 - **Position-free**: Input a reference image and a target description to generate a new image with the reference image's ID
 *Examples*: IP customization, character creation.
 
-
 ### Citation
 
-If you find IC-Custom useful, please consider giving it a ⭐ on [GitHub](https://github.com/TencentARC/IC-Custom).
+```bibtex
+@article{li2025iccustom,
+title={IC-Custom: Diverse Image Customization via In-Context Learning},
+author={Li, Yaowei and Zhu, Yu and Wu, Xu and Liu, Bo and Li, Jia and Lu, Yong and Zhang, Song and Luo, Yujun},
+journal={arXiv preprint arXiv:2507.01926},
+year={2025},
+url={https://arxiv.org/abs/2507.01926}
+}
+```
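
The abstract added above describes the core mechanism: reference and target images are concatenated into a polyptych and processed with in-context multi-modal attention, using learnable task-oriented register tokens and boundary-aware positional embeddings so the model can tell the task type and the different panels apart. Below is a minimal, hypothetical PyTorch sketch of that idea, not the authors' implementation: the module name, tensor shapes, and the plain region embedding used in place of the paper's boundary-aware positional embeddings are all illustrative assumptions.

```python
# Hypothetical sketch of the in-context attention idea from the abstract
# (not the authors' code): text, reference, and target tokens are concatenated
# into one "polyptych" sequence, prefixed with learnable task-oriented register
# tokens, and tagged with a learnable region embedding standing in for the
# paper's boundary-aware positional embeddings.
import torch
import torch.nn as nn


class InContextAttentionSketch(nn.Module):
    def __init__(self, dim=1024, heads=16, num_tasks=2, num_registers=4):
        super().__init__()
        # One set of register tokens per task type (e.g. position-aware vs. position-free).
        self.task_registers = nn.Parameter(0.02 * torch.randn(num_tasks, num_registers, dim))
        # Region embedding: 0 = text, 1 = reference image, 2 = target image.
        self.region_embed = nn.Embedding(3, dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, text_tok, ref_tok, tgt_tok, task_id):
        b = text_tok.size(0)
        regs = self.task_registers[task_id].unsqueeze(0).expand(b, -1, -1)
        # Tag each stream so attention can distinguish the polyptych panels.
        text_tok = text_tok + self.region_embed.weight[0]
        ref_tok = ref_tok + self.region_embed.weight[1]
        tgt_tok = tgt_tok + self.region_embed.weight[2]
        # Full token-level interaction across registers, text, reference, and target.
        seq = torch.cat([regs, text_tok, ref_tok, tgt_tok], dim=1)
        out, _ = self.attn(seq, seq, seq)
        # Keep only the target-image tokens, which downstream blocks would refine.
        return out[:, -tgt_tok.size(1):]


# Toy usage with made-up token counts: 77 text, 1024 reference, 1024 target tokens.
block = InContextAttentionSketch()
text = torch.randn(1, 77, 1024)
ref = torch.randn(1, 1024, 1024)
tgt = torch.randn(1, 1024, 1024)
print(block(text, ref, tgt, task_id=0).shape)  # torch.Size([1, 1024, 1024])
```

In IC-Custom itself this interaction takes place inside the attention blocks of the FLUX.1-Fill-dev DiT listed as the base model, and, per the abstract, only about 0.4% of the original model parameters are trained.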