---
license: apache-2.0
---

## 🎨 AnySD

[![arXiv](https://img.shields.io/badge/arXiv-2411.15738-b31b1b.svg)](https://arxiv.org/abs/2411.15738)
[![Dataset](https://img.shields.io/badge/πŸ€—%20Huggingface-Dataset-yellow)](https://huggingface.co/datasets/Bin1117/AnyEdit)
[![Checkpoint](https://img.shields.io/badge/πŸ€—%20Huggingface-CKPT-blue)](https://huggingface.co/WeiChow/AnySD)
[![GitHub](https://img.shields.io/badge/GitHub-Repo-181717?logo=github)](https://github.com/DCDmllm/AnyEdit)
[![Page](https://img.shields.io/badge/Home-Page-b3.svg)](https://dcd-anyedit.github.io/)

This is the official model of **AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea**.

#### πŸ”Ž Summary

Since **AnyEdit** contains a wide range of editing instructions across various domains, it holds promising potential for training a powerful model that handles high-quality editing tasks. However, training such a model poses three additional challenges: (a) aligning the semantics of various multi-modal inputs; (b) identifying the semantic edits within each domain to control the granularity and scope of the edits; and (c) coordinating the complexity of the various editing tasks to prevent catastrophic forgetting. To this end, we propose a novel **AnyEdit Stable Diffusion** approach (🎨**AnySD**) to cope with various editing tasks in the real world.

<img src="assets/model.png" width="100%" />

**Architecture of 🎨AnySD**. 🎨**AnySD** is a novel architecture that supports three conditions (original image, editing instruction, visual prompt) for various editing tasks.

πŸ’– Our model is based on the awesome **[SD 1.5](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5)**.
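
The AnySD-specific weights used on top of the SD 1.5 base (the expert adapters such as `global.bin` and `viewpoint.bin`, plus the task embeddings `task_embs.bin`) are stored in this repository. The inference script below fetches them through the repo's `get_experts_dir` helper; if you prefer to grab the raw files yourself, a minimal sketch with `huggingface_hub` (an illustrative alternative, not part of the official API) looks like this:

```python
# Minimal sketch: manually download this repository's checkpoint files.
# The official script below does the equivalent via get_experts_dir(repo_id="WeiChow/AnySD").
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="WeiChow/AnySD")
print(local_dir)  # snapshot directory containing the expert and task-embedding checkpoints
```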

#### 🌐 Inference

To run the model, you can refer to the code in [![GitHub](https://img.shields.io/badge/GitHub-Repo-181717?logo=github)](https://github.com/DCDmllm/AnyEdit), specifically:

```shell
CUDA_VISIBLE_DEVICES=0 PYTHONPATH='./' python3 anysd/infer.py
```

The script content is:

```python
import os
from tqdm import tqdm
from anysd.src.model import AnySDPipeline, choose_expert
from anysd.train.valid_log import download_image
from anysd.src.utils import choose_book, get_experts_dir

if __name__ == "__main__":
    # Fetch the expert adapters and task embeddings from the Hugging Face Hub.
    expert_file_path = get_experts_dir(repo_id="WeiChow/AnySD")
    book_dim, book = choose_book('all')
    task_embs_checkpoints = expert_file_path + "task_embs.bin"
    adapter_checkpoints = {
        "global": expert_file_path + "global.bin",
        "viewpoint": expert_file_path + "viewpoint.bin",
        "visual_bbox": expert_file_path + "visual_bbox.bin",
        "visual_depth": expert_file_path + "visual_dep.bin",
        "visual_material_transfer": expert_file_path + "visual_mat.bin",
        "visual_reference": expert_file_path + "visual_ref.bin",
        "visual_scribble": expert_file_path + "visual_scr.bin",
        "visual_segment": expert_file_path + "visual_seg.bin",
        "visual_sketch": expert_file_path + "visual_ske.bin",
    }

    pipeline = AnySDPipeline(adapters_list=adapter_checkpoints, task_embs_checkpoints=task_embs_checkpoints)

    os.makedirs('./assets/anysd-test/', exist_ok=True)
    case = [
        {
            "edit": "Put on a pair of sunglasses",
            "edit_type": 'general',
            "image_file": "./assets/woman.jpg"
        },
        {
            "edit": "Make her a wizard",
            "edit_type": 'general',
            "image_file": "./assets/woman.jpg"
        }
    ]

    for index, item in enumerate(tqdm(case)):
        # Route the edit to the matching expert adapter.
        mode = choose_expert(mode=item["edit_type"])
        if mode == 'general':
            images = pipeline(
                prompt=item['edit'],
                original_image=download_image(item['image_file']),
                guidance_scale=3,
                num_inference_steps=100,
                original_image_guidance_scale=3,
                adapter_name="general",
            )[0]
        else:
            images = pipeline(
                prompt=item['edit'],
                reference_image=download_image(item['reference_image_file']) if ('reference_image_file' in item.keys() and item['reference_image_file'] is not None) else None,
                original_image=download_image(item['image_file']),
                guidance_scale=1.5,
                num_inference_steps=100,
                original_image_guidance_scale=2,
                reference_image_guidance_scale=0.8,
                adapter_name=mode,
                e_code=book[item["edit_type"]],
            )[0]

        images.save(f"./assets/anysd-test/{index}.jpg")
```

We cleaned up and reorganized the AnyEdit data before releasing it publicly, and retrained the model on this sorted release. As a result, the outputs can differ slightly from those reported in the paper, though the overall quality is similar. Note that the hyperparameters (e.g., the guidance scales and the number of inference steps used above) also have a noticeable impact on the results.
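
Because those hyperparameters matter, it can help to sweep the guidance settings for your own images. The snippet below is only an illustrative sketch that continues the script above (it assumes `pipeline` and `download_image` are already defined there); the specific values are arbitrary examples, not recommended settings.

```python
# Illustrative sweep over the guidance hyperparameters, continuing the script
# above (`pipeline` and `download_image` come from it). Values are examples only.
source = download_image("./assets/woman.jpg")

for gs in (1.5, 3.0, 5.0):            # text (instruction) guidance scale
    for img_gs in (1.5, 2.0, 3.0):    # original-image guidance scale
        image = pipeline(
            prompt="Put on a pair of sunglasses",
            original_image=source,
            guidance_scale=gs,
            num_inference_steps=100,
            original_image_guidance_scale=img_gs,
            adapter_name="general",
        )[0]
        image.save(f"./assets/anysd-test/sweep_gs{gs}_igs{img_gs}.jpg")
```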

## πŸ“š Citation

```bibtex
@article{yu2024anyedit,
  title={AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea},
  author={Yu, Qifan and Chow, Wei and Yue, Zhongqi and Pan, Kaihang and Wu, Yang and Wan, Xiaoyang and Li, Juncheng and Tang, Siliang and Zhang, Hanwang and Zhuang, Yueting},
  journal={arXiv preprint arXiv:2411.15738},
  year={2024}
}
```