---
license: apache-2.0
---

## 🎨 AnySD

[![arXiv](https://img.shields.io/badge/arXiv-2411.15738-b31b1b.svg)](https://arxiv.org/abs/2411.15738)
[![Dataset](https://img.shields.io/badge/πŸ€—%20Huggingface-Dataset-yellow)](https://huggingface.co/datasets/Bin1117/AnyEdit)
[![Checkpoint](https://img.shields.io/badge/πŸ€—%20Huggingface-CKPT-blue)](https://huggingface.co/WeiChow/AnySD)
[![GitHub](https://img.shields.io/badge/GitHub-Repo-181717?logo=github)](https://github.com/DCDmllm/AnyEdit)
[![Page](https://img.shields.io/badge/Home-Page-b3.svg)](https://dcd-anyedit.github.io/)

This is the official model of **AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea**.

#### πŸ”Ž Summary

Since **AnyEdit** contains a wide range of editing instructions across various domains, it holds promising potential for training a powerful model that handles high-quality editing tasks. However, training such a model poses three additional challenges: (a) aligning the semantics of various multi-modal inputs; (b) identifying the semantic edits within each domain to control the granularity and scope of the edits; and (c) coordinating the complexity of the various editing tasks to prevent catastrophic forgetting. To this end, we propose a novel **AnyEdit Stable Diffusion** approach (🎨**AnySD**) to cope with various editing tasks in the real world.

<img src="assets/model.png" width="100%" />

**Architecture of 🎨AnySD**. 🎨**AnySD** is a novel architecture that supports three conditions (original image, editing instruction, visual prompt) for various editing tasks.

πŸ’– Our model is based on the awesome **[SD 1.5](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5)**.
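
The AnySD-specific weights used on top of the SD 1.5 base (the expert adapters such as `global.bin` and `viewpoint.bin`, plus the task embeddings `task_embs.bin`) are stored in this repository. The inference script below fetches them through the repo's `get_experts_dir` helper; if you prefer to grab the raw files yourself, a minimal sketch with `huggingface_hub` (an illustrative alternative, not part of the official API) looks like this:

```python
# Minimal sketch: manually download this repository's checkpoint files.
# The official script below does the equivalent via get_experts_dir(repo_id="WeiChow/AnySD").
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="WeiChow/AnySD")
print(local_dir)  # snapshot directory containing the expert and task-embedding checkpoints
```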

#### 🌐 Inference

To run the model, you can refer to the code in [![GitHub](https://img.shields.io/badge/GitHub-Repo-181717?logo=github)](https://github.com/DCDmllm/AnyEdit), specifically:

```shell
CUDA_VISIBLE_DEVICES=0 PYTHONPATH='./' python3 anysd/infer.py
```

The script content is:

```python
import os
from tqdm import tqdm
from anysd.src.model import AnySDPipeline, choose_expert
from anysd.train.valid_log import download_image
from anysd.src.utils import choose_book, get_experts_dir

if __name__ == "__main__":
    # Fetch the expert adapters and task embeddings from the Hugging Face Hub.
    expert_file_path = get_experts_dir(repo_id="WeiChow/AnySD")
    book_dim, book = choose_book('all')
    task_embs_checkpoints = expert_file_path + "task_embs.bin"
    adapter_checkpoints = {
        "global": expert_file_path + "global.bin",
        "viewpoint": expert_file_path + "viewpoint.bin",
        "visual_bbox": expert_file_path + "visual_bbox.bin",
        "visual_depth": expert_file_path + "visual_dep.bin",
        "visual_material_transfer": expert_file_path + "visual_mat.bin",
        "visual_reference": expert_file_path + "visual_ref.bin",
        "visual_scribble": expert_file_path + "visual_scr.bin",
        "visual_segment": expert_file_path + "visual_seg.bin",
        "visual_sketch": expert_file_path + "visual_ske.bin",
    }

    pipeline = AnySDPipeline(adapters_list=adapter_checkpoints, task_embs_checkpoints=task_embs_checkpoints)

    os.makedirs('./assets/anysd-test/', exist_ok=True)
    case = [
        {
            "edit": "Put on a pair of sunglasses",
            "edit_type": 'general',
            "image_file": "./assets/woman.jpg"
        },
        {
            "edit": "Make her a wizard",
            "edit_type": 'general',
            "image_file": "./assets/woman.jpg"
        }
    ]

    for index, item in enumerate(tqdm(case)):
        # Route the edit to the matching expert adapter.
        mode = choose_expert(mode=item["edit_type"])
        if mode == 'general':
            images = pipeline(
                prompt=item['edit'],
                original_image=download_image(item['image_file']),
                guidance_scale=3,
                num_inference_steps=100,
                original_image_guidance_scale=3,
                adapter_name="general",
            )[0]
        else:
            images = pipeline(
                prompt=item['edit'],
                reference_image=download_image(item['reference_image_file']) if ('reference_image_file' in item.keys() and item['reference_image_file'] is not None) else None,
                original_image=download_image(item['image_file']),
                guidance_scale=1.5,
                num_inference_steps=100,
                original_image_guidance_scale=2,
                reference_image_guidance_scale=0.8,
                adapter_name=mode,
                e_code=book[item["edit_type"]],
            )[0]

        images.save(f"./assets/anysd-test/{index}.jpg")
```

We cleaned up and reorganized the AnyEdit data before releasing it publicly, and retrained the model on this sorted release. As a result, the outputs can differ slightly from those reported in the paper, though the overall quality is similar. Note that the hyperparameters (e.g., the guidance scales and the number of inference steps used above) also have a noticeable impact on the results.
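
Because those hyperparameters matter, it can help to sweep the guidance settings for your own images. The snippet below is only an illustrative sketch that continues the script above (it assumes `pipeline` and `download_image` are already defined there); the specific values are arbitrary examples, not recommended settings.

```python
# Illustrative sweep over the guidance hyperparameters, continuing the script
# above (`pipeline` and `download_image` come from it). Values are examples only.
source = download_image("./assets/woman.jpg")

for gs in (1.5, 3.0, 5.0):            # text (instruction) guidance scale
    for img_gs in (1.5, 2.0, 3.0):    # original-image guidance scale
        image = pipeline(
            prompt="Put on a pair of sunglasses",
            original_image=source,
            guidance_scale=gs,
            num_inference_steps=100,
            original_image_guidance_scale=img_gs,
            adapter_name="general",
        )[0]
        image.save(f"./assets/anysd-test/sweep_gs{gs}_igs{img_gs}.jpg")
```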

## πŸ“š Citation

```bibtex
@article{yu2024anyedit,
  title={AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea},
  author={Yu, Qifan and Chow, Wei and Yue, Zhongqi and Pan, Kaihang and Wu, Yang and Wan, Xiaoyang and Li, Juncheng and Tang, Siliang and Zhang, Hanwang and Zhuang, Yueting},
  journal={arXiv preprint arXiv:2411.15738},
  year={2024}
}
```