JCTN committed on

Commit 1263ddc · verified · 1 Parent(s): 1f1433f

Upload 13 files
.gitattributes CHANGED
@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+faceid_plusv2.jpg filter=lfs diff=lfs merge=lfs -text
+ip-adapter-faceid.jpg filter=lfs diff=lfs merge=lfs -text
+sdxl_faceid.jpg filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,324 @@
---
tags:
- text-to-image
- stable-diffusion

language:
- en
library_name: diffusers
---

# IP-Adapter-FaceID Model Card

<div align="center">

[**Project Page**](https://ip-adapter.github.io) **|** [**Paper (ArXiv)**](https://arxiv.org/abs/2308.06721) **|** [**Code**](https://github.com/tencent-ailab/IP-Adapter)
</div>

---

## Introduction

An experimental version of IP-Adapter-FaceID: instead of CLIP image embeddings, it uses face ID embeddings from a face recognition model; additionally, it uses LoRA to improve ID consistency. IP-Adapter-FaceID can generate images in various styles conditioned on a face, using only text prompts.

![results](./ip-adapter-faceid.jpg)

**Update 2023/12/27**:

IP-Adapter-FaceID-Plus: face ID embedding (for face ID) + CLIP image embedding (for face structure)

<div align="center">

![results](./faceid-plus.jpg)
</div>

**Update 2023/12/28**:

IP-Adapter-FaceID-PlusV2: face ID embedding (for face ID) + controllable CLIP image embedding (for face structure)

You can adjust the weight of the face structure to get different generations (see the `s_scale` argument in the IP-Adapter-FaceID-Plus example below).

<div align="center">

![results](./faceid_plusv2.jpg)
</div>

**Update 2024/01/04**:

IP-Adapter-FaceID-SDXL: an experimental SDXL version of IP-Adapter-FaceID

<div align="center">

![results](./sdxl_faceid.jpg)
</div>

## Usage

### IP-Adapter-FaceID

First, use [insightface](https://github.com/deepinsight/insightface) to extract the face ID embedding:

```python
import cv2
from insightface.app import FaceAnalysis
import torch

# Detect faces with the buffalo_l model pack
app = FaceAnalysis(name="buffalo_l", providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
app.prepare(ctx_id=0, det_size=(640, 640))

image = cv2.imread("person.jpg")
faces = app.get(image)

# Use the L2-normalized embedding of the first detected face
faceid_embeds = torch.from_numpy(faces[0].normed_embedding).unsqueeze(0)
```
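
If the photo contains more than one person, `faces` will hold several detections. A minimal sketch for picking the most prominent face first, assuming insightface's `Face` objects expose `bbox` as `(x1, y1, x2, y2)`:

```python
# Pick the face with the largest bounding-box area before embedding it
face = max(faces, key=lambda f: (f.bbox[2] - f.bbox[0]) * (f.bbox[3] - f.bbox[1]))
faceid_embeds = torch.from_numpy(face.normed_embedding).unsqueeze(0)
```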

Then you can generate images conditioned on the face embedding:

```python
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler, AutoencoderKL
from PIL import Image

from ip_adapter.ip_adapter_faceid import IPAdapterFaceID

base_model_path = "SG161222/Realistic_Vision_V4.0_noVAE"
vae_model_path = "stabilityai/sd-vae-ft-mse"
ip_ckpt = "ip-adapter-faceid_sd15.bin"
device = "cuda"

noise_scheduler = DDIMScheduler(
    num_train_timesteps=1000,
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    clip_sample=False,
    set_alpha_to_one=False,
    steps_offset=1,
)
vae = AutoencoderKL.from_pretrained(vae_model_path).to(dtype=torch.float16)
pipe = StableDiffusionPipeline.from_pretrained(
    base_model_path,
    torch_dtype=torch.float16,
    scheduler=noise_scheduler,
    vae=vae,
    feature_extractor=None,
    safety_checker=None
)

# load ip-adapter
ip_model = IPAdapterFaceID(pipe, ip_ckpt, device)

# generate image
prompt = "photo of a woman in red dress in a garden"
negative_prompt = "monochrome, lowres, bad anatomy, worst quality, low quality, blurry"

images = ip_model.generate(
    prompt=prompt, negative_prompt=negative_prompt, faceid_embeds=faceid_embeds,
    num_samples=4, width=512, height=768, num_inference_steps=30, seed=2023
)
```
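
The `PIL` import above hints at the return type: a sketch for saving the batch, assuming `generate` returns a list of PIL images:

```python
# Save each of the num_samples generated images to disk
for i, img in enumerate(images):
    img.save(f"faceid_sample_{i}.png")
```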

You can also load the model with a standard IP-Adapter and a standard LoRA:

```python
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler, AutoencoderKL
from PIL import Image

from ip_adapter.ip_adapter_faceid_separate import IPAdapterFaceID

base_model_path = "SG161222/Realistic_Vision_V4.0_noVAE"
vae_model_path = "stabilityai/sd-vae-ft-mse"
ip_ckpt = "ip-adapter-faceid_sd15.bin"
lora_ckpt = "ip-adapter-faceid_sd15_lora.safetensors"
device = "cuda"

noise_scheduler = DDIMScheduler(
    num_train_timesteps=1000,
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    clip_sample=False,
    set_alpha_to_one=False,
    steps_offset=1,
)
vae = AutoencoderKL.from_pretrained(vae_model_path).to(dtype=torch.float16)
pipe = StableDiffusionPipeline.from_pretrained(
    base_model_path,
    torch_dtype=torch.float16,
    scheduler=noise_scheduler,
    vae=vae,
    feature_extractor=None,
    safety_checker=None
)

# load lora and fuse
pipe.load_lora_weights(lora_ckpt)
pipe.fuse_lora()

# load ip-adapter
ip_model = IPAdapterFaceID(pipe, ip_ckpt, device)

# generate image
prompt = "photo of a woman in red dress in a garden"
negative_prompt = "monochrome, lowres, bad anatomy, worst quality, low quality, blurry"

images = ip_model.generate(
    prompt=prompt, negative_prompt=negative_prompt, faceid_embeds=faceid_embeds,
    num_samples=4, width=512, height=768, num_inference_steps=30, seed=2023
)
```
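
`pipe.fuse_lora()` folds the LoRA weights into the base model. If you want the untouched base weights back in the same session, diffusers exposes the inverse operation:

```python
# Undo the fusion and restore the original base-model weights
pipe.unfuse_lora()
```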

### IP-Adapter-FaceID-SDXL

First, use [insightface](https://github.com/deepinsight/insightface) to extract the face ID embedding:

```python
import cv2
from insightface.app import FaceAnalysis
import torch

app = FaceAnalysis(name="buffalo_l", providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
app.prepare(ctx_id=0, det_size=(640, 640))

image = cv2.imread("person.jpg")
faces = app.get(image)

faceid_embeds = torch.from_numpy(faces[0].normed_embedding).unsqueeze(0)
```

Then you can generate images conditioned on the face embedding:

```python
import torch
from diffusers import StableDiffusionXLPipeline, DDIMScheduler
from PIL import Image

from ip_adapter.ip_adapter_faceid import IPAdapterFaceIDXL

base_model_path = "SG161222/RealVisXL_V3.0"
ip_ckpt = "ip-adapter-faceid_sdxl.bin"
device = "cuda"

noise_scheduler = DDIMScheduler(
    num_train_timesteps=1000,
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    clip_sample=False,
    set_alpha_to_one=False,
    steps_offset=1,
)
pipe = StableDiffusionXLPipeline.from_pretrained(
    base_model_path,
    torch_dtype=torch.float16,
    scheduler=noise_scheduler,
    add_watermarker=False,
)

# load ip-adapter
ip_model = IPAdapterFaceIDXL(pipe, ip_ckpt, device)

# generate image
prompt = "A closeup shot of a beautiful Asian teenage girl in a white dress wearing small silver earrings in the garden, under the soft morning light"
negative_prompt = "monochrome, lowres, bad anatomy, worst quality, low quality, blurry"

images = ip_model.generate(
    prompt=prompt, negative_prompt=negative_prompt, faceid_embeds=faceid_embeds, num_samples=2,
    width=1024, height=1024,
    num_inference_steps=30, guidance_scale=7.5, seed=2023
)
```
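
Because `seed` is an explicit argument of `generate`, a small seed sweep is an easy way to get more variety while keeping each run reproducible (a sketch reusing the objects defined above, assuming as before that `generate` returns a list of images):

```python
# One reproducible sample per seed
all_images = []
for seed in (2023, 2024, 2025):
    all_images += ip_model.generate(
        prompt=prompt, negative_prompt=negative_prompt, faceid_embeds=faceid_embeds,
        num_samples=1, width=1024, height=1024,
        num_inference_steps=30, guidance_scale=7.5, seed=seed,
    )
```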

### IP-Adapter-FaceID-Plus

First, use [insightface](https://github.com/deepinsight/insightface) to extract the face ID embedding and an aligned face image:

```python
import cv2
from insightface.app import FaceAnalysis
from insightface.utils import face_align
import torch

app = FaceAnalysis(name="buffalo_l", providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
app.prepare(ctx_id=0, det_size=(640, 640))

image = cv2.imread("person.jpg")
faces = app.get(image)

faceid_embeds = torch.from_numpy(faces[0].normed_embedding).unsqueeze(0)
face_image = face_align.norm_crop(image, landmark=faces[0].kps, image_size=224)  # you can also segment the face
```

Then you can generate images conditioned on the face embedding:

```python
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler, AutoencoderKL
from PIL import Image

from ip_adapter.ip_adapter_faceid import IPAdapterFaceIDPlus

v2 = False
base_model_path = "SG161222/Realistic_Vision_V4.0_noVAE"
vae_model_path = "stabilityai/sd-vae-ft-mse"
image_encoder_path = "laion/CLIP-ViT-H-14-laion2B-s32B-b79K"
ip_ckpt = "ip-adapter-faceid-plus_sd15.bin" if not v2 else "ip-adapter-faceid-plusv2_sd15.bin"
device = "cuda"

noise_scheduler = DDIMScheduler(
    num_train_timesteps=1000,
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    clip_sample=False,
    set_alpha_to_one=False,
    steps_offset=1,
)
vae = AutoencoderKL.from_pretrained(vae_model_path).to(dtype=torch.float16)
pipe = StableDiffusionPipeline.from_pretrained(
    base_model_path,
    torch_dtype=torch.float16,
    scheduler=noise_scheduler,
    vae=vae,
    feature_extractor=None,
    safety_checker=None
)

# load ip-adapter
ip_model = IPAdapterFaceIDPlus(pipe, image_encoder_path, ip_ckpt, device)

# generate image
prompt = "photo of a woman in red dress in a garden"
negative_prompt = "monochrome, lowres, bad anatomy, worst quality, low quality, blurry"

images = ip_model.generate(
    prompt=prompt, negative_prompt=negative_prompt, face_image=face_image, faceid_embeds=faceid_embeds,
    shortcut=v2, s_scale=1.0,
    num_samples=4, width=512, height=768, num_inference_steps=30, seed=2023
)
```
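
With the PlusV2 checkpoint (`v2 = True`, hence `shortcut=True`), the `s_scale` argument controls how strongly the CLIP structure embedding constrains the generated face. A small sweep, reusing the setup above, makes the trade-off visible:

```python
# Sweep the face-structure weight: higher s_scale follows the
# reference face structure more closely
for s in (0.5, 1.0, 1.5):
    imgs = ip_model.generate(
        prompt=prompt, negative_prompt=negative_prompt,
        face_image=face_image, faceid_embeds=faceid_embeds,
        shortcut=True, s_scale=s,
        num_samples=1, width=512, height=768, num_inference_steps=30, seed=2023,
    )
    imgs[0].save(f"plusv2_s_scale_{s}.png")
```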

## Limitations and Bias
- The model does not achieve perfect photorealism and ID consistency.
- The model's generalization is limited by the training data, the base model, and the face recognition model.

## Non-commercial use
**This model is released exclusively for research purposes and is not intended for commercial use.**
faceid-plus.jpg ADDED
faceid_plusv2.jpg ADDED

Git LFS Details

  • SHA256: 5d369c3e49defca663dc50b28b1bb621834d319500b28de6a8de6a6eb319a2de
  • Pointer size: 132 Bytes
  • Size of remote file: 3.44 MB
ip-adapter-faceid-plus_sd15.bin ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:252fb53e0d018489d9e7f9b9e2001a52ff700e491894011ada7cfb471e0fadf2
+size 156558503
ip-adapter-faceid-plus_sd15_lora.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3f00341d11e5e7b5aadf63cbdead09ef82eb28669156161cf1bfc2105d4ff1cd
+size 51059544
ip-adapter-faceid-plusv2_sd15.bin ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:26d0d86a1d60d6cc811d3b8862178b461e1eeb651e6fe2b72ba17aa95411e313
+size 156558509
ip-adapter-faceid-plusv2_sd15_lora.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8abff87a15a049f3e0186c2e82c1c8e77783baf2cfb63f34c412656052eb57b0
+size 51059544
ip-adapter-faceid.jpg ADDED

Git LFS Details

  • SHA256: cd4aa6c124459cfe5a9307f298fcf7580fd6caca5ee82baa3b68ca3beb346847
  • Pointer size: 132 Bytes
  • Size of remote file: 4.7 MB
ip-adapter-faceid_sd15.bin ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:201344e22e6f55849cf07ca7a6e53d8c3b001327c66cb9710d69fd5da48a8da7
+size 96740574
ip-adapter-faceid_sd15_lora.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:70699f0dbfadd47de1f81d263cf4c86bd4b7271d841304af9b340b3a7f38e86a
+size 51059544
ip-adapter-faceid_sdxl.bin ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f455fed24e207c878ec1e0466b34a969d37bab857c5faa4e8d259a0b4ff63d7e
+size 1071149741
ip-adapter-faceid_sdxl_lora.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:4fcf93d6e8dc8dd18f5f9e51c8306f369486ed0aa0780ade9961308aff7f0d64
+size 371842896
sdxl_faceid.jpg ADDED

Git LFS Details

  • SHA256: 598c4982ab85aa9b38fbb8cff218ad096ffc6bf6bf01cb192f86be9751c81eef
  • Pointer size: 132 Bytes
  • Size of remote file: 2.78 MB