---
license: apache-2.0
language:
- en
base_model:
- CompVis/stable-diffusion-v1-4
---
# FG-DM
[**Adapting Diffusion Models for Improved Prompt Compliance and Controllable Image Synthesis**](https://github.com/DeepakSridhar/fgdm)<br/>
[Deepak Sridhar](https://deepaksridhar.github.io/),
[Abhishek Peri](https://github.com/abhishek-peri),
[Rohith Rachala](https://github.com/rohithreddy0087),
[Nuno Vasconcelos](http://www.svcl.ucsd.edu/~nuno/)<br/>
_[NeurIPS '24](https://deepaksridhar.github.io/factorgraphdiffusion.github.io/static/images/FG_DM_NeurIPS_2024_final.pdf) |
[GitHub](https://github.com/DeepakSridhar/fgdm) | [arXiv](https://arxiv.org/abs/2410.21638) | [Project page](https://deepaksridhar.github.io/factorgraphdiffusion.github.io)_

![fg-dm](data/arch.jpg)

## Cloning
Use `--recursive` to also clone the segmentation editor app:
```
git clone --recursive https://github.com/DeepakSridhar/fgdm.git
```
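
If you have already cloned without `--recursive`, the editor app can still be pulled in afterwards (a minimal sketch, assuming the app is tracked as a git submodule, as the flag above suggests):
```
git submodule update --init --recursive
```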

## Requirements
A suitable [conda](https://conda.io/) environment named `ldm` can be created
and activated with:

```
conda env create -f fgdm.yaml
conda activate ldm
```

### Dataset
We used the COCO 2017 dataset for training FG-DMs.

1. Download the COCO 2017 dataset from the official [COCO Dataset Website](https://cocodataset.org/#download). You will need the following components:
   - Annotations: caption and instance annotations.
   - Images: train2017, val2017, and test2017.
2. Extract the files into the `/data/coco` directory (or another location of your choice):
   - Place the annotation files in the `annotations/` folder.
   - Place the image folders in the `images/` folder.
3. Verify that your directory structure matches the layout below; an optional sanity check follows it.

```
coco/
|---- annotations/
|------- captions_train2017.json
|------- captions_val2017.json
|------- instances_train2017.json
|------- instances_val2017.json
|------- train2017/
|------- val2017/
|---- images/
|------- train2017/
|------- val2017/
```
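
As a quick, optional sanity check (a sketch, not part of the repo's scripts; it assumes the dataset was extracted to `/data/coco` as above), you can confirm the expected files and folders exist before training:
```
# Verify the COCO 2017 layout expected for FG-DM training (adjust COCO_ROOT if you extracted elsewhere)
COCO_ROOT=/data/coco
for p in annotations/captions_train2017.json annotations/instances_train2017.json \
         annotations/train2017 annotations/val2017 images/train2017 images/val2017; do
  [ -e "$COCO_ROOT/$p" ] && echo "ok:      $p" || echo "missing: $p"
done
```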

## FG-DM Pretrained Weights

The segmentation FG-DM weights are available on [Google Drive](https://drive.google.com/drive/folders/1eIJxYE3eX5zReosGN1SQdnEDLatZuEp1?usp=sharing). Place them under the `models` directory.
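
One possible way to fetch the shared folder from the command line (a sketch, not part of the repo's scripts; it assumes the [gdown](https://github.com/wkentaro/gdown) package can access the public folder above):
```
pip install gdown
mkdir -p models
# Download the shared Google Drive folder into the models directory
gdown --folder "https://drive.google.com/drive/folders/1eIJxYE3eX5zReosGN1SQdnEDLatZuEp1" -O models
```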

## Inference: Text-to-Image with FG-DM

```
bash run_inference.sh
```

## Training: FG-DM Seg from scratch

- We used SD v1.4 weights to train the FG-DM condition models, but SD v1.5 is also compatible.
- The original SD weights are available via [the CompVis organization at Hugging Face](https://huggingface.co/CompVis). The license terms are identical to those of the original weights.
- `sd-v1-4.ckpt`: resumed from `sd-v1-2.ckpt`; 225k steps at resolution `512x512` on "laion-aesthetics v2 5+" with 10% dropping of the text-conditioning to improve [classifier-free guidance sampling](https://arxiv.org/abs/2207.12598).
- To train depth and normal FG-DMs, download the condition weights from [ControlNet](https://huggingface.co/lllyasviel/ControlNet/tree/main/annotator/ckpts) and place them in the `models` folder.
- Alternatively, download all of these models by running the [download_models.sh](scripts/download_models.sh) script in the `scripts` directory.

```
python main.py --base configs/stable-diffusion/nautilus_coco_adapter_semantic_map_gt_captions_distill_loss.yaml -t --gpus 0,
```
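
The `--gpus 0,` flag follows the PyTorch Lightning-style CLI inherited from the LDM codebase: it takes a comma-separated list of device indices, so the trailing comma is required even for a single GPU. A sketch of a multi-GPU run (assuming the same config file; adjust the indices to your machine):
```
python main.py --base configs/stable-diffusion/nautilus_coco_adapter_semantic_map_gt_captions_distill_loss.yaml -t --gpus 0,1,2,3,
```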

## Acknowledgements

Our codebase for the diffusion models builds heavily on the [LDM codebase](https://github.com/CompVis/latent-diffusion) and [ControlNet](https://github.com/lllyasviel/ControlNet).

Thanks for open-sourcing!

## BibTeX

```
@inproceedings{neuripssridhar24,
  author = {Sridhar, Deepak and Peri, Abhishek and Rachala, Rohit and Vasconcelos, Nuno},
  title = {Adapting Diffusion Models for Improved Prompt Compliance and Controllable Image Synthesis},
  booktitle = {Neural Information Processing Systems},
  year = {2024},
}
```