# Text-Guided-Image-Colorization
This project uses **Stable Diffusion (SDXL/SDXL-Lightning)** and the **BLIP (Bootstrapping Language-Image Pre-training)** captioning model for interactive image colorization. Users can steer the colors of specific objects in an image with text prompts, which makes the colorization more personal and creative.
## Table of Contents
- [Features](#features)
- [Installation](#installation)
- [Quick Start](#quick-start)
- [Dataset Usage](#dataset-usage)
- [Training](#training)
- [Evaluation](#evaluation)
- [Results](#results)
- [License](#license)
## Features
- **Interactive Colorization**: Users can specify desired colors for different objects in the image.
- **ControlNet Approach**: A ControlNet is trained on top of SDXL so that the model adapts better to the image colorization task.
- **High-Quality Outputs**: Leverage the latest advancements in diffusion models to generate vibrant and realistic colorizations.
- **User-Friendly Interface**: Easy-to-use interface for seamless interaction with the model.
## Installation
To set up the project locally, follow these steps:
1. **Clone the Repository**:
```bash
git clone https://github.com/nick8592/text-guided-image-colorization.git
cd text-guided-image-colorization
```
2. **Install Dependencies**:
Make sure you have Python 3.7 or higher installed. Then, install the required packages:
```bash
pip install -r requirements.txt
```
Install `torch` and `torchvision` matching your CUDA version:
```bash
pip install torch torchvision --index-url https://download.pytorch.org/whl/cuXXX
```
Replace `XXX` with your CUDA version (e.g., `118` for CUDA 11.8). For more info, see [PyTorch Get Started](https://pytorch.org/get-started/locally/).
3. **Download Pre-trained Models**:
| Models | Hugging Face (Recommended) | Other |
|:---:|:---:|:---:|
|SDXL-Lightning Caption|[link](https://huggingface.co/nickpai/sdxl_light_caption_output)|[link](https://gofile.me/7uE8s/FlEhfpWPw) (2kNJfV)|
|SDXL-Lightning Custom Caption (Recommended)|[link](https://huggingface.co/nickpai/sdxl_light_custom_caption_output)|[link](https://gofile.me/7uE8s/AKmRq5sLR) (KW7Fpi)|
```bash
text-guided-image-colorization/sdxl_light_caption_output
└── checkpoint-30000
    ├── controlnet
    │   ├── diffusion_pytorch_model.safetensors
    │   └── config.json
    ├── optimizer.bin
    ├── random_states_0.pkl
    ├── scaler.pt
    └── scheduler.bin
```
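The installation steps above can be sanity-checked with a short script. The following is a minimal sketch, not part of the repository: the helper names `cuda_index_url` and `missing_checkpoint_files` are made up for illustration, and the expected file list is copied from the directory tree shown above.

```python
from pathlib import Path
from typing import List

# Files listed in the directory tree above, relative to the model output folder.
EXPECTED_FILES = [
    "checkpoint-30000/controlnet/diffusion_pytorch_model.safetensors",
    "checkpoint-30000/controlnet/config.json",
    "checkpoint-30000/optimizer.bin",
    "checkpoint-30000/random_states_0.pkl",
    "checkpoint-30000/scaler.pt",
    "checkpoint-30000/scheduler.bin",
]


def cuda_index_url(cuda_version: str) -> str:
    """Build the PyTorch wheel index URL for a 'major.minor' CUDA version."""
    # Hypothetical helper: "11.8" becomes the "cu118" suffix used in step 2.
    major, minor = cuda_version.split(".")[:2]
    return f"https://download.pytorch.org/whl/cu{major}{minor}"


def missing_checkpoint_files(model_dir: str) -> List[str]:
    """Return the expected files that are not present under model_dir."""
    root = Path(model_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Calling `missing_checkpoint_files("sdxl_light_caption_output")` from the repository root should return an empty list once the download is complete; any paths it returns are files that still need to be placed.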
## Quick Start
1. Run the `gradio_ui.py` script:
```bash
python gradio_ui.py
```
2. Open the provided URL in your web browser to access the Gradio-based user interface.
3. Upload an image and use the interface to control the colors of specific objects in the image. A prompt is optional: the model can also colorize the image without one.
4. The model generates a colorized version of the image based on your prompt, or automatically if no prompt is given. See the [demo video](https://x.com/weichenpai/status/1829513077588631987).
[Image: Gradio UI]
## Dataset Usage
You can find more details about dataset usage in the [Dataset-for-Image-Colorization](https://github.com/nick8592/Dataset-for-Image-Colorization) repository.
## Training
For training, you can use one of the following scripts:
- `train_controlnet.sh`: Trains a model using [Stable Diffusion v2](https://huggingface.co/stabilityai/stable-diffusion-2-1)
- `train_controlnet_sdxl.sh`: Trains a model using [SDXL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)
- `train_controlnet_sdxl_light.sh`: Trains a model using [SDXL-Lightning](https://huggingface.co/ByteDance/SDXL-Lightning)
Training code for SDXL is provided, but I could not train that model myself because of limited GPU resources, so you may run into errors when you try to train it.
## Evaluation
For evaluation, you can use one of the following scripts:
- `eval_controlnet.sh`: Evaluates the model using [Stable Diffusion v2](https://huggingface.co/stabilityai/stable-diffusion-2-1) for a folder of images.
- `eval_controlnet_sdxl_light.sh`: Evaluates the model using [SDXL-Lightning](https://huggingface.co/ByteDance/SDXL-Lightning) for a folder of images.
- `eval_controlnet_sdxl_light_single.sh`: Evaluates the model using [SDXL-Lightning](https://huggingface.co/ByteDance/SDXL-Lightning) for a single image.
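
The choice between these scripts depends on the base model and on whether the input is a folder or a single image. As a rough sketch, the lookup could be written as below. The script names come from the list above; the labels `"sd2"`, `"sdxl_light"`, `"folder"`, and `"single"` and the function `pick_eval_script` are assumptions made for this example and do not exist in the repository.

```python
# Script names are taken from the list above; the dictionary keys are
# labels invented for this sketch.
EVAL_SCRIPTS = {
    ("sd2", "folder"): "eval_controlnet.sh",
    ("sdxl_light", "folder"): "eval_controlnet_sdxl_light.sh",
    ("sdxl_light", "single"): "eval_controlnet_sdxl_light_single.sh",
}


def pick_eval_script(model: str, mode: str) -> str:
    """Return the evaluation script for a model label and an input mode."""
    try:
        return EVAL_SCRIPTS[(model, mode)]
    except KeyError:
        # No script is listed for this combination (for example sd2 + single).
        raise ValueError(
            f"no evaluation script for model={model!r}, mode={mode!r}"
        ) from None
```

The returned name can then be run from the repository root, for example with `bash eval_controlnet_sdxl_light_single.sh`.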
## Results
### Prompt-Guided
| Caption | Condition 1 | Condition 2 | Condition 3 |
|:---:|:---:|:---:|:---:|
| [Image: 000000022935_gray.jpg] | [Image: 000000022935_green_shirt_on_right_girl.jpeg] | [Image: 000000022935_purple_shirt_on_right_girl.jpeg] | [Image: 000000022935_red_shirt_on_right_girl.jpeg] |
| a photography of a woman in a soccer uniform kicking a soccer ball | + "green shirt"| + "purple shirt" | + "red shirt" |
| [Image: 000000041633_gray.jpg] | [Image: 000000041633_bright_red_car.jpeg] | [Image: 000000041633_dark_blue_car.jpeg] | [Image: 000000041633_black_car.jpeg] |
| a photography of a photo of a truck | + "bright red car"| + "dark blue car" | + "black car" |
| [Image: 000000286708_gray.jpg] | [Image: 000000286708_orange_hat.jpeg] | [Image: 000000286708_pink_hat.jpeg] | [Image: 000000286708_yellow_hat.jpeg] |
| a photography of a cat wearing a hat on his head | + "orange hat"| + "pink hat" | + "yellow hat" |
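
In the table above, each condition is the generated caption plus a short color hint. The sketch below shows one way such a prompt could be assembled. It is an assumption, not the repository's code: the function name `build_prompt` and the comma separator are chosen for illustration, and the actual scripts may join the two parts differently.

```python
def build_prompt(caption: str, color_hint: str = "") -> str:
    """Combine a generated caption with an optional user color hint."""
    caption = caption.strip()
    color_hint = color_hint.strip()
    if not color_hint:
        # Prompt-free case: the caption alone drives the colorization.
        return caption
    # Assumed separator; the repository may concatenate differently.
    return f"{caption}, {color_hint}"


print(build_prompt("a photography of a cat wearing a hat on his head", "orange hat"))
```

With an empty hint the function returns the caption unchanged, which mirrors the prompt-free mode described in the Quick Start.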
### Prompt-Free
Ground truth images are provided for reference only.
| Grayscale Image | Colorized Result | Ground Truth |
|:---:|:---:|:---:|
| [Image: 000000025560_gray.jpg] | [Image: 000000025560_color.jpg] | [Image: 000000025560_gt.jpg] |
| [Image: 000000065736_gray.jpg] | [Image: 000000065736_color.jpg] | [Image: 000000065736_gt.jpg] |
| [Image: 000000091779_gray.jpg] | [Image: 000000091779_color.jpg] | [Image: 000000091779_gt.jpg] |
| [Image: 000000092177_gray.jpg] | [Image: 000000092177_color.jpg] | [Image: 000000092177_gt.jpg] |
| [Image: 000000166426_gray.jpg] | [Image: 000000166426_color.jpg] | [Image: 000000166426_gt.jpg] |
## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for more details.