---
library_name: diffusers
license: mit
datasets:
- uoft-cs/cifar10
- nyanko7/danbooru2023
language:
- en
pipeline_tag: text-to-image
---
# DDPM Project
This repository contains the implementation of Denoising Diffusion Probabilistic Models (DDPM).
## Table of Contents
- [Introduction](#introduction)
- [Installation](#installation)
- [Usage](#usage)
- [Contributing](#contributing)
## Introduction
Denoising Diffusion Probabilistic Models (DDPM) are a class of generative models that learn to generate data by reversing a diffusion process. This repository provides a comprehensive implementation of DDPM.
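As a concrete sketch of the forward (noising) half of that process, the closed-form sample $x_t \sim q(x_t \mid x_0)$ can be written in a few lines of PyTorch. The schedule values and function names below are illustrative, not this repository's API:

```python
import torch

# Linear beta schedule as in the original DDPM paper (illustrative values).
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)  # cumulative product: alpha-bar_t

def q_sample(x0, t, noise):
    """Sample x_t ~ q(x_t | x_0) in closed form for a batch of timesteps t."""
    sqrt_ab = alpha_bars[t].sqrt().view(-1, 1, 1, 1)
    sqrt_one_minus_ab = (1.0 - alpha_bars[t]).sqrt().view(-1, 1, 1, 1)
    return sqrt_ab * x0 + sqrt_one_minus_ab * noise

x0 = torch.randn(8, 3, 32, 32)     # a batch of CIFAR10-sized images
t = torch.randint(0, T, (8,))      # one random timestep per image
noise = torch.randn_like(x0)
xt = q_sample(x0, t, noise)        # the noised images the model learns to reverse
```

Training then amounts to asking the network to predict `noise` from `xt` and `t`; the reverse process undoes this one step at a time.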
## Installation
To install the necessary dependencies, run:
```bash
pip install -r requirements.txt
```
## Usage
To train the model, use the following command:
```bash
python train.py
```
To generate samples, use:
```bash
python generate.py
```
## Game
To help users understand the model and its workings, we are building a cute little game in which the player acts as the U-Net reverser/diffusion model and is tasked with denoising images whose noise is made of grids of lines.

Use [learndiffusion.vercel.app](https://learndiffusion.vercel.app) to access the primitive version of the game. You can also contribute to the game by checking out the diffusion_game branch. A model showcase will also be added: the model's weights will be downloaded from the internet, the model files installed and loaded into a Gradio interface, and served for direct use/inference on Vercel. Feel free to make changes for this; an issue is open.
## Explanations and Mathematics
- Slides from the presentation:
- Notes/explanations: [HERE](slides/notes)
- A cute lab talk PPT:
- Plato's allegory: \<link to REPUBLIC>
## Resources
- Original paper: https://arxiv.org/pdf/2006.11239
- Improvement paper: https://arxiv.org/abs/2102.09672
- Improvement by OpenAI: https://arxiv.org/pdf/2105.05233
- Stable Diffusion paper: https://arxiv.org/abs/2112.10752
### Papers for background
- U-Net paper for biomedical segmentation
- Autoencoder
- Variational autoencoder
- Markovian hierarchical VAE
- Introductory lectures on diffusion processes
### Youtube videos and courses
#### Mathematics
- Outliers
- Omar Jahil
#### Pytorch Implementation
- [Deep Findr](https://www.youtube.com/watch?v=a4Yfz2FxXiY)
- [Notebook from Deep Findr](https://colab.research.google.com/drive/1sjy9odlSSy0RBVgMTgP7s99NXsqglsUL?usp=sharing)
## Pretrained Weights
The model's weights can be found in [pretrained_weights](https://drive.google.com/drive/folders/1NiQDI3e67I9FITVnrzNPP2Az0LABRpic?usp=sharing).

To load the pretrained weights:

```python
import torch

model2 = SimpleUnet()  # SimpleUnet is defined in this repository
model2.load_state_dict(
    torch.load(
        "/content/drive/MyDrive/Research Work/mlsa/DDPM/model_weights.pth",
        map_location="cpu",  # drop map_location if loading directly onto a GPU
    )
)
model2.eval()
```
For making inferences:

TODO: The sampling function still has errors (boolean errors, etc.). Issues will be opened so others can solve them as an exercise if needed.
```python
import torch
from torchvision.utils import save_image

num_samples = 8            # number of images to generate
image_size = (3, 32, 32)   # example for CIFAR10
noise = torch.randn(num_samples, *image_size).to("cuda")
model2.to("cuda")

# Generate images by denoising
with torch.no_grad():
    generated_images = model2.sample(noise)

# Save the generated images
save_image(generated_images, "generated_images.png", nrow=4, normalize=True)
```
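Since the repository's sampling function is still being fixed, here is a generic sketch of DDPM ancestral sampling (Algorithm 2 in the original paper). The schedule values are illustrative, and it assumes the model predicts the noise added at step `t`, which may differ from this repo's implementation:

```python
import torch

@torch.no_grad()
def ddpm_sample(model, shape, T=1000, device="cpu"):
    """Generate images by iteratively denoising pure Gaussian noise."""
    betas = torch.linspace(1e-4, 0.02, T, device=device)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(shape, device=device)  # start from x_T ~ N(0, I)
    for t in reversed(range(T)):
        t_batch = torch.full((shape[0],), t, device=device, dtype=torch.long)
        eps = model(x, t_batch)  # model's prediction of the added noise
        # Posterior mean of x_{t-1} given x_t and the predicted noise
        coef = betas[t] / (1.0 - alpha_bars[t]).sqrt()
        mean = (x - coef * eps) / alphas[t].sqrt()
        if t > 0:
            x = mean + betas[t].sqrt() * torch.randn_like(x)  # add sampling noise
        else:
            x = mean  # final step: no noise added
    return x
```

With a trained noise-prediction network in place of `model`, calling `ddpm_sample(model2, (8, 3, 32, 32), device="cuda")` would produce a batch of CIFAR10-sized samples.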
## Contributing
Contributions are welcome! Please open an issue or submit a pull request.
## Future Ideas
- Make the model ONNX-compatible for training and inference on Intel GPUs
- Build a Stable Diffusion-style text-to-image model using a CLIP implementation
- Train the current model on a much larger dataset to capture more generalization and nuance