---
datasets:
- cifar10
- https://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/
---

GAN model trained on [CIFAR10 (Airplane)](https://www.tensorflow.org/datasets/catalog/cifar10) and [FGVC Aircraft](https://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/) images. The model leverages [Progressive Growing](https://arxiv.org/pdf/1710.10196.pdf) with [Spectral Normalization](https://arxiv.org/pdf/1802.05957.pdf).

Try out this model [here](https://huggingface.co/spaces/PrakhAI/AIPlane).

| Generated Images | Real Images (for comparison) |
| -------- | --------- |
| ![generated_1691259071.png](https://cdn-uploads.huggingface.co/production/uploads/649f9483d76ca0fe679011c2/DNio2mes1414p6cgm7K62.png) | ![image.png](https://cdn-uploads.huggingface.co/production/uploads/649f9483d76ca0fe679011c2/4Sp33Hl9JK2cfHzBXHXfh.png) |

# Training Progression
<video width="50%" controls>
  <source src="https://cdn-uploads.huggingface.co/production/uploads/649f9483d76ca0fe679011c2/qFlnTITZwS3DSTxLp0Oa8.mp4" type="video/mp4">
</video>

# Details
[Colab Notebook](https://colab.research.google.com/drive/1b4KFZOnLERwQW_3jQ8FMABepKEAcDIK7?usp=sharing)

The model generates 32x32 images of airplanes. It was trained on an NVIDIA T4 GPU in a Colab runtime.

The Critic consists of convolutional layers (3x3 kernels) that use strides for downsampling, with Leaky ReLU activations. The Critic uses [Spectral Normalization](https://arxiv.org/pdf/1802.05957.pdf), described in more detail [here](#spectral-normalization).
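The spatial size after each strided convolution follows the standard formula `(size + 2*padding - kernel) // stride + 1`. The sketch below traces how a 32x32 input might shrink through the Critic, assuming (hypothetically) stride 2 and padding 1 for every 3x3 layer, and stopping at 4x4. The actual strides and padding are in the Colab notebook and may differ.

```python
def conv_output_size(size: int, kernel: int = 3, stride: int = 2, padding: int = 1) -> int:
    """Spatial output size of a 2D convolution (standard formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def critic_resolutions(input_size: int = 32, min_size: int = 4) -> list[int]:
    """Hypothetical helper: sizes seen by successive stride-2, 3x3 conv layers."""
    sizes = [input_size]
    while sizes[-1] > min_size:
        sizes.append(conv_output_size(sizes[-1]))
    return sizes


print(critic_resolutions())  # [32, 16, 8, 4]
```

Under these assumptions, each layer halves the resolution, so three strided layers take a 32x32 image down to 4x4.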

The Generator uses transposed convolutions (2x2 kernels) with strides for upsampling, and ReLU activations. It uses the pixel-level variant of Local Response Normalization (pixelwise feature vector normalization) proposed in the [Progressive Growing](https://arxiv.org/pdf/1710.10196.pdf) paper.
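The Progressive Growing paper normalizes the feature vector at each pixel to unit average magnitude: `b = a / sqrt(mean(a_j^2) + eps)` over the channels `j`, with `eps = 1e-8`. The minimal sketch below implements that formula on a plain `H x W x C` nested list, along with the transposed-convolution size formula, assuming (hypothetically) stride 2 and no padding for the 2x2 kernels. It is an illustration, not the notebook's code.

```python
import math


def pixel_norm(features, eps=1e-8):
    """Pixelwise feature vector normalization on an H x W x C nested list.

    Each pixel's C-dimensional feature vector is divided by the root mean
    square of its entries, so its mean squared value becomes ~1.
    """
    out = []
    for row in features:
        new_row = []
        for vec in row:
            scale = 1.0 / math.sqrt(sum(v * v for v in vec) / len(vec) + eps)
            new_row.append([v * scale for v in vec])
        out.append(new_row)
    return out


def transposed_conv_output_size(size: int, kernel: int = 2, stride: int = 2, padding: int = 0) -> int:
    """Spatial output size of a 2D transposed convolution."""
    return (size - 1) * stride - 2 * padding + kernel


normed = pixel_norm([[[3.0, 4.0]]])  # a 1x1 feature map with 2 channels
print([round(v, 4) for v in normed[0][0]])  # [0.8485, 1.1314]

sizes = [4]
while sizes[-1] < 32:
    sizes.append(transposed_conv_output_size(sizes[-1]))
print(sizes)  # [4, 8, 16, 32]
```

With a 2x2 kernel and stride 2, each transposed convolution exactly doubles the resolution, which matches the 4x4, 8x8, 16x16 and 32x32 outputs the Generator produces.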

# Spectral Normalization

Spectral Normalization is a technique suggested for training GANs in [this paper](https://arxiv.org/pdf/1802.05957.pdf).

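It constrains the Critic (Discriminator) to be Lipschitz continuous with respect to its input images by dividing each layer's weight matrix by its largest singular value (its spectral norm). This limits how sharply the Critic's output can change between nearby images, which helps avoid exploding gradients.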
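The largest singular value is typically estimated with power iteration, as in the Spectral Normalization paper. The dependency-free sketch below runs many iterations on a small 2D matrix for illustration; the paper instead keeps a persistent vector `u` and runs a single iteration per training step, and convolution kernels are first reshaped into 2D matrices. Helper names such as `spectrally_normalize` are hypothetical.

```python
import math
import random


def matvec(W, v):
    """Compute W @ v for a matrix stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def matTvec(W, u):
    """Compute W^T @ u."""
    return [sum(W[i][j] * u[i] for i in range(len(W))) for j in range(len(W[0]))]


def normalize(v, eps=1e-12):
    n = math.sqrt(sum(x * x for x in v)) + eps
    return [x / n for x in v]


def spectral_norm(W, n_iters=50, seed=0):
    """Estimate the largest singular value of W with power iteration."""
    rng = random.Random(seed)
    u = normalize([rng.gauss(0.0, 1.0) for _ in range(len(W))])
    for _ in range(n_iters):
        v = normalize(matTvec(W, u))  # right singular vector estimate
        u = normalize(matvec(W, v))   # left singular vector estimate
    return sum(ui * wi for ui, wi in zip(u, matvec(W, v)))  # sigma = u^T W v


def spectrally_normalize(W, n_iters=50):
    """Divide W by its estimated spectral norm, so its largest singular value is ~1."""
    sigma = spectral_norm(W, n_iters)
    return [[w / sigma for w in row] for row in W]


print(round(spectral_norm([[3.0, 0.0], [0.0, 1.0]]), 6))  # 3.0
```

Because every layer is scaled to a spectral norm of about 1 and Leaky ReLU is 1-Lipschitz, the whole Critic's Lipschitz constant stays bounded.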

In practice, Spectral Normalization noticeably stabilized training for this model, as the comparison below shows (samples taken at equivalent points during training):

| Batch Normalization | Spectral Normalization |
| ----------- | ------------ |
| ![image.png](https://cdn-uploads.huggingface.co/production/uploads/649f9483d76ca0fe679011c2/PNbqYRjw24OhMManXaMS9.png) | ![image.png](https://cdn-uploads.huggingface.co/production/uploads/649f9483d76ca0fe679011c2/F8q4y2vshssfdc70jH_X2.png) |

# Progressive Growing

Progressively growing the GAN's resolution during training was proposed to improve the quality and stability of GAN training, especially for high-resolution models (up to 1024x1024).

For 32x32 images of airplanes, even a short initial round of Progressive Growing provides a significant improvement (comparison at equivalent points during training):

| Without Progressive Growing | Progressive Growing |
| ----------- | ------------ |
| ![image.png](https://cdn-uploads.huggingface.co/production/uploads/649f9483d76ca0fe679011c2/QnTET-5ae_0x11CcXeWgR.png) | ![image.png](https://cdn-uploads.huggingface.co/production/uploads/649f9483d76ca0fe679011c2/F8q4y2vshssfdc70jH_X2.png) |

The generator for this model produces 4x4, 8x8, 16x16 and 32x32 images, which form the inputs for the critic. Each resolution has a weight (α<sub>4</sub>, α<sub>8</sub>, α<sub>16</sub>, α<sub>32</sub>) that indicates how much training focuses on that resolution at any given point.

At the beginning of training, α<sub>4</sub>=1 and α<sub>8</sub>=α<sub>16</sub>=α<sub>32</sub>=0. Towards the end, α<sub>32</sub>=1 and α<sub>4</sub>=α<sub>8</sub>=α<sub>16</sub>=0.
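The card states only the start and end values of the weights, not how they move in between or how the critic combines the weighted inputs. The sketch below shows one hypothetical schedule: the weight shifts linearly from each resolution to the next, and a hypothetical `grow_fraction` parameter lets growing finish early in training (echoing the "short initial round" above), after which α<sub>32</sub> stays at 1.

```python
RESOLUTIONS = (4, 8, 16, 32)


def resolution_weights(progress: float, grow_fraction: float = 1.0) -> dict[int, float]:
    """Hypothetical piecewise-linear schedule for the per-resolution weights.

    progress: fraction of training completed, in [0, 1].
    grow_fraction: fraction of training spent growing; after it, alpha_32 = 1.
    At most two adjacent resolutions are active, and the weights sum to 1.
    """
    progress = min(max(progress, 0.0), 1.0)
    n_transitions = len(RESOLUTIONS) - 1
    pos = min(progress / grow_fraction, 1.0) * n_transitions  # in [0, 3]
    idx = min(int(pos), n_transitions - 1)  # current transition: 4->8, 8->16 or 16->32
    t = pos - idx  # how far the current transition has progressed
    weights = {r: 0.0 for r in RESOLUTIONS}
    weights[RESOLUTIONS[idx]] = 1.0 - t
    weights[RESOLUTIONS[idx + 1]] = t
    return weights


print(resolution_weights(0.0))  # {4: 1.0, 8: 0.0, 16: 0.0, 32: 0.0}
print(resolution_weights(0.5))  # {4: 0.0, 8: 0.5, 16: 0.5, 32: 0.0}
```

Keeping the weights summing to 1 means the total emphasis stays constant while it slides from coarse to fine resolutions, so no single step abruptly introduces a new resolution.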