---
datasets:
- https://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/
---

| Generated | Real (for comparison) |
|  ----- | --------- |
|    ![image/png](https://cdn-uploads.huggingface.co/production/uploads/649f9483d76ca0fe679011c2/VXw25fJbHok5eZTQcn3Kd.png)    |      ![image/png](https://cdn-uploads.huggingface.co/production/uploads/649f9483d76ca0fe679011c2/Kj0lbfg5P5fTuG6eawdE8.png)   |

This GAN model is trained on the [FGVC Aircraft](https://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/) dataset. The model uses [Progressive Growing](https://arxiv.org/pdf/1710.10196.pdf) with [Spectral Normalization](https://arxiv.org/pdf/1802.05957.pdf).
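As a rough illustration of what Spectral Normalization does, the sketch below rescales a weight tensor by a power-iteration estimate of its largest singular value. It is a minimal JAX sketch, not code taken from the notebook: the function name `spectral_normalize`, the kernel shape, and the number of iterations are all assumptions made for the example.

```python
import jax
import jax.numpy as jnp

def spectral_normalize(w, u, num_iters=1, eps=1e-12):
    """Divide w by a power-iteration estimate of its largest singular value.

    w: weight tensor whose last axis is the output dimension.
    u: running estimate of the top right-singular vector, shape (out,).
    (Hypothetical helper for illustration only.)
    """
    w_mat = w.reshape(-1, w.shape[-1])  # flatten to (fan_in, out)
    for _ in range(num_iters):
        v = w_mat @ u
        v = v / (jnp.linalg.norm(v) + eps)
        u = w_mat.T @ v
        u = u / (jnp.linalg.norm(u) + eps)
    sigma = v @ w_mat @ u  # estimated largest singular value
    return w / sigma, u

key_w, key_u = jax.random.split(jax.random.PRNGKey(0))
w = jax.random.normal(key_w, (3, 3, 4, 16))  # assumed conv kernel shape (HWIO)
u0 = jax.random.normal(key_u, (16,))
w_sn, u1 = spectral_normalize(w, u0, num_iters=100)

# After normalization the largest singular value should be close to 1.
top_sv = jnp.linalg.svd(w_sn.reshape(-1, 16), compute_uv=False)[0]
```

In training, a single iteration per step is typically enough because `u` is carried over between steps; the 100 iterations here only make the standalone example converge.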

This work builds on https://huggingface.co/PrakhAI/AIPlane and https://huggingface.co/PrakhAI/AIPlane2.

This model was trained to generate 256x256 images of aircraft. The implementation in JAX on Colab can be found [here](https://colab.research.google.com/github/prakharbanga/AIPlane3/blob/main/AIPlane3_ProGAN_%2B_Spectral_Norm_(256x256).ipynb).

# Convolutional Architecture

A significant improvement over https://huggingface.co/PrakhAI/AIPlane2 is the elimination of "checkerboard" artifacts. The Generator now upsamples with an image resize followed by a convolution layer, instead of a transposed convolution whose kernel size is not divisible by its stride.

| Transposed Convolution (kernel size not divisible by stride) | Resize followed by convolution |
| - | - |
| ![image/png](https://cdn-uploads.huggingface.co/production/uploads/649f9483d76ca0fe679011c2/Vs1Dks67tteJGA2EaVMjW.png) | ![image/png](https://cdn-uploads.huggingface.co/production/uploads/649f9483d76ca0fe679011c2/fz_Gv0UIYh_Z1GZ2TrCW1.png) |
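
The resize-then-convolve upsampling described above can be sketched in JAX roughly as follows. This is a minimal sketch under stated assumptions rather than the notebook's actual layer: the function name `upsample_resize_conv`, the nearest-neighbor resize method, the 3x3 kernel, and the channel counts are all chosen for illustration.

```python
import jax
import jax.numpy as jnp

def upsample_resize_conv(x, kernel):
    """2x nearest-neighbor resize followed by a stride-1 convolution.

    x: activations of shape (N, H, W, C_in).
    kernel: convolution weights of shape (kh, kw, C_in, C_out).
    (Hypothetical helper for illustration only.)
    """
    n, h, w, c = x.shape
    # Resize first, so every output pixel is covered by the kernel equally.
    x = jax.image.resize(x, (n, 2 * h, 2 * w, c), method="nearest")
    return jax.lax.conv_general_dilated(
        x,
        kernel,
        window_strides=(1, 1),
        padding="SAME",
        dimension_numbers=("NHWC", "HWIO", "NHWC"),
    )

key = jax.random.PRNGKey(0)
x = jnp.ones((1, 8, 8, 4))                          # constant input feature map
kernel = jax.random.normal(key, (3, 3, 4, 16)) * 0.1
y = upsample_resize_conv(x, kernel)
print(y.shape)  # (1, 16, 16, 16)
```

With a constant input, every interior output pixel of a given channel has the same value, because each one sees the same number of kernel taps. A strided transposed convolution whose kernel size is not a multiple of the stride overlaps unevenly, which is what produces the checkerboard pattern.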

# Image Quality

The model generates some high-quality images of airplanes, but it also generates many poor-quality ones.

Of 400 generated images labeled by hand, 151 were judged desirable and 249 undesirable.

| Sample desirable outputs | Sample undesirable outputs |
|    --------- | ------------ |
| ![image/png](https://cdn-uploads.huggingface.co/production/uploads/649f9483d76ca0fe679011c2/YkIba5DXFIGwVX0fs1Han.png) | ![image/png](https://cdn-uploads.huggingface.co/production/uploads/649f9483d76ca0fe679011c2/p4cU-1LfNbmdePOUk-CF5.png) |

# Latent Space Interpolation

Latent space interpolation can be an educational exercise that gives deeper insight into the model.

As can be observed below, several aspects of the generated image, such as the color of the sky, whether the plane is on the ground, and the plane's shape and color, frequently vary continuously through the latent space.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/649f9483d76ca0fe679011c2/Hx_a5OzCwdWBIvH-7hvR3.png)
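
A grid like the one above can be produced by feeding the Generator a sequence of latent vectors that move from one sample to another. The sketch below uses plain linear interpolation in JAX; the interpolation scheme, the latent dimension of 128, and the `generator.apply` call in the final comment are assumptions for illustration and may differ from the notebook.

```python
import jax
import jax.numpy as jnp

def interpolate_latents(z_start, z_end, num_steps):
    """Linearly interpolate between two latent vectors.

    Returns an array of shape (num_steps, latent_dim) whose first row is
    z_start and whose last row is z_end.
    (Hypothetical helper for illustration only.)
    """
    t = jnp.linspace(0.0, 1.0, num_steps)[:, None]
    return (1.0 - t) * z_start[None, :] + t * z_end[None, :]

key_a, key_b = jax.random.split(jax.random.PRNGKey(0))
latent_dim = 128  # assumed; use the Generator's actual latent size
z_a = jax.random.normal(key_a, (latent_dim,))
z_b = jax.random.normal(key_b, (latent_dim,))

zs = interpolate_latents(z_a, z_b, num_steps=8)
print(zs.shape)  # (8, 128)

# Each row would then be passed through the trained Generator, e.g.
# images = generator.apply(params, zs)  # hypothetical call, not run here
```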

# Training Progression

<video controls width="50%" src="https://cdn-uploads.huggingface.co/production/uploads/649f9483d76ca0fe679011c2/o2NDDMQPhdEY5Vc96b31G.mp4"></video>