Introduction to Generative Adversarial Networks using Pytorch: BollyGAN
“Generative Adversarial Networks is the most interesting idea in the last 10 years in Machine Learning.” — Yann LeCun, Director of AI Research at Facebook
Ever heard of “Fake it till you make it!!!”?
Generative Deep Learning models do exactly that, but with the help of mathematics, statistics and data. They work by learning the latent variables, the hidden variables responsible for generating the input data, and then using this latent space to generate synthetic data.
Put simply, generative modeling is an unsupervised learning task in deep learning: given some input data, the model learns the probability distribution of the variables that make up that data and then generates something new, either resembling the input or an entirely synthetic output.
Generative Deep Learning models have been employed in a wide range of applications, from language synthesis with LSTMs and Google’s DeepDream algorithm to neural style transfer, DeepFake generation and the curation of new datasets.
Generative adversarial networks (GANs)
GANs are a subset of Generative Deep Learning models introduced by Goodfellow et al. in 2014 as an alternative to VAEs (Variational Autoencoders) for learning latent spaces of images. They make it possible to construct fairly realistic synthetic images by forcing the generated images to be statistically almost indistinguishable from real ones.
A Generative Adversarial Network takes the following approach:
A GAN is made up of two neural networks, which I like to call a Faker and an Expert. The Faker tries to fool the Expert, while the Expert works to call the bluff.
- Generator (Faker): Creates “fake” or synthetic samples from the random vectors/matrices given to it as input.
- Discriminator (Expert): Strives to determine whether a given sample is “genuine” (drawn from training data) or “fake” (generated by the generator).
Training takes place in tandem, alternating between the two networks: we train the discriminator for a few epochs, then the generator for a few epochs, and so on. As a result, both the generator and the discriminator improve their performance.
However, GANs are infamously difficult to train and highly sensitive to hyperparameters, activation functions, and regularisation.
We’ll train a GAN to produce images of Bollywood celebrities’ faces in this tutorial.
We will be using DCGAN, which is a direct extension of the GAN described above, except that it explicitly uses convolutional and convolutional-transpose layers in the discriminator and generator, respectively. Radford et al. first described it in their paper “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks”.
We’ll be using the Bollywood Celeb faces Dataset, which consists of over 8664 cropped Bollywood celeb faces. Note that generative modeling is an unsupervised learning task, so the images need not have any labels.
I fetched the data from Kaggle into Google Colab [Link]
To load this dataset into PyTorch I used the ImageFolder class from torchvision. I also resized and cropped the images to 64×64 px, and normalized the pixel values with a mean and standard deviation of 0.5 for each channel. This ensures that pixel values are in the range (-1, 1), which is more convenient for training the discriminator. We will also create a data loader to load the data in batches.
Discriminator Network (Spot Fakes)
The discriminator is a convolutional neural network (CNN) that outputs a single number for every image. I used a stride of 2 to progressively reduce the size of the output feature map.
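One possible layout for such a discriminator is sketched below. The channel widths, kernel sizes, batch normalization and LeakyReLU slope follow common DCGAN practice and are assumptions, not necessarily the exact layers used in the notebook; conv_block is a hypothetical helper.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    """Stride-2 convolution that halves the height and width of the feature map."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.2, inplace=True),
    )

discriminator = nn.Sequential(
    conv_block(3, 64),       # 3 x 64 x 64 -> 64 x 32 x 32
    conv_block(64, 128),     # -> 128 x 16 x 16
    conv_block(128, 256),    # -> 256 x 8 x 8
    conv_block(256, 512),    # -> 512 x 4 x 4
    nn.Conv2d(512, 1, kernel_size=4, stride=1, padding=0, bias=False),  # -> 1 x 1 x 1
    nn.Flatten(),            # one number per image
    nn.Sigmoid(),            # squash to a probability of being real
)

# A random batch stands in for real images here.
scores = discriminator(torch.randn(8, 3, 64, 64))
print(scores.shape)  # torch.Size([8, 1])
```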
Generator Network (Create Fakes)
The input to the generator is typically a vector or a matrix of random numbers (referred to as a latent tensor), which is used as a seed for generating an image. The generator will convert a latent tensor of shape (64, 1, 1) into an image tensor of shape 3 x 64 x 64, matching the 64×64 px training images. To achieve this, I used the ConvTranspose2d layer from PyTorch, which performs a transposed convolution (also loosely referred to as a deconvolution).
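A generator along these lines might look like the sketch below. It maps a (64, 1, 1) latent tensor to a 3-channel 64×64 image so that its output matches the size of the training images. The channel widths, batch normalization and activations follow common DCGAN practice and are assumptions; up_block is a hypothetical helper.

```python
import torch
import torch.nn as nn

LATENT_SIZE = 64  # matches the (64, 1, 1) latent shape described in the text

def up_block(in_ch, out_ch, stride=2, padding=1):
    """Transposed convolution followed by batch norm and ReLU."""
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=stride,
                           padding=padding, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

generator = nn.Sequential(
    up_block(LATENT_SIZE, 512, stride=1, padding=0),  # 64 x 1 x 1 -> 512 x 4 x 4
    up_block(512, 256),      # -> 256 x 8 x 8
    up_block(256, 128),      # -> 128 x 16 x 16
    up_block(128, 64),       # -> 64 x 32 x 32
    nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1, bias=False),  # -> 3 x 64 x 64
    nn.Tanh(),               # outputs in [-1, 1], like the normalized training data
)

latent = torch.randn(8, LATENT_SIZE, 1, 1)  # a batch of 8 random seeds
fake_images = generator(latent)
print(fake_images.shape)  # torch.Size([8, 3, 64, 64])
```

The Tanh at the end is a design choice that keeps generated pixels in the same range as training images normalized with mean and standard deviation 0.5.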
Since the discriminator is a binary classification model, we can use the binary cross-entropy loss function to quantify how well it is able to differentiate between real and generated images.
The steps involved in training the discriminator are:
- We expect that the discriminator will return 1 if the image was selected from the real dataset and 0 if it was generated using the generator network.
- We first pass a batch of real images, and compute the loss, setting the target labels to 1.
- Then we pass a batch of fake images (produced by the generator) into the discriminator and compute the loss, with the target labels set to 0.
- Finally, we add the two losses and use the combined loss to perform a gradient descent step on the discriminator’s weights.
It is important to note that the weights of the generator are not changed while training the discriminator (opt_d only updates discriminator.parameters()).
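The discriminator steps above can be sketched as a single function. To keep the example runnable on its own, it uses tiny fully connected stand-in networks rather than the convolutional models; the stand-ins, the learning rate and the beta values are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT_SIZE = 64

# Tiny stand-in networks (hypothetical) so the sketch runs without the DCGAN layers.
generator = nn.Sequential(nn.Flatten(), nn.Linear(LATENT_SIZE, 3 * 64 * 64), nn.Tanh(),
                          nn.Unflatten(1, (3, 64, 64)))
discriminator = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 1), nn.Sigmoid())

# The optimizer is given only the discriminator's parameters.
opt_d = torch.optim.Adam(discriminator.parameters(), lr=0.0002, betas=(0.5, 0.999))

def train_discriminator(real_images, opt_d):
    opt_d.zero_grad()

    # Real batch: target label 1.
    real_preds = discriminator(real_images)
    real_loss = F.binary_cross_entropy(real_preds, torch.ones_like(real_preds))

    # Fake batch: target label 0. detach() keeps gradients out of the generator.
    latent = torch.randn(real_images.size(0), LATENT_SIZE, 1, 1)
    fake_images = generator(latent).detach()
    fake_preds = discriminator(fake_images)
    fake_loss = F.binary_cross_entropy(fake_preds, torch.zeros_like(fake_preds))

    # Add both losses and take one gradient step on the discriminator.
    loss = real_loss + fake_loss
    loss.backward()
    opt_d.step()
    return loss.item(), real_preds.mean().item(), fake_preds.mean().item()
```

The two returned means are the average scores the discriminator gave to real and fake images, which are handy to log during training.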
Since the generator’s outputs are images rather than predictions we can compare against labels, it is not obvious how to train it. This is where we apply a very clever trick: we use the discriminator as part of the loss function. This is how it works:
- We use the generator to generate a batch of images, which we then feed into the discriminator.
- We compute the loss by setting the target labels to 1, indicating that they are real. We do this because the generator’s goal is to “deceive” the discriminator.
- We use the loss to execute gradient descent, which involves changing the weights of the generator to make it better at generating realistic images in order to “trick” the discriminator.
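The generator steps above can be sketched in the same way. Again, the tiny stand-in networks, the learning rate and the beta values are assumptions made so that the example runs on its own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT_SIZE = 64

# Tiny stand-in networks (hypothetical) so the sketch runs without the DCGAN layers.
generator = nn.Sequential(nn.Flatten(), nn.Linear(LATENT_SIZE, 3 * 64 * 64), nn.Tanh(),
                          nn.Unflatten(1, (3, 64, 64)))
discriminator = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 1), nn.Sigmoid())

# This optimizer is given only the generator's parameters.
opt_g = torch.optim.Adam(generator.parameters(), lr=0.0002, betas=(0.5, 0.999))

def train_generator(batch_size, opt_g):
    opt_g.zero_grad()

    # Generate a batch of fake images and score them with the discriminator.
    latent = torch.randn(batch_size, LATENT_SIZE, 1, 1)
    fake_images = generator(latent)
    preds = discriminator(fake_images)

    # Target label 1: the generator is rewarded when fakes are judged real.
    targets = torch.ones_like(preds)
    loss = F.binary_cross_entropy(preds, targets)

    # Gradients flow back through the discriminator into the generator,
    # but only the generator's weights are updated by opt_g.
    loss.backward()
    opt_g.step()
    return loss.item()
```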
Next, we define a fit function that trains the discriminator and generator in tandem on each batch of training data. We’ll use the Adam optimizer along with some additional parameters (betas) that have been shown to work well with GANs. We will also save some sample generated images at regular intervals for inspection.
Sometimes, despite getting good scores, it is difficult to obtain clean images from the noisy input.
Some steps that can help reduce this are:
- Using a dataset with a large number of good quality images.
- Experimenting with hyperparameters such as the number of epochs, learning rate, or batch size.
In conclusion, I would like to say that GANs have huge potential in the machine/deep learning sector, and I have thoroughly enjoyed the process of learning about them and building BollyGAN.
Thank you to Jovian for the platform.
- Complete Notebook: https://jovian.ai/rupshakr/bollywoodcelebfacegenrator-dcgan
- Dataset: https://www.kaggle.com/sushilyadav1998/bollywood-celeb-localized-face-dataset
- DCGAN PyTorch tutorial: https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html
- Deep Learning with PyTorch: Zero to GANs by Jovian.ai: https://jovian.ai/learn/deep-learning-with-pytorch-zero-to-gans