“ARTGAN”: A Simple Generative Adversarial Network Based on Art Images Using Deep Learning & PyTorch


This project, “ARTGAN”, is a simple generative adversarial network based on art images, built with deep learning and PyTorch. We use matplotlib and PyTorch to implement it.

Generative Adversarial Networks, or GANs for short, are an approach to generative modeling using deep learning methods, such as convolutional neural networks. Given a training set, this technique learns to generate new data with the same statistics as that training set.

Generative modeling is an unsupervised learning task in machine learning that involves automatically discovering and learning the regularities or patterns in input data in such a way that the model can be used to generate or output new examples that plausibly could have been drawn from the original dataset.

If anybody asks you to think of a human, where does your imagination lead you? Do we create a new person in our head? Well, no. Human brains tend to think of someone they already know, so we think of someone close to us, someone we met recently, or maybe even some random person we once saw. We cannot picture someone whose image our eyes have never captured. This task can be done efficiently by a GAN. For a better understanding, let us take an example.

Suppose we have a random picture of a person. Using a GAN, this picture can be manipulated to create lifelike pictures of fake humans: pictures that look genuine and appear to show completely different people, but are just manipulations of another picture.

GANs are a clever way of training a generative model by framing the problem as a supervised learning problem with two sub-models: the generator model, which we train to generate new examples, and the discriminator model, which tries to classify examples as either real (from the domain) or fake (generated). The two models are trained together in an adversarial, zero-sum game until the discriminator model is fooled about half the time, meaning the generator model is generating plausible examples.
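For reference, the zero-sum game described above is commonly written as a minimax objective. The symbols here are notation introduced for this sketch rather than names from the project code: D is the discriminator, G the generator, x a real sample, and z a random latent vector.

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

At the ideal equilibrium the discriminator outputs 1/2 for every input, which corresponds to being "fooled about half the time".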

GANs are an exciting and rapidly changing field, delivering on the promise of generative models in their ability to generate realistic examples across a range of problem domains, most notably in image-to-image translation tasks such as translating photos of summer to winter or day to night, creating illusions of different scenarios and in generating photorealistic photos of objects, scenes, and people that even humans cannot tell are fake.

Here, my entire project is about manipulating drawings.


We can use the opendatasets library to download the dataset from Kaggle. opendatasets uses the official Kaggle API to download datasets. Follow the steps below to find your API credentials:

1. Sign in to https://kaggle.com/, then click on your profile picture on the top right and select “My Account” from the menu.

2. Scroll down to the “API” section and click “Create New API Token”. This will download a file kaggle.json containing your Kaggle username and API key.


3. When you run opendatasets.download, you will be asked to enter your username and Kaggle API key, which you can get from the file downloaded in step 2. Note that you need to download the kaggle.json file only once. On Google Colab, you can also upload the kaggle.json file using the files tab, and the required credentials will be read automatically.

!pip install opendatasets --upgrade --quiet

import opendatasets as od

dataset_url = 'https://www.kaggle.com/ikarus777/best-artworks-of-all-time'
od.download(dataset_url)

Import the dataset into PyTorch:

This dataset is comprised of 2 folders:

images: This folder contains all image files within folders of artists.

resized: This folder contains all image files within one folder.

Since a GAN only classifies the data as real or fake, the artist categories are not needed, so I have only used the files in the resized folder. We load them with the ImageFolder class from torchvision.

data_dir = '/content/best-artworks-of-all-time/resized/'

import os
# Count the files in each sub-folder of the data directory
for cls in os.listdir(data_dir):
    print(cls, ':', len(os.listdir(data_dir + '/' + cls)))

from torchvision.datasets import ImageFolder
dataset = ImageFolder(data_dir)
len(dataset)

Let’s see some of the art pictures present in the dataset:

import matplotlib.pyplot as plt
%matplotlib inline

img, label = dataset[0]
plt.imshow(img)

img, label = dataset[500]
plt.imshow(img)

img, label = dataset[741]
plt.imshow(img)

Making batches of pictures:

1. First I’ll resize and center crop all images to ensure that they are all in the same shape and size.

2. Then convert them to tensors and normalize them.

3. Then create the data loader.

4. Then look at some samples.

import torchvision.transforms as tt
from torch.utils.data import DataLoader

image_size = 64
batch_size = 128
stats = (0.5, 0.5, 0.5), (0.5, 0.5, 0.5)  # per-channel (means, stds)

train_ds = ImageFolder(data_dir, transform=tt.Compose([
    tt.Resize(image_size),
    tt.CenterCrop(image_size),
    tt.ToTensor(),
    tt.Normalize(*stats)]))

train_dl = DataLoader(train_ds, batch_size, shuffle=True, num_workers=3, pin_memory=True)

from torchvision.utils import make_grid

def denorm(img_tensors):
    # Undo the normalization so pixel values are back in [0, 1]
    return img_tensors * stats[1][0] + stats[0][0]

def show_images(images, nmax=64):
    fig, ax = plt.subplots(figsize=(8, 8))
    ax.set_xticks([]); ax.set_yticks([])
    ax.imshow(make_grid(denorm(images.detach()[:nmax]), nrow=8).permute(1, 2, 0))

def show_batch(dl, nmax=64):
    for images, _ in dl:
        show_images(images, nmax)
        break  # show only the first batch

show_batch(train_dl)
A batch of images
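To make the normalization step concrete, here is a minimal pure-Python sketch of the arithmetic behind tt.Normalize with a mean and standard deviation of 0.5, together with the inverse used by denorm. The sample pixel values are made up for illustration.

```python
# Per-channel statistics used in the article: mean 0.5, std 0.5
MEAN, STD = 0.5, 0.5

def normalize(x):
    """What Normalize does to a single value: (x - mean) / std."""
    return (x - MEAN) / STD

def denorm(y):
    """Inverse mapping, the same arithmetic as the article's denorm()."""
    return y * STD + MEAN

pixels = [0.0, 0.25, 0.5, 1.0]   # ToTensor() produces values in [0, 1]
normalized = [normalize(p) for p in pixels]
restored = [denorm(n) for n in normalized]

print(normalized)  # [-1.0, -0.5, 0.0, 1.0]
print(restored)    # [0.0, 0.25, 0.5, 1.0]
```

Mapping pixel values into [-1, 1] matches the output range of the Tanh layer at the end of the generator, so real and generated images share the same scale.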

Using a GPU:

To seamlessly use a GPU when one is available, we define a couple of helper functions (get_default_device & to_device) and a helper class “DeviceDataLoader” to move our model and data to the GPU.

import torch

def get_default_device():
    """Pick GPU if available, else CPU"""
    if torch.cuda.is_available():
        return torch.device('cuda')
    else:
        return torch.device('cpu')

def to_device(data, device):
    """Move tensor(s) to chosen device"""
    if isinstance(data, (list,tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)

class DeviceDataLoader():
    """Wrap a dataloader to move data to a device"""
    def __init__(self, dl, device):
        self.dl = dl
        self.device = device

    def __iter__(self):
        """Yield a batch of data after moving it to device"""
        for b in self.dl:
            yield to_device(b, self.device)

    def __len__(self):
        """Number of batches"""
        return len(self.dl)

device = get_default_device()
device

train_dl = DeviceDataLoader(train_dl, device)

Discriminator Network:

As its name suggests, the discriminator network discriminates between real and manipulated pictures: it takes an image as input and tries to classify it as “real” or “generated”. In this sense, it’s like any other neural network classifier. We’ll use a convolutional neural network (CNN) that outputs a single number for every image. Further, we’ll use a stride of 2 to progressively reduce the size of the output feature map.

import torch.nn as nn

discriminator = nn.Sequential(
    # in: 3 x 64 x 64

    nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(64),
    nn.LeakyReLU(0.2, inplace=True),
    # out: 64 x 32 x 32

    nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(128),
    nn.LeakyReLU(0.2, inplace=True),
    # out: 128 x 16 x 16

    nn.Conv2d(128, 256, kernel_size=4, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(256),
    nn.LeakyReLU(0.2, inplace=True),
    # out: 256 x 8 x 8

    nn.Conv2d(256, 512, kernel_size=4, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(512),
    nn.LeakyReLU(0.2, inplace=True),
    # out: 512 x 4 x 4

    nn.Conv2d(512, 1, kernel_size=4, stride=1, padding=0, bias=False),
    # out: 1 x 1 x 1

    nn.Flatten(),
    nn.Sigmoid())

discriminator = to_device(discriminator, device)
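The feature-map sizes noted in the comments above can be checked with the standard convolution output-size formula. The following is a small pure-Python sketch, assuming a dilation of 1 (the Conv2d default); the helper name conv_out is made up for this illustration.

```python
def conv_out(size, kernel, stride, padding):
    """Output size of a convolution along one dimension (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

# (kernel_size, stride, padding) of the five Conv2d layers in the discriminator
layers = [(4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 1, 0)]

size = 64
sizes = [size]
for kernel, stride, padding in layers:
    size = conv_out(size, kernel, stride, padding)
    sizes.append(size)

print(sizes)  # [64, 32, 16, 8, 4, 1]
```

Each stride-2 layer halves the height and width, and the final 4 x 4 kernel with no padding collapses the 4 x 4 map into the single output value.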

Generator Network:

The input to the generator is typically a vector or a matrix of random numbers (referred to as a latent tensor) which is used as a seed for generating an image. The generator will convert a latent tensor of shape (128, 1, 1) into an image tensor of shape 3 x 64 x 64. To achieve this, we’ll use the ConvTranspose2d layer from PyTorch, which performs a transposed convolution (also referred to as a deconvolution).

latent_size = 128

generator = nn.Sequential(
    # in: latent_size x 1 x 1

    nn.ConvTranspose2d(latent_size, 512, kernel_size=4, stride=1, padding=0, bias=False),
    nn.BatchNorm2d(512),
    nn.ReLU(True),
    # out: 512 x 4 x 4

    nn.ConvTranspose2d(512, 256, kernel_size=4, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(256),
    nn.ReLU(True),
    # out: 256 x 8 x 8

    nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(128),
    nn.ReLU(True),
    # out: 128 x 16 x 16

    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(True),
    # out: 64 x 32 x 32

    nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1, bias=False),
    nn.Tanh()
    # out: 3 x 64 x 64
)
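The generator runs the size arithmetic in the opposite direction. As a rough check, this pure-Python sketch applies the transposed-convolution output-size formula, assuming a dilation of 1 and an output_padding of 0 (the ConvTranspose2d defaults); the helper name conv_transpose_out is made up for this illustration.

```python
def conv_transpose_out(size, kernel, stride, padding):
    """Output size of a transposed convolution along one dimension
    (dilation 1, output_padding 0)."""
    return (size - 1) * stride - 2 * padding + kernel

# (kernel_size, stride, padding) of the five ConvTranspose2d layers in the generator
layers = [(4, 1, 0), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1)]

size = 1
sizes = [size]
for kernel, stride, padding in layers:
    size = conv_transpose_out(size, kernel, stride, padding)
    sizes.append(size)

print(sizes)  # [1, 4, 8, 16, 32, 64]
```

The 1 x 1 latent input grows to 4 x 4 in the first layer and then doubles four times, ending at the 64 x 64 image size used for the training data.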

Let’s see what our first generated images look like:

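xb = torch.randn(batch_size, latent_size, 1, 1) # random latent tensors
fake_images = generator(xb)
print(fake_images.shape)
show_images(fake_images)

# Move the generator to the chosen device before training
generator = to_device(generator, device)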
1st generative picture

The output is just noise, which is expected because the generator has not been trained yet. So now we have to train our model.

Discriminator Training:

Since the discriminator is a binary classification model, we can use the binary cross-entropy loss function to quantify how well it can differentiate between real and generated images.

Here are the steps involved in training the discriminator:

1. We expect the discriminator to output 1 if the image was picked from the real art dataset, and 0 if it was generated using the generator network.

2. We first pass a batch of real images, and compute the loss, setting the target labels to 1.

3. Then we generate a batch of fake images using the generator, pass them into the discriminator, and compute the loss, setting the target labels to 0.

4. Finally we add the two losses and use the overall loss to perform gradient descent to adjust the weights of the discriminator.

It’s important to note that we don’t change the weights of the generator model while training the discriminator (opt_d only affects discriminator.parameters()).

import torch.nn.functional as F

def train_discriminator(real_images, opt_d):
    # Clear discriminator gradients
    opt_d.zero_grad()

    # Pass real images through discriminator
    real_preds = discriminator(real_images)
    real_targets = torch.ones(real_images.size(0), 1, device=device)
    real_loss = F.binary_cross_entropy(real_preds, real_targets)
    real_score = torch.mean(real_preds).item()

    # Generate fake images
    latent = torch.randn(batch_size, latent_size, 1, 1, device=device)
    fake_images = generator(latent)

    # Pass fake images through discriminator
    fake_targets = torch.zeros(fake_images.size(0), 1, device=device)
    fake_preds = discriminator(fake_images)
    fake_loss = F.binary_cross_entropy(fake_preds, fake_targets)
    fake_score = torch.mean(fake_preds).item()

    # Update discriminator weights
    loss = real_loss + fake_loss
    loss.backward()
    opt_d.step()
    return loss.item(), real_score, fake_score
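To see what the two loss terms measure, here is a small pure-Python sketch of binary cross-entropy for a single prediction. The prediction values are hypothetical and chosen only for illustration; F.binary_cross_entropy applies the same formula and by default averages it over the batch.

```python
import math

def bce(pred, target):
    """Binary cross-entropy for a single prediction in (0, 1)."""
    return -(target * math.log(pred) + (1 - target) * math.log(1 - pred))

# Hypothetical discriminator outputs, chosen only for illustration
real_pred = 0.9   # fairly sure that a real image is real
fake_pred = 0.2   # fairly sure that a generated image is fake

real_loss = bce(real_pred, 1.0)   # equals -log(0.9)
fake_loss = bce(fake_pred, 0.0)   # equals -log(0.8)
d_loss = real_loss + fake_loss

# A discriminator that outputs 0.5 for every image
confused_loss = bce(0.5, 1.0) + bce(0.5, 0.0)   # equals 2 * log(2)

print(round(d_loss, 4))         # 0.3285
print(round(confused_loss, 4))  # 1.3863
```

A confident, correct discriminator has a loss close to 0, while one that cannot tell real from fake sits near 2 * log(2), which gives a useful reference point when reading the training logs.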

Generator Training:

Since the outputs of the generator are images, it’s not obvious how we can train the generator. This is where we employ a rather elegant trick, which is to use the discriminator as a part of the loss function. Here’s how it works:

1. We generate a batch of images using the generator and pass it into the discriminator.

2. We calculate the loss by setting the target labels to 1 i.e. real. We do this because the generator’s objective is to “fool” the discriminator.

3. We use the loss to perform gradient descent i.e. change the weights of the generator, so it gets better at generating real-like images to “fool” the discriminator.

def train_generator(opt_g):
    # Clear generator gradients
    opt_g.zero_grad()

    # Generate fake images
    latent = torch.randn(batch_size, latent_size, 1, 1, device=device)
    fake_images = generator(latent)

    # Try to fool the discriminator
    preds = discriminator(fake_images)
    targets = torch.ones(batch_size, 1, device=device)
    loss = F.binary_cross_entropy(preds, targets)

    # Update generator weights
    loss.backward()
    opt_g.step()

    return loss.item()

from torchvision.utils import save_image

sample_dir = 'generated'
os.makedirs(sample_dir, exist_ok=True)

def save_samples(index, latent_tensors, show=True):
    fake_images = generator(latent_tensors)
    fake_fname = 'generated-images-{0:0=4d}.png'.format(index)
    save_image(denorm(fake_images), os.path.join(sample_dir, fake_fname), nrow=8)
    print('Saving', fake_fname)
    if show:
        fig, ax = plt.subplots(figsize=(8, 8))
        ax.set_xticks([]); ax.set_yticks([])
        ax.imshow(make_grid(fake_images.cpu().detach(), nrow=8).permute(1, 2, 0))

fixed_latent = torch.randn(64, latent_size, 1, 1, device=device)

save_samples(0, fixed_latent)
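The generator update can look surprising, because the loss is computed from the discriminator's output yet only the generator changes. The sketch below illustrates this with two tiny stand-in networks; toy_generator and toy_discriminator are hypothetical and much smaller than the models in this article, and the layer sizes are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Tiny stand-in networks (hypothetical, not the article's models)
toy_generator = nn.Sequential(nn.Linear(4, 8), nn.Tanh())
toy_discriminator = nn.Sequential(nn.Linear(8, 1), nn.Sigmoid())

# The optimizer only receives the generator's parameters
opt_g = torch.optim.Adam(toy_generator.parameters(), lr=0.0002, betas=(0.5, 0.999))

# Keep copies of all weights so we can compare after the step
d_before = [p.detach().clone() for p in toy_discriminator.parameters()]
g_before = [p.detach().clone() for p in toy_generator.parameters()]

# One generator step, following the three steps listed above
opt_g.zero_grad()
latent = torch.randn(16, 4)
preds = toy_discriminator(toy_generator(latent))
targets = torch.ones(16, 1)        # label the fakes as "real"
loss = F.binary_cross_entropy(preds, targets)
loss.backward()                    # gradients flow back through the discriminator
opt_g.step()                       # but only the generator's weights are updated

d_unchanged = all(torch.equal(a, b)
                  for a, b in zip(d_before, toy_discriminator.parameters()))
g_changed = any(not torch.equal(a, b)
                for a, b in zip(g_before, toy_generator.parameters()))
print(d_unchanged, g_changed)
```

The backward pass still fills the discriminator's gradient buffers, which is one reason train_discriminator starts by calling opt_d.zero_grad().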

Full Training Loop:

Let’s define a fit function to train the discriminator and generator in tandem for each batch of training data. We’ll use the Adam optimizer with some custom parameters (betas) that are known to work well for GANs. We will also save some sample-generated images at regular intervals for inspection.

GAN architecture
from tqdm.notebook import tqdm
import torch.nn.functional as F

def fit(epochs, lr, start_idx=1):
    torch.cuda.empty_cache()

    # Losses & scores
    losses_g = []
    losses_d = []
    real_scores = []
    fake_scores = []

    # Create optimizers
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=lr, betas=(0.5, 0.999))
    opt_g = torch.optim.Adam(generator.parameters(), lr=lr, betas=(0.5, 0.999))

    for epoch in range(epochs):
        for real_images, _ in tqdm(train_dl):
            # Train discriminator
            loss_d, real_score, fake_score = train_discriminator(real_images, opt_d)
            # Train generator
            loss_g = train_generator(opt_g)

        # Record losses & scores
        losses_g.append(loss_g)
        losses_d.append(loss_d)
        real_scores.append(real_score)
        fake_scores.append(fake_score)

        # Log losses & scores (last batch)
        print("Epoch [{}/{}], loss_g: {:.4f}, loss_d: {:.4f}, real_score: {:.4f}, fake_score: {:.4f}".format(
            epoch+1, epochs, loss_g, loss_d, real_score, fake_score))

        # Save generated images
        save_samples(epoch+start_idx, fixed_latent, show=False)

    return losses_g, losses_d, real_scores, fake_scores

lr = 0.0002
epochs = 300

history = fit(epochs, lr)
losses_g, losses_d, real_scores, fake_scores = history

# Save the model checkpoints
torch.save(generator.state_dict(), 'G.pth')
torch.save(discriminator.state_dict(), 'D.pth')

Now let’s view some of the generated pictures and see the progress as the number of epochs increases:

from IPython.display import Image
Image('./generated/generated-images-0001.png')
1st epoch picture
31st epoch picture

Picture quality improved a lot after just 30 epochs.

90th epoch picture
201st epoch picture
Last epoch


1. Make a video of the generated pictures.

2. Plot graphs of the losses and scores.

import cv2
import os

vid_fname = 'art_gans_training.avi'

# Collect the saved sample grids in epoch order
files = [os.path.join(sample_dir, f) for f in os.listdir(sample_dir) if 'generated' in f]
files.sort()

out = cv2.VideoWriter(vid_fname, cv2.VideoWriter_fourcc(*'MP4V'), 5, (530,530))
for fname in files:
    out.write(cv2.imread(fname))
out.release()

The code above stitches the saved sample images into a video, so we can see how the pictures progress as the epochs increase.
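The frame size (530, 530) passed to cv2.VideoWriter appears to come from the size of the saved sample grids. This pure-Python sketch reproduces the arithmetic, assuming torchvision's default grid padding of 2 pixels around every image; the helper name grid_side is made up for this illustration.

```python
def grid_side(n_per_side, image_size, padding=2):
    """Side length in pixels of a square image grid in which every image
    is surrounded by `padding` pixels (assumed default: 2)."""
    return n_per_side * (image_size + padding) + padding

# 64 samples arranged 8 per row, each 64 x 64 pixels
side = grid_side(n_per_side=8, image_size=64)
print(side)  # 530
```

If image_size, the number of samples, or nrow is changed, the frame size given to the video writer would need to be updated to match.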

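# Losses of the discriminator and generator per epoch
plt.plot(losses_d, '-')
plt.plot(losses_g, '-')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend(['Discriminator', 'Generator'])
plt.title('Losses');

# Discriminator scores for real and fake images per epoch (run in a separate cell)
plt.plot(real_scores, '-')
plt.plot(fake_scores, '-')
plt.xlabel('epoch')
plt.ylabel('score')
plt.legend(['Real', 'Fake'])
plt.title('Scores');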

Video link:



Here we see that the quality of the fake pictures is good, but the discriminator's scores for the fake images are not very good. Training again with a lower learning rate (lr) for roughly 80–100 epochs may improve the fake-image scores. This was all about implementing a GAN for drawing manipulation, creating completely different prints from the originals. The same approach can be applied to pictures of humans or used to generate other art.

Project Link:


