Generating Portraits: An AI Approach Using GANs
What if a new Picasso could be produced in 2021? Or a Van Gogh? Or maybe Degas is more appealing? They say that artists never die when their art lives on, but what if their art could be used to generate new works of art?
The idea for our project came about when we asked ourselves, “what if we could generate new pieces from famous artists who have long since passed?” We set out to create a model that would generate new paintings by these artists. We focused on portraiture because portraits are among the most complicated paintings to recreate and would best demonstrate the possibilities for generating fine art. The goal of our project is to generate realistic portraits using a dataset of original portraits from 37 artists, including Goya, Degas, Renoir, Rembrandt, Titian, Van Gogh, and Picasso.
What are GANs and how do they work?
Generative Adversarial Networks (GANs) were first proposed by Ian Goodfellow and other researchers at the Université de Montréal in 2014. A GAN automatically discovers and learns the patterns in its input data so that it can generate new data similar to the original data. The goal is to make the generated data so similar that it is indistinguishable from the original. GANs currently have a wide range of applications, including image super-resolution, image editing, video generation, and voice generation.
GANs are composed of two neural networks:
- A generator that is trained to generate new examples meant to be unique but indistinguishable from the training data.
- A discriminator that tries to classify images as real (drawn from the training data) or fake (produced by the generator).
The generator and the discriminator compete with each other (hence the term “adversarial”): the generator tries to produce synthetic images that fool the discriminator into accepting them as real, while the discriminator tries to correctly separate real images from generated ones. During training, the generator gets progressively better at producing realistic images by incorporating feedback from the discriminator. If training goes well, the discriminator becomes increasingly worse at distinguishing real images from generated ones as the generator improves. Equilibrium is reached when the discriminator can no longer distinguish real images from the dataset and fake images from the generator. Graph 1 outlines how this process works and its expected outputs, with both the real images and the generated images from this project.
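For reference, the adversarial game described above is usually written as the minimax objective from the original 2014 GAN paper, where $D(x)$ is the discriminator's estimated probability that $x$ is real and $G(z)$ maps a noise vector $z$ to an image:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
```

The discriminator maximizes $V$ by scoring real images near 1 and generated images near 0; the generator minimizes it by pushing $D(G(z))$ toward 1. At the theoretical optimum the generator's distribution equals $p_{\text{data}}$ and $D(x) = 1/2$ everywhere, which is the equilibrium described above. In practice the generator is commonly trained to maximize $\log D(G(z))$ instead, because that gives stronger gradients early in training.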
Choosing the right dataset
The dataset built for this project is extracted from the Kaggle dataset Best Artworks of All Time. The full dataset contains the work of fifty artists, 8,446 images in total, with some artist folders containing as few as 24 works of art and others as many as 877. The model is trained on 1,000 portraits from the following artists:
Pissarro, Caravaggio, Monet, Rivera, Velazquez, Degas, Manet, Munch, El Greco, Delacroix, Kahlo, Courbet, Toulouse-Lautrec, Matisse, Rousseau, Bosch, Van Eyck, Miro, Malevich, Da Vinci, Chagall, Vrubel, Cezanne, Gauguin, Klee, Rubens, Renoir, Mondrian, Raphael, Rembrandt, Botticelli, Titian, Van Gogh and Turner.
We used only a portion of the original dataset because, despite the subjectivity of art, paintings fall into distinctive styles (expressionism, surrealism, abstraction, realism, etc.) and types (landscapes, portraits, still lifes, etc.). Learning every combination of style and type would require ample examples of each combination (thousands) and far more training time (weeks), so only portraits of a single individual by the artists above were included.
— Pre-processing —
The varying sizes of the portraits in the newly formed dataset required pre-processing before the model could be trained: the images had to be the same size (160 × 160) before the Keras layers in our model could process them. However, the way an image is resized can warp the painting, which is why TensorFlow's smart_resize function (tf.keras.preprocessing.image.smart_resize) was used. It resizes images to a target size without distorting the aspect ratio, but it achieves this by center-cropping to the target aspect ratio, so it is not perfect. Therefore, despite the process being lengthy, every portrait was reviewed individually after resizing to check for unusual distortions. After this, both the generator and the discriminator were built with the Keras Sequential API.
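To see why smart resizing avoids warping yet can still damage a portrait, here is a minimal pure-Python sketch of the cropping step as TensorFlow documents it: take the largest centered window with the target aspect ratio, then resize that window. The function name is hypothetical and the final pixel resampling is deliberately not shown; this is an illustration, not TensorFlow's code.

```python
def smart_resize_crop_box(height, width, target_size):
    """Return (top, left, crop_height, crop_width) of the centered crop.

    Sketch of the crop that precedes resizing in a smart resize: the largest
    centered box whose aspect ratio matches target_size. Resampling that box
    to target_size is not shown.
    """
    target_height, target_width = target_size
    # Crop height needed to match the target aspect ratio at full width.
    crop_height = min(height, int(width * target_height / target_width))
    # Crop width needed to match the target aspect ratio at full height.
    crop_width = min(width, int(height * target_width / target_height))
    # Center the window; the excess is trimmed equally from both sides.
    top = (height - crop_height) // 2
    left = (width - crop_width) // 2
    return top, left, crop_height, crop_width


# A hypothetical 400 x 300 (height x width) portrait resized to 160 x 160:
print(smart_resize_crop_box(400, 300, (160, 160)))  # (50, 0, 300, 300)
```

For a square 160 × 160 target, any non-square painting loses strips along its longer side (here 50 rows at the top and 50 at the bottom), which can clip a sitter's head or hands. That is one reason a manual review after resizing is still worthwhile.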
— Image Augmentation —
Training a GAN requires a large dataset so that the model can learn without overfitting. In general, a large dataset (∼10,000 examples) is needed to train these generative models. Because the extracted dataset contains only 1,000 portraits, the images were augmented using the imgaug Python library. Portraits were randomly flipped from left to right, and their hue and saturation were changed, as shown in Exhibit 1 with the extreme changes applied to the Portrait of Jeanne by Camille Pissarro.
While this increased the size of the dataset to approximately 6,220 images, it also required another review of the images to fix issues introduced by the flipping and by the varying hue and saturation.
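As an illustration only (this is not the project's imgaug pipeline), the two augmentations described above can be sketched with the standard library's colorsys module. imgaug offers augmenters such as `iaa.Fliplr` and `iaa.AddToHueAndSaturation` for these operations, but their value ranges differ from the ones assumed here. Images are represented as rows of (r, g, b) tuples in [0, 1], and the flip probability, hue range, and saturation range are hypothetical defaults.

```python
import colorsys
import random


def flip_left_right(image):
    """Mirror an image stored as rows of (r, g, b) tuples."""
    return [list(reversed(row)) for row in image]


def shift_hue_saturation(image, hue_shift, sat_scale):
    """Rotate hue by hue_shift (a fraction of the color wheel) and scale saturation."""
    out = []
    for row in image:
        new_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r, g, b)
            h = (h + hue_shift) % 1.0              # hue wraps around the wheel
            s = min(1.0, max(0.0, s * sat_scale))  # keep saturation in [0, 1]
            new_row.append(colorsys.hsv_to_rgb(h, s, v))
        out.append(new_row)
    return out


def augment(image, rng, flip_prob=0.5, max_hue_shift=0.05, sat_range=(0.7, 1.3)):
    """Randomly flip, then jitter hue and saturation (hypothetical parameter values)."""
    if rng.random() < flip_prob:
        image = flip_left_right(image)
    return shift_hue_saturation(
        image,
        hue_shift=rng.uniform(-max_hue_shift, max_hue_shift),
        sat_scale=rng.uniform(*sat_range),
    )


rng = random.Random(0)
portrait = [[(0.8, 0.6, 0.4), (0.2, 0.3, 0.5)]]  # hypothetical 1 x 2 "image"
variants = [augment(portrait, rng) for _ in range(6)]  # six augmented copies
```

Generating about six variants per original is roughly what takes 1,000 portraits to the ~6,220 reported. Large hue shifts can produce implausible skin tones, which is why the augmented images still need review.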
— Training the GAN —
The GAN was trained on Google Colab Pro so that it could run for longer periods without being disconnected, using a Tesla V100 16 GB GPU at roughly 12–20 seconds per epoch. After the dataset was augmented, training resumed from a model saved during the original training run and continued for 16,000 epochs (more than 65 hours).
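As a rough consistency check on these figures (ignoring checkpointing and image-display overhead), the per-epoch timings imply:

```latex
16{,}000 \times (12 \text{ to } 20)\,\text{s}
  = 192{,}000 \text{ to } 320{,}000\,\text{s}
  \approx 53 \text{ to } 89\,\text{h}
```

so the reported training time of more than 65 hours falls inside the range implied by the per-epoch timings.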
— The Generator —
The generator begins with a Dense layer that takes random noise as its input. Its output is reshaped into a 10 × 10 feature map, which tf.keras.layers.Conv2DTranspose layers then progressively up-sample until it reaches 160 × 160. Even the untrained generator can be used to generate images, although at that stage they are essentially noise.
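The up-sampling path is easiest to check by tracking spatial sizes only. The sketch below uses the output-length formula Keras applies to `Conv2DTranspose` (no `output_padding`, dilation 1). The kernel size 5 and stride 2 are assumptions (common DCGAN choices), not the project's actual hyperparameters; in Keras each step would correspond to a layer such as `layers.Conv2DTranspose(filters, kernel_size=5, strides=2, padding="same")`.

```python
def conv_transpose_output_length(length, kernel_size, stride, padding):
    """Spatial output length of a transposed convolution (no output_padding, dilation 1).

    padding="same":  length * stride
    padding="valid": length * stride + max(kernel_size - stride, 0)
    """
    if padding == "same":
        return length * stride
    if padding == "valid":
        return length * stride + max(kernel_size - stride, 0)
    raise ValueError(f"unsupported padding: {padding!r}")


def upsampling_plan(start, target, kernel_size=5, stride=2, padding="same"):
    """List the spatial sizes reached by stacking identical transposed convolutions."""
    sizes = [start]
    while sizes[-1] < target:
        nxt = conv_transpose_output_length(sizes[-1], kernel_size, stride, padding)
        if nxt <= sizes[-1]:
            raise ValueError("layer does not increase the spatial size")
        sizes.append(nxt)
    if sizes[-1] != target:
        raise ValueError(f"{start} does not reach {target} exactly: {sizes}")
    return sizes


print(upsampling_plan(10, 160))  # [10, 20, 40, 80, 160]
```

Under these assumptions, four stride-2 layers with `"same"` padding take the 10 × 10 map to 160 × 160; with `"valid"` padding the sizes would overshoot (10 becomes 23), so the last layers would need adjusting.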
— The Discriminator —
The discriminator is a convolutional neural network (CNN) image classifier. Even before training, it can be used to classify the generated images. The model is trained to output positive values for real images and negative values for fake images. The discriminator network in this model has eleven layers.
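"Positive for real, negative for fake" suggests that the discriminator's final layer is a single unit with no activation, so it outputs a raw logit; that is an assumption here, not something the project states. A logit maps to a probability through the logistic sigmoid, with 0 corresponding to 50%:

```python
import math


def logit_to_prob_real(logit):
    """Map a raw discriminator output (logit) to P(real) with the logistic sigmoid."""
    # Two branches so math.exp never overflows for large |logit|.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


def classify(logit):
    """Positive logits are read as 'real', all others as 'fake' (threshold at 0)."""
    return "real" if logit > 0 else "fake"


for s in (-3.0, 0.0, 2.5):
    print(s, classify(s), round(logit_to_prob_real(s), 3))
```

Keeping the output as a logit, and applying the sigmoid inside the loss, is a common choice because it is more numerically stable than a sigmoid output layer.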
— Training Function —
Once the generator and discriminator are set up, the training function is used to train the GAN. In each step, the generator produces images from random noise, and the discriminator classifies images as either real (drawn from the training dataset) or fake (created by the generator). The generator and discriminator losses are then calculated, and their gradients are used to update each model's weights via backpropagation. An image display function, fed the same random seed throughout, shows how the generated images evolve over training.
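The project does this with Keras models, where gradients would typically come from `tf.GradientTape`. As a self-contained illustration of the same alternating update, here is a deliberately tiny 1-D GAN in plain Python with hand-derived gradients. Everything in it is hypothetical: the generator is a linear map G(z) = a·z + b, the discriminator is logistic regression D(x) = sigmoid(w·x + c), and the generator uses the common "non-saturating" loss -log D(G(z)). A linear discriminator can only compare locations, so this toy shows the mechanics of a training step, not what the project's CNNs learn.

```python
import math
import random


def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def train_step(params, real_batch, noise_batch, lr=0.05):
    """One alternating GAN update on 1-D data (hypothetical linear models)."""
    a, b, w, c = params["a"], params["b"], params["w"], params["c"]
    fake_batch = [a * z + b for z in noise_batch]

    # Discriminator: minimise mean(-log D(real)) + mean(-log(1 - D(fake))).
    # d/ds of -log(sigmoid(s)) is sigmoid(s) - 1; of -log(1 - sigmoid(s)) is sigmoid(s).
    grad_real = [sigmoid(w * x + c) - 1.0 for x in real_batch]
    grad_fake = [sigmoid(w * x + c) for x in fake_batch]
    gw = (sum(g * x for g, x in zip(grad_real, real_batch)) / len(real_batch)
          + sum(g * x for g, x in zip(grad_fake, fake_batch)) / len(fake_batch))
    gc = sum(grad_real) / len(real_batch) + sum(grad_fake) / len(fake_batch)
    w, c = w - lr * gw, c - lr * gc

    # Generator (non-saturating loss): minimise mean(-log D(G(z))),
    # using the discriminator weights just updated above.
    grad_gen = [sigmoid(w * (a * z + b) + c) - 1.0 for z in noise_batch]
    ga = sum(g * w * z for g, z in zip(grad_gen, noise_batch)) / len(noise_batch)
    gb = sum(g * w for g in grad_gen) / len(noise_batch)
    a, b = a - lr * ga, b - lr * gb

    return {"a": a, "b": b, "w": w, "c": c}


if __name__ == "__main__":
    rng = random.Random(0)
    params = {"a": 1.0, "b": 0.0, "w": 0.0, "c": 0.0}
    # Fixed noise reused at every checkpoint, like the fixed seed
    # given to the image display function.
    fixed_noise = [rng.gauss(0.0, 1.0) for _ in range(4)]
    for step in range(1, 501):
        real = [rng.gauss(4.0, 1.0) for _ in range(32)]  # toy "real" data ~ N(4, 1)
        noise = [rng.gauss(0.0, 1.0) for _ in range(32)]
        params = train_step(params, real, noise)
        if step % 100 == 0:
            print(step, [round(params["a"] * z + params["b"], 2) for z in fixed_noise])
```

Starting from a generator centered at 0 and real data centered at 4, the first discriminator update makes w positive, and the generator update then moves b toward the real data. Simple GAN dynamics like these can also oscillate rather than settle.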
— Generator & Discriminator Loss —
The generator loss quantifies how successfully the generator is deceiving the discriminator with its generated images; in other words, whether its images are close enough to the original dataset to fool the discriminator. If the generator is succeeding, the discriminator will classify its fake images as real and output positive values for them. The discriminator loss, in contrast, quantifies how well the discriminator is classifying real and fake images. It does so by comparing its predictions on real images to an array of 1s and its predictions on fake images to an array of 0s.
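Comparing predictions against arrays of 1s and 0s in this way is binary cross-entropy. Below is a minimal plain-Python sketch on raw logits. The TensorFlow DCGAN tutorial uses `tf.keras.losses.BinaryCrossentropy(from_logits=True)` for the same purpose; whether this project used exactly that is an assumption, and the helper names here are hypothetical.

```python
import math


def bce_from_logits(label, logit):
    """Binary cross-entropy between a 0/1 label and a raw logit (numerically stable form)."""
    # Equal to -[y*log(sigmoid(s)) + (1 - y)*log(1 - sigmoid(s))].
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def mean(values):
    return sum(values) / len(values)


def discriminator_loss(real_logits, fake_logits):
    # Predictions on real images are compared with 1s, on fake images with 0s.
    real_loss = mean([bce_from_logits(1.0, s) for s in real_logits])
    fake_loss = mean([bce_from_logits(0.0, s) for s in fake_logits])
    return real_loss + fake_loss


def generator_loss(fake_logits):
    # The generator is rewarded when its fakes are scored as real (compared with 1s).
    return mean([bce_from_logits(1.0, s) for s in fake_logits])
```

For example, a confident and correct discriminator such as `discriminator_loss([3.0], [-3.0])` yields a small loss, while `generator_loss([-3.0])` is large for those same fakes. The two losses pull in opposite directions, which is the adversarial game in numbers.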
The model was able to generate new pieces based on the dataset provided. The progression of these works toward convergence is shown in Exhibit 2. The individual images can be seen in greater detail in the project's GitHub folder.
Further inspection of the final generated images shows that the GAN has overfitted to three specific paintings, aptly titled “GANs favourites” in Exhibit 3. These three paintings appear in approximately one-third of the generated paintings. This demonstrates the importance of procuring a large dataset to train the model; in future work, the dataset could be augmented further to address the problem of repeated images.
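One hypothetical way to quantify this kind of memorization is a nearest-neighbour check: flatten each image to a pixel vector, find the training image closest to each generated image, and measure how much of the output the most frequent few training images account for. Pixel-space Euclidean distance is a crude proxy (feature-space distances are often preferred), and the function names are illustrative, not from the project.

```python
import math
from collections import Counter


def nearest_training_index(sample, training_set):
    """Index of the training vector closest to `sample` by Euclidean distance."""
    return min(range(len(training_set)), key=lambda i: math.dist(sample, training_set[i]))


def nearest_neighbour_share(generated, training_set, top_k=3):
    """Most common nearest training images and the fraction of outputs they account for."""
    counts = Counter(nearest_training_index(g, training_set) for g in generated)
    top = counts.most_common(top_k)
    return top, sum(count for _, count in top) / len(generated)
```

With `top_k=3`, a share near one-third would match the observation that three paintings appear in about a third of the outputs; a healthy generator would spread its nearest neighbours across many training images.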
The results show that it is possible to generate new works of art, but a larger dataset and additional training time are required to create unique pieces.
— Limitations —
The small extracted dataset of 1,000 images is a limitation of this project, as it is one of the reasons overfitting occurred. Due to the lack of variety in the images, the model overfit the data and yielded near-identical copies of three portraits by Rembrandt, seen in Exhibit 3.
Despite the data augmentation already applied, this project would benefit from a larger dataset, whether by including additional portraits from other artists or through another method of increasing the dataset's size.
Additionally, our project is a good example of mode collapse, where a GAN fails to produce diverse outputs. This can happen when the generator finds a narrow set of outputs that easily fools the discriminator and keeps producing similar data. It may be mitigated by further data augmentation and additional training time, potentially training the model for approximately two weeks or more instead of overnight.
GANs can be used to generate new works of art by painters, albeit with some caveats. As we have seen in this project, GANs can easily overfit to a dataset and run into the mode collapse problem. Therefore, a large dataset and ample training time are required to create unique yet indistinguishable generated paintings. As it stands, the model would not be able to generate a new painting by a single artist, but it can generate new paintings within the realm of different types and styles of art. Our team went back to the dataset and applied further image augmentation. While that model could not converge within the time available, it did provide insights into the potential for future work. Generating new fine art with the familiarity of beloved artists is within reach with data augmentation, weeks of training, and manual review of both the original and the generated images.