I started by writing a script to collect public-domain paintings of people on WikiArt and open-source images of people from Google’s Open Images dataset. I pre-processed the images by aligning facial features and filling in any blank spots using the LaMa inpainting system. I then trained two GANs, StyleGAN 2 ADA and VQGAN, on the GANfolk training set of 5,400 images: 2,700 old paintings of people and 2,700 new photos of people.
For the first step in the creation process, the trained StyleGAN 2 system generated 1,000 images as a baseline set. I used GPT-3 from OpenAI to generate text prompts for the pictures, like “drawing of a thoughtful Brazilian girl.” I then used the CLIP system, also from OpenAI, to rank the generated images by how well they matched the prompt. I chose the best picture and fed it into the trained VQGAN system for further modification, steering the image to match the text prompt more closely.
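At its core, the CLIP-ranking step is a cosine-similarity comparison between the prompt’s embedding and each candidate image’s embedding. Here is a minimal sketch of that ranking with NumPy, assuming the embeddings have already been produced by CLIP’s text and image encoders (the function name and array shapes are my own for illustration, not from the article):

```python
import numpy as np

def rank_by_similarity(image_feats, text_feat):
    """Rank candidate images by cosine similarity to a text embedding.

    image_feats: (N, D) array, one row per image (from CLIP's image encoder)
    text_feat:   (D,) embedding of the prompt (from CLIP's text encoder)
    Returns an array of image indices, best match first.
    """
    # Normalize to unit length so the dot product equals cosine similarity
    img = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    txt = text_feat / np.linalg.norm(text_feat)
    scores = img @ txt
    # Sort descending: highest similarity first
    return np.argsort(-scores)
```

In the real pipeline, the 1,000 StyleGAN outputs would each be encoded once, and the top-ranked image would be handed to VQGAN for refinement.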
I went back to GPT-3 and asked it to write a name and a brief backstory for each portrait. As a post-processing step, I added a vignette effect and upscaled each image by a factor of four (from 512×512 to 2048×2048). After a mild editing pass, I uploaded the pictures and backstories to OpenSea for sale as the GANfolk NFTs.
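The post-processing step can be sketched with Pillow and NumPy. This is my own approximation, assuming a quadratic radial vignette and plain Lanczos resampling; the article does not specify the exact vignette curve or upscaling method:

```python
import numpy as np
from PIL import Image

def postprocess(img, scale=4, vignette_strength=0.35):
    """Darken toward the corners with a radial vignette, then upscale
    (e.g., 512x512 -> 2048x2048). The strength value is a guess."""
    arr = np.asarray(img).astype(np.float32)
    h, w = arr.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2, (w - 1) / 2
    # Normalized distance from center: 0 at center, 1 at the corners
    r = np.sqrt((ys - cy) ** 2 + (xs - cx) ** 2)
    r /= r.max()
    mask = 1.0 - vignette_strength * r ** 2
    arr *= mask[..., None] if arr.ndim == 3 else mask
    out = Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))
    return out.resize((w * scale, h * scale), Image.LANCZOS)
```

A dedicated super-resolution model would give crisper 4x results than Lanczos resampling, at the cost of extra inference time.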
Before I get into the details of GANfolk, here is a brief section on what other people have done to generate portraits of people.
The Flickr-Faces-HQ Dataset
In conjunction with developing their StyleGAN series of generative networks, NVidia released a dataset of photographs of people called Flickr-Faces-HQ Dataset (FFHQ). According to NVidia…
… [the] dataset consists of 70,000 high-quality PNG images at 1024×1024 resolution and contains considerable variation in terms of age, ethnicity and image background.
Although the quality and variety of the FFHQ images are excellent, NVidia released the dataset under non-commercial terms. Also, I found that the faces seem too “tightly cropped” to make good portraits.
The MetFaces Dataset
NVidia also released the MetFaces dataset of faces from paintings in the Metropolitan Museum of Art. They write that…
[the] dataset consists of 1,336 high-quality PNG images at 1024×1024 resolution. The images were downloaded via the Metropolitan Museum of Art Collection API, and automatically aligned and cropped using dlib. Various automatic filters were used to prune the set.
Again, NVidia released the dataset under non-commercial terms, and they used similarly tight cropping for the faces.
Here’s what newly generated images look like with StyleGAN 2 ADA trained on the FFHQ dataset and fine-tuned with the MetFaces dataset.
Although the results are impressive, the faces are, unsurprisingly, still too tightly cropped. In addition to the datasets, NVidia released the official source code under non-commercial terms, so these faces cannot be sold as NFTs. Also, the generated faces show a distinct lack of cultural diversity.
GANfolk System Components
I will discuss the details of the components and processes used in the GANfolk system in the following sections.
Gathering and Pre-Processing the Training Images
I wrote two scripts to gather the source images for GANfolk. The first gathers public-domain paintings on WikiArt from the 19th and early 20th centuries. The second collects portraits from Google’s Open Images dataset, which consists of photos posted to Flickr under the CC-BY-SA license, allowing commercial use.
To find and orient the faces in the images, I used a face-finding algorithm from a package called DLIB. I modified the face-finding code to crop the faces more loosely. Here are some of the results from the paintings from WikiArt.
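Loosening the crop amounts to padding the rectangle the detector returns before cutting out the face. Here is a minimal sketch of that adjustment; the helper name and the margin value are my own, and the box format follows dlib’s (left, top, right, bottom) convention:

```python
def loosen_crop(box, img_w, img_h, margin=0.6):
    """Expand a detected face rectangle by `margin` of its width/height
    on each side, so the crop includes hair, shoulders, and background.
    The result is clamped to the image bounds."""
    left, top, right, bottom = box
    w, h = right - left, bottom - top
    left = max(0, int(left - margin * w))
    top = max(0, int(top - margin * h))
    right = min(img_w, int(right + margin * w))
    bottom = min(img_h, int(bottom + margin * h))
    return left, top, right, bottom
```

With dlib, the box would come from `dlib.get_frontal_face_detector()`, and the loosened rectangle would then be passed to the image-cropping code.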