How to Build an AI Fashion Designer




Clothing semantic editing for fashion design using StyleGAN and GANSpace

ClothingGAN Demo [Image by Author]

Overview

This is a write-up of my older project ClothingGAN. The project generates clothing designs with AI using StyleGAN and semantically edits them along attributes such as sleeve, size, dress, jacket, etc. You can also do style mixing, as shown in the image above: first generate two different clothing designs (output 1) with different seed numbers, and the app will generate a third design (output 2) that mixes the previous two. You can then adjust how much style or structure it inherits from each of the two original designs.

You can try the demo here, and here is the source code (feel free to star the repo).

Outline

  • Inspiration
  • How I Built It
  • Training StyleGAN model
  • Semantic Editing with GANSpace
  • Building UI with Gradio
  • Deploying to HuggingFace Space

Inspiration

GAN, or Generative Adversarial Network, is a generative model that learns to generate images by modeling the probability distribution of a large image dataset. I have always found GANs fascinating because they let me generate high-quality art or designs even without the technical or artistic skill to draw them myself. Recently, I have seen many face-editing demonstrations with GANs, but rarely semantic manipulation on other datasets. Hence, I created ClothingGAN, an application where you can collaboratively design clothes with AI without deep technical expertise.

How I Built It

The first step is to have a generative model that can generate clothing. I didn't manage to find a public model that generates decent-quality images, so I decided to train my own clothing GAN with StyleGAN. Then I used GANSpace, a latent-space-based semantic editing method, to provide the editing capabilities. It finds important directions in the GAN latent space that may represent certain visual attributes, which I then labeled manually. Finally, I built the demo interface with the Gradio library and deployed it to Hugging Face Spaces.

Training StyleGAN model

I used the StyleGAN2-ADA [2] model, which was the latest StyleGAN model at the time of the project. You may want to use the current latest version, StyleGAN3, although I am not sure how compatible StyleGAN3 is with my methods or with the other libraries I am using.

To train the model, I used the clothing dataset created by Donggeun Yoo for the PixelDTGAN [1] paper. The dataset has 84,748 images: 9,732 upper-clothing images with clean backgrounds, each associated with some of the remaining 75,016 fashion model images. I only used the clothing images with clean backgrounds, so the StyleGAN model was trained on roughly 9k images at a resolution of 512×512. Here is the link to the dataset, which is shared on the author's website. The PixelDTGAN paper is under the MIT license.

A peek of the LookBook dataset [Image by Author, Dataset by PixelDTGAN[1]]

I will not cover the exact steps for training the model here, as I have already written an article on that topic. Just follow the same steps with the selected dataset.
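For reference, training with the official StyleGAN2-ADA code boils down to a single command along these lines (paths are placeholders, and the exact flags differ slightly between the TensorFlow and PyTorch releases, so check the repo's README and my earlier article):

!python train.py --outdir=./training-runs --gpus=1 --data=./datasets/lookbook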

Here is the result after the training.

Samples and interpolation of the generated designs by the trained model [Image by Author]

Semantic Editing with GANSpace

Semantic image editing is the task of modifying semantic attributes, such as style or structure, in a given source image: for example, changing a person's hair color while preserving their identity. Applications range widely, from photo enhancement and style manipulation for artistic and design purposes to data augmentation. Semantic image editing commonly has two goals: allowing continuous manipulation of multiple attributes simultaneously, and preserving the source image's identity as much as possible while maintaining the realism of the image.

Existing GAN-based methods for semantic image editing fall mainly into two categories: image-space editing and latent-space editing. Image-space editing learns a network that directly transforms a source image into another image in the target domain. These approaches usually allow only binary attribute changes rather than continuous ones. Examples include pix2pix, StarGAN, and DRIT++.

In contrast, latent-space editing manipulates images indirectly by moving the input vector through the latent space of the GAN model. These approaches focus on finding paths in the latent space that correspond to semantic attributes of the generated images; moving the input vector along these paths allows continuous editing of the attributes.

Unsupervised, self-supervised, and supervised approaches to latent-space editing have all been proposed. GANSpace [3] uses Principal Component Analysis (PCA) in either the latent or feature space to find important directions in an unsupervised manner; similar directions can also be found in closed form, as in the SeFa paper. Self-supervised approaches can find such directions without labels, since they generate their own supervision, but they are often limited to geometric attributes such as rotation or scale. Supervised approaches such as InterFaceGAN, on the other hand, require label information or an attribute classifier.

GANSpace [3] discusses how a pre-trained GAN model can be steered to style its generated images. A GAN learns a function that maps a noise distribution z to an image distribution, so different noise inputs z produce different outputs. However, a deep learning model is largely a black box: the relationship between the noise input and the generated output is not known explicitly, so the output cannot be controlled directly. A GAN can be conditioned to generate outputs of a specific class given a class label, as studied in conditional GANs, but that requires label information for the dataset during training, which may not be feasible in many cases.

GANSpace [3] instead proposes that important directions can be found in the latent space z that correspond to recognizable semantic concepts in the generated output, such as its style. To find such a direction, the activations of an intermediate layer are recorded for many samples, and a PCA direction v is computed in that intermediate activation space. The direction v is then transferred back to find the corresponding direction u in the z latent space. The overall process is illustrated in the image below, taken from the GANSpace paper.

2D Illustration of identifying PCA direction in GAN latent space [Source: GANSpace paper[3]]

The important direction u can be computed at different layers, and directions at each layer may represent different semantic concepts. Directions found in early layers often correspond to high-level features such as the clothing's structure, while directions found in the last few layers often correspond to low-level features such as lighting or color. By moving the noise input z along these known directions, we can steer the generated output toward the desired feature. The image below shows the results of applying the GANSpace method to different GAN models.

GANSpace results in different models. [Source: GANSpace paper[3]]
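As a toy illustration of the core idea (not the actual GANSpace implementation), the unsupervised step boils down to running PCA over intermediate activations collected from many random latents and mapping the resulting direction back to the input space. The random "layer" and the shapes below are purely illustrative so the snippet runs on its own:

import numpy as np

rng = np.random.default_rng(0)
n_samples, latent_dim = 10_000, 512

# Sample many latent vectors z, as GANSpace does with the GAN's input distribution.
z = rng.standard_normal((n_samples, latent_dim))

# Stand-in for an intermediate generator layer (a random linear map here;
# in GANSpace these are real network activations).
layer = rng.standard_normal((latent_dim, latent_dim))
acts = z @ layer

# PCA via SVD: rows of vt are the principal directions v in the activation space.
acts_centered = acts - acts.mean(axis=0)
_, _, vt = np.linalg.svd(acts_centered, full_matrices=False)
v = vt[0]

# Transfer v back to the input space by regressing z on the component scores,
# giving the corresponding direction u in the latent space.
scores = acts_centered @ v
u = (z - z.mean(axis=0)).T @ scores / (scores @ scores)

# Editing then means moving a latent along u and re-generating the image.
z_edit = z[0] + 3.0 * u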

Finding the directions in the trained model

The code shown here was tested on Google Colab. You can follow along with my notebook or in your own environment, but if you are working outside of Colab, make sure your environment has the dependencies that come preinstalled in Colab.

Here is the tutorial notebook if you want to follow along.

First, we will need to install the dependencies required for GANSpace.

!pip install ninja gradio fbpca boto3 requests==2.23.0 urllib3==1.25.11

Restart the runtime after running the command, then clone the repo and move into it.

!git clone https://github.com/mfrashad/ClothingGAN.git
%cd ClothingGAN/

Run the following commands for further setup. Make sure you are in the ClothingGAN folder.

!git submodule update --init --recursive
!python -c "import nltk; nltk.download('wordnet')"

Next, we need to modify the GANSpace code to add our custom model. For StyleGAN2, GANSpace expects a PyTorch version of the model file. Since our StyleGAN model file is in the TensorFlow .pkl format, we need to use the converter made by rosinality to change it to a PyTorch .pt file; just follow the steps in this notebook. (The project was done before the official StyleGAN2 PyTorch implementation was released, so you can skip this part if your model file is already in .pt/PyTorch format.)

Next, go back to the GANSpace folder and modify models/wrappers.py to add our model file. First, go to the StyleGAN2 class and add our model name and output resolution to the config variable.

I added the ‘lookbook’ model in the config variable with 512×512 resolution at line 117 in models/wrappers.py [Image by Author]

Next, scroll down a bit further and add the link to the model in the checkpoints variable. To generate the link to our model, simply upload the model file to Google drive and use this site to generate a direct link to it.

I added a new generator model ‘lookbook’ at line 149 in models/wrappers.py file [Image by Author]
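In code, the two edits amount to something like the following inside the StyleGAN2 wrapper (an illustrative excerpt only; the real dictionaries in models/wrappers.py contain many more entries, and the download URL is a placeholder):

# models/wrappers.py, inside the StyleGAN2 wrapper (illustrative excerpt only)
configs = {
    # ... existing models ...
    'lookbook': 512,  # our model name and its output resolution
}

checkpoints = {
    # ... existing models ...
    # direct-download link to the converted .pt file (placeholder URL)
    'lookbook': 'https://drive.google.com/uc?export=download&id=<YOUR_FILE_ID>',
}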

After you have added the model to the file, run the visualize.py script to perform PCA and visualize how the output changes when the input is moved along the computed principal components.

Command used to do PCA and visualize the changes [Image by Author]
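In text form, the command is roughly the following (flag names reconstructed from the options discussed below; run python visualize.py --help in the repo for the exact names):

!python visualize.py --model=StyleGAN2 --class=lookbook --use_w --num_components=80 --video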

The --use_w option means we will manipulate the intermediate latent code w instead of the original latent code z in StyleGAN. The num_components option specifies how many directions or principal components to keep; the maximum is 512, the dimensionality of the z or w input. The --video option generates a video of the visual change when moving along each principal component, instead of just images. The script can take around 30 minutes to finish.

Once it is finished, it will generate the visualized change in the out folder. In my case it’s under the out/StyleGAN2-lookbook folder.

The generated visualization output [Image by Author]

We will take a look at style/ipca/summ/components_W.jpg, as it visualizes the first 14 principal components.

Visualization of the first 14 principal components [Image by Author]

From the image above, we can start choosing the principal components we want to include in the demo and labeling them. For example, in my opinion C0 can be labeled as sleeve length, C1 as jacket, C2 and C3 as coat, C4 and C5 as brightness of the clothing, and C6 as shorter clothing.

You can also check the visualization with different samples in the additional files sampX_real_W.jpg to make sure the changes caused by the principal components are consistent across samples. Nine additional samples are generated by the visualize.py script.

Here is the visualization with another sample.

Visualization of the principal components with a different sample [Image by Author]

You can see that the changes are roughly consistent even with the different samples (C0 as sleeve length, C1 as jacket, etc).

Additionally, you can watch the visualization of each component as a video in the comp or inst folder. The principal components themselves are saved as .npz files in the cache/components/ folder.

Location of the computed principal components file [Image by Author]

Once we have the components, we can start building the demo UI.

Building UI with Gradio

Gradio is a Python library that makes building a UI/demo for ML projects extremely easy, in just a few lines of code. Here is an example of how easy Gradio is:

Example code and the resulting application using Gradio [Image by Author]

Gradio is suitable when you want to demonstrate your ML application as just a single function.
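For reference, a minimal Gradio app along the lines of the one shown in the image above looks like this:

import gradio as gr

def greet(name):
    return "Hello " + name + "!"

# A single Python function plus an Interface object is all Gradio needs to build a web UI.
gr.Interface(fn=greet, inputs="text", outputs="text").launch()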

First of all, we need to load the generator model and also the principal components into memory.
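The original code embeds are not included in this write-up, so here is a rough sketch of this step; treat the helper name, the .npz file name, and the array keys as assumptions and check the ClothingGAN repo for the real code:

import torch
import numpy as np
from models import get_instrumented_model  # assumption: helper exposed by the GANSpace code

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load the custom 'lookbook' StyleGAN2 generator registered earlier in models/wrappers.py.
inst = get_instrumented_model('StyleGAN2', 'lookbook', 'style', device, use_w=True)
model = inst.model

# Load the principal components computed by visualize.py (the exact file name will differ;
# look in cache/components/ for the .npz produced for your model).
comps = np.load('cache/components/stylegan2-lookbook_style_ipca_c80_w.npz')
lat_comp = torch.from_numpy(comps['lat_comp']).float().to(device)  # directions in W space
lat_mean = torch.from_numpy(comps['lat_mean']).float().to(device)
lat_std = torch.from_numpy(comps['lat_stdev']).float().to(device)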

Then, we will define a utility function to manipulate the w input in the specified direction and generate the image using the generator.
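A minimal version of that utility might look like this (again a sketch; the scaling by each component's standard deviation mirrors what GANSpace does for its sliders, and sample_np is assumed to be provided by the GANSpace wrapper):

def move_latent(w, component_idx, strength):
    # Shift the intermediate latent w along one principal direction,
    # scaled by that component's standard deviation.
    return w + strength * lat_std[component_idx] * lat_comp[component_idx]

def synthesize(w):
    # Assumption: with use_w=True the instrumented wrapper accepts a W latent directly,
    # and sample_np returns an HxWx3 numpy image ready to display.
    with torch.no_grad():
        return model.sample_np(w)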

Finally, we can define the main function generate_image and build the UI for it using the Gradio library.
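Here is a sketch of what that could look like, with two hypothetical sliders for the components labeled earlier (the real demo exposes more attributes and also handles the style mixing shown at the top of the article):

import gradio as gr

def generate_image(seed, sleeve_length, jacket):
    # Sample a latent from the seed, then nudge it along the labeled components
    # (C0: sleeve length, C1: jacket).
    w = model.sample_latent(1, seed=int(seed))  # assumption: the wrapper exposes sample_latent
    w = move_latent(w, 0, sleeve_length)
    w = move_latent(w, 1, jacket)
    return synthesize(w)

demo = gr.Interface(
    fn=generate_image,
    inputs=[
        gr.Number(value=0, label="Seed"),
        gr.Slider(-2, 2, value=0, label="Sleeve length (C0)"),
        gr.Slider(-2, 2, value=0, label="Jacket (C1)"),
    ],
    outputs=gr.Image(label="Generated design"),
)
demo.launch()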

And here is the result!

Demo UI with Gradio [Image by Author]

Gradio will also give you a shareable link so your friends or anyone else can try the demo. However, the hosting is not permanent: a Colab session is limited to 12 or 24 hours before it terminates. For permanent hosting, you can run the code on a cloud or your own server. Fortunately, Hugging Face made Spaces, a platform where you can simply upload your ML app and host it permanently for free (you need to pay if you want a GPU). In addition, it integrates nicely with Gradio and Streamlit and works out of the box.

Deploying to HuggingFace Space

First, head to Spaces and log in/register to your account. Then click ‘Create new Space’.

Hugging Face Space main page [Image by Author]

Then, choose a name and license that you want and choose Gradio as the space SDK.

Creating a New Space [Image by Author]

Next, clone the Hugging Face repo.

[Image by Author]

Personally, I had an authentication issue when pushing to the repo and I had to use a token as an authentication method by setting the remote URL to:

https://GITHUB_TOKEN@huggingface.co/spaces/mfrashad/Test

or you can also use the username and password in the URL for authentication.

https://GITHUB_USERNAME:PASSWORD@huggingface.co/spaces/mfrashad/Test
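In git terms, setting the remote URL is just the following (reusing the placeholder token and Space name from above):

git remote set-url origin https://GITHUB_TOKEN@huggingface.co/spaces/mfrashad/Test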

Once we have cloned the repo, we can start creating the files needed for the demo. There are 3 important files that need to be in the Spaces repo: requirements.txt, which lists the Python dependencies to be installed with pip install; packages.txt, which lists the system dependencies to be installed with apt install; and the main Python file app.py, which should contain your Gradio demo code.
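For example, based on the dependencies installed earlier in the notebook, the requirements.txt for this project would look roughly like this (unpinned and illustrative; check the actual file in the ClothingGAN Space repo):

torch
numpy
ninja
fbpca
boto3
nltk
requests
gradio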

Additionally, you need to use git-lfs to upload any binary files to the repo, e.g. images.

So what I did was simply copy all the code from Colab to the Spaces repo, remove the images, binary files, and anything else not needed for the demo, put all the Python code from the notebook into a single app.py, and then create requirements.txt and packages.txt. Once that is done, simply git push and voilà! The demo will be available on Hugging Face Spaces for anyone to try (assuming you don't get any errors).

For full content of the code, you can check the files in the ClothingGAN Space repo.

Your Space demo will show up on your profile and Spaces main page [Image by Author]

Congratulations! You managed to read all the way to this point and hopefully managed to get everything working. For more of a challenge, you can try training your own StyleGAN model and applying semantic editing to it as well. For example, I have also applied the same method to character and fashion model generation.

Character generation and semantic editing [Image by Author]
