How to Build an AI Fashion Designer
Clothing semantic editing for fashion design using StyleGAN and GANSpace
This is a write-up of my old project ClothingGAN. The project generates clothing designs with AI using StyleGAN and semantically edits them with attributes such as sleeve, size, dress, jacket, etc. You can also do style transfer, as shown in the image above, by first generating two different clothing designs (output 1) with different seed numbers. The model then generates a third design (output 2) that mixes the previous two designs. You can adjust how much style or structure it inherits from each of the two original designs.
- How I Built It
- Training StyleGAN model
- Semantic Editing with GANSpace
- Building UI with Gradio
- Deploying to HuggingFace Space
A GAN, or Generative Adversarial Network, is a generative model that learns the probability distribution of a large image dataset and can then generate new images from it. I have always found GANs fascinating, as they let me generate high-quality art and designs even without the technical or artistic skill to draw. I have seen many face-editing demonstrations built on GANs, but semantic manipulation on other datasets is rare. Hence, I created ClothingGAN, an application where you can collaboratively design clothes with AI without deep technical expertise.
How I Built It
The first step is to have a generative model that can generate clothing. I didn't manage to find a public model that generates images of decent quality, so I decided to train my own clothing model with StyleGAN. Then I used GANSpace, a latent-space-based semantic editing method, to provide the editing capabilities. It finds important directions in the GAN latent space that may represent certain visual attributes, which I then labeled manually. Finally, I built the demo interface using the Gradio library and deployed it to Hugging Face Spaces.
Training StyleGAN model
I used StyleGAN2-ADA, which was the latest StyleGAN model at the time of the project. However, you may want to use the current latest version, StyleGAN3, although I'm not sure how compatible it is with my methods or with the other libraries I'm using.
To train the model, I used the clothing dataset created by Donggeun Yoo for the PixelDTGAN paper. The dataset has 84,748 images: 9,732 upper-clothing images with clean backgrounds, associated with the remaining 75,016 fashion-model images. I only used the clothing images with clean backgrounds, so the StyleGAN model was trained on around 9k images at a resolution of 512×512. Here is the link to the dataset, which is shared on the author's website. The PixelDTGAN paper is under the MIT license.
I will not cover the exact steps for training the model, as I have already written an article on that topic before. Just follow the same steps with the selected dataset.
Here is the result after the training.
Semantic Editing with GANSpace
Semantic image editing is the task of modifying semantic attributes, such as style or structure, in a given source image. For example, modifying the hair color of a person while preserving the person's identity. Applications of image editing range widely, from photo enhancement and style manipulation for artistic and design purposes to data augmentation. Semantic image editing commonly has two goals: allowing continuous manipulation of multiple attributes simultaneously, and preserving the source image's identity as much as possible while maintaining the realism of the image.
Existing methods of semantic image editing using GANs can be mainly categorized into either image-space editing or latent-space editing. Image-space editing learns a network that directly transforms a source image into another image in the target domain. These approaches usually only allow binary attribute change rather than allowing continuous changes. Examples of these approaches are pix2pix, StarGAN, and DRIT++.
On the contrary, latent-space editing indirectly manipulates the images by manipulating the input vector across the latent space of GAN models. These approaches mainly focus on finding paths in the latent space that represent semantic attributes of the generated images. Navigating the input vectors along these paths allows continuous editing of the attributes.
Unsupervised, self-supervised, and supervised approaches to latent-space editing have all been proposed. GANSpace uses Principal Component Analysis (PCA) in either the latent or feature space to find important directions in an unsupervised manner. The important direction can also be found similarly using closed-form factorization (from SeFa paper). Self-supervised approaches are also able to find these directions without labels as they generate their own labels but are often limited to geometric attributes such as rotation or scale. On the other hand, a supervised approach such as InterfaceGAN requires label information or an attributes classifier for their method.
GANSpace discusses the use of pre-trained GAN models in styling the generated images. A GAN model learns a function that maps a noise distribution z to an image distribution. Hence, given a different noise input z, the generated output will be different. However, a deep learning model is often a black box: the relationship between the noise input and the generated output is not explicitly known, so the output cannot be explicitly controlled. A GAN model can, however, be conditioned to generate output of a specific class given a class label, as researched in conditional GANs. But conditioning a GAN during training requires label information for the dataset, which may not be feasible in certain cases.
On the other hand, GANSpace's paper proposed that certain important directions can be found in the z latent space that represent known semantic concepts in the generated output, such as the style of the output. To find such a direction, the activations in the intermediate layers are observed for several samples, and the PCA direction v is computed from the values in the intermediate network activation space. Then, the direction v is transferred back to find the corresponding direction u in the z latent space. The overall process is illustrated in the image below, taken from the GANSpace paper.
The important direction u can be computed at different layers, and directions in each layer may represent different semantic concepts. Directions found in early layers often represent high-level features such as the cloth structure, while directions found in the last few layers often represent low-level features such as lighting or colors. By moving the noise input z along these known directions, we can steer the generated output toward the desired feature. The image below shows the manipulated results when the GANSpace method is applied to different GAN models.
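To make the idea concrete, here is a toy NumPy sketch of the two-step process described above: PCA on intermediate activations, then transferring each principal direction back to z space by linear regression. This is an illustration, not GANSpace's actual implementation; the "intermediate layer" here is just a fixed random linear map standing in for a real generator layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for an intermediate GAN layer: a fixed random linear map.
# In GANSpace the activations come from a real layer of the generator.
W_layer = rng.normal(size=(512, 1024))

# 1. Sample latent vectors z and record intermediate activations.
z = rng.normal(size=(5000, 512))
activations = z @ W_layer

# 2. PCA on the activations: principal directions v live in activation space.
mean = activations.mean(axis=0)
_, _, vt = np.linalg.svd(activations - mean, full_matrices=False)
v = vt[:10]  # top 10 principal directions in activation space

# 3. Transfer each direction v back to a direction u in z space via linear
#    regression from z onto the PCA coordinates (the "transfer" step above).
coords = (activations - mean) @ v.T                 # (N, 10) PCA coordinates
u, *_ = np.linalg.lstsq(coords, z - z.mean(axis=0), rcond=None)
u /= np.linalg.norm(u, axis=1, keepdims=True)       # unit directions in z space

print(u.shape)  # (10, 512): one z-space direction per principal component
```

Moving z along a row of `u` then produces a continuous change in the corresponding activation component.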
Finding the directions in the trained model
The code shown here was tested on Google Colab. You can follow along with my notebook or in your own environment, but if you are working outside of Colab, make sure your environment has the dependencies that come preinstalled in Colab.
Here is the tutorial notebook if you want to follow along
First, we will need to install the dependencies required for GANSpace.
!pip install ninja gradio fbpca boto3 requests==2.23.0 urllib3==1.25.11
Restart the runtime after running the code, then clone the GANSpace repo.
!git clone https://github.com/mfrashad/ClothingGAN.git
%cd ClothingGAN/
Run the following code for further setup. Make sure you are in the GANSpace folder.
!git submodule update --init --recursive
!python -c "import nltk; nltk.download('wordnet')"
Next, we will have to modify the GANSpace code to add our custom model. For StyleGAN2, we need the PyTorch version of the model file. Since our StyleGAN model file is in TensorFlow .pkl format, we need to use the converter made by rosinality to change it into a PyTorch .pt file. Just follow the steps in this notebook. (The project was done before the official StyleGAN2 PyTorch implementation was released; you may skip this part if your model file is already in .pt or PyTorch format.)
Next, go back to the GANSpace folder and modify models/wrappers.py to add our model file. First, go to the StyleGAN2 class and add our model name and output resolution. Next, scroll down a bit further and add the link to the model in the checkpoints variable. To generate the link to our model, simply upload the model file to Google Drive and use this site to generate a direct link to it.
After you have added the model to the file, run the visualize.py script to perform PCA and visualize the visual change when moving the input in the direction of the computed principal components.
The --use_w option means we manipulate the intermediate latent code w instead of the original latent code z in StyleGAN. The num_components option specifies how many directions or principal components to keep; the maximum is 512, the dimensionality of the latent input. The --video option generates a video of the visual change when moving in the direction of the principal components, instead of just images. The script can take around 30 minutes to finish.
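Putting the options together, the invocation looks roughly like the following. Treat this as a pseudocode-level sketch: the model and class names are placeholders, and the exact flag spellings may differ in the repo.

```
python visualize.py --model=StyleGAN2 --class=clothing --use_w \
  --num_components=14 --video
```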
Once it is finished, the visualized changes will be generated in the out folder. We will take a look at style/ipca/summ/components_W.jpg, as it visualizes the first 14 principal components.
From the image above, we can start choosing the principal components we want to put in our demo and labelling them. For example, in my opinion C0 can be labeled as sleeve length, C1 as jacket, C2 and C3 as coat, C4 and C5 as brightness of the clothing, and C6 as shorter clothing.
You can also see the visualization with different samples in the additional files sampX_real_W.jpg, to check that the changes caused by the principal components are consistent across different samples. There are 9 additional samples generated by the script.
Here is the visualization with another sample.
You can see that the changes are roughly consistent even with the different samples (C0 as sleeve length, C1 as jacket, etc).
Additionally, you can see the visualization of each component as a video in the inst folder. The principal components themselves are saved in the cache/components/ folder.
Once we have the components, we can start building the demo UI.
Building UI with Gradio
Gradio is a Python library that makes building a UI or demo for ML projects extremely easy, in just a few lines of code. Here is an example of how easy Gradio is:
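The original code block is missing here; a minimal sketch looks like this. The `greet` function is just a placeholder, and the `gradio` import is guarded so the function still works if the library isn't installed.

```python
# Minimal Gradio sketch: wrap a plain Python function in a web UI.
def greet(name: str) -> str:
    return f"Hello, {name}!"

try:
    import gradio as gr

    # One line turns the function into an interactive demo.
    demo = gr.Interface(fn=greet, inputs="text", outputs="text")
    # demo.launch()  # uncomment to serve the UI
except ImportError:
    demo = None  # gradio not installed; greet() still works standalone
```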
Gradio is suitable when you want to demonstrate your ML application as just a single function.
First of all, we need to load the generator model and also the principal components into memory.
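The loading code is missing from this copy of the article; here is a hedged sketch of the components half. The .npz key names (`lat_comp`, `lat_mean`, `lat_stdev`) are assumptions based on GANSpace's saved PCA layout, so check them against your own cache file; loading the generator itself goes through GANSpace's model wrappers (e.g. a `get_model(...)` helper, hypothetical here).

```python
import numpy as np

def load_components(path):
    """Load a GANSpace PCA result.

    Key names are assumptions about GANSpace's .npz layout --
    verify them against your own cache/components/ file.
    """
    data = np.load(path)
    return data["lat_comp"], data["lat_mean"], data["lat_stdev"]

# Loading the generator depends on the GANSpace wrappers, roughly:
# model = get_model("StyleGAN2", "clothing", device)  # hypothetical call
```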
Then, we will define a utility function to manipulate the w input in the specified direction and generate the image using the generator.
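The original utility function is missing here; the core of such a function is a simple linear shift of the latent code. This sketch uses hypothetical names and scales each direction by its standard deviation, a common convention in latent editing.

```python
import numpy as np

def apply_directions(w, components, strengths, stdevs):
    """Shift latent w along each principal direction.

    w          -- latent vector (NumPy array)
    components -- list of unit direction vectors in w space
    strengths  -- user-chosen slider values, one per direction
    stdevs     -- per-component standard deviations from PCA
    """
    w_edited = w.copy()  # leave the original latent untouched
    for comp, strength, stdev in zip(components, strengths, stdevs):
        w_edited += strength * stdev * comp
    return w_edited
```

The edited `w_edited` would then be fed through the generator to produce the manipulated image.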
Finally, we can define the main function generate_image and build the UI for it using the Gradio library.
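The original code block is missing here; below is a self-contained sketch of the shape of `generate_image` and its Gradio wiring. The component directions, their labels, and the generator are stand-ins (random arrays and a dummy image), and the slider ranges are assumptions; in the real app these come from the trained StyleGAN model and the GANSpace PCA output.

```python
import numpy as np

# Placeholder directions and scales -- in the real app these come from the
# GANSpace PCA output; here they are random stand-ins for illustration.
rng = np.random.default_rng(0)
components = rng.normal(size=(2, 512))
stdevs = np.ones(2)

def fake_generator(w):
    # Stand-in for the StyleGAN generator: returns a dummy RGB image array.
    return np.zeros((64, 64, 3), dtype=np.uint8)

def generate_image(seed, sleeve, jacket):
    rng = np.random.default_rng(int(seed))
    w = rng.normal(size=512)
    w = w + sleeve * stdevs[0] * components[0]  # direction labeled "sleeve length"
    w = w + jacket * stdevs[1] * components[1]  # direction labeled "jacket"
    return fake_generator(w)

try:
    import gradio as gr

    demo = gr.Interface(
        fn=generate_image,
        inputs=[gr.Number(value=0, label="Seed"),
                gr.Slider(-2.0, 2.0, label="Sleeve length"),
                gr.Slider(-2.0, 2.0, label="Jacket")],
        outputs="image",
    )
    # demo.launch()  # uncomment to serve the demo
except ImportError:
    demo = None  # gradio not installed; generate_image() still works
```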
And here is the result!
Gradio will also give you a shareable link where your friends or anyone else can try the demo. However, this hosting is not permanent, and the Colab server terminates itself after 12 to 24 hours. For permanent hosting, you can simply run the code on a cloud or your own server. Fortunately, Hugging Face made Spaces, a platform where you can upload your ML app and host it permanently for free (you need to pay if you want a GPU). In addition, it integrates nicely with Gradio and Streamlit and works out of the box.
Deploying to HuggingFace Space
First, head to Spaces and log in/register to your account. Then click ‘Create new Space’.
Then, choose a name and license that you want and choose Gradio as the space SDK.
Next, clone the Hugging Face repo.
Personally, I had an authentication issue when pushing to the repo and had to use a token as the authentication method by embedding it in the remote URL. Alternatively, you can put the username and password in the URL for authentication.
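The exact URL from the original article is lost; as a hedged sketch, a token-authenticated Spaces remote usually follows this pattern (the username, token, and space name below are placeholders):

```shell
# Hypothetical values -- substitute your own username, token, and space name.
HF_USER="your-username"
HF_TOKEN="hf_xxxxxxxxxxxx"
SPACE_URL="https://${HF_USER}:${HF_TOKEN}@huggingface.co/spaces/${HF_USER}/ClothingGAN"

# To point an existing clone at it:
# git remote set-url origin "$SPACE_URL"
echo "$SPACE_URL"
```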
Once we have cloned it, we can start creating the files needed for our demo. There are 3 important files that need to be in the Spaces repo: requirements.txt, which specifies all the Python dependencies to be installed with pip install; packages.txt, which specifies the dependencies to be installed with apt install; and the main Python file app.py, which should contain your Gradio demo code.
Additionally, you need to use git-lfs to upload any binary files to the repo, e.g images.
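Putting these pieces together, a minimal Spaces repo might look like this (the layout is a sketch; only the three files described above are strictly required):

```
ClothingGAN/
├── app.py            # Gradio demo code
├── requirements.txt  # pip dependencies
├── packages.txt      # apt dependencies
└── models/           # model wrappers / checkpoints (binary files via git-lfs)
```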
So what I did was simply copy all our code from Colab into the Spaces repo, remove the images, binary files, and anything else not needed for the demo, and put all the Python code from the notebook into a single file app.py. Then create the requirements.txt and packages.txt. Once that is done, simply git push and voilà! The demo will be available on Hugging Face Spaces for anyone to try (assuming you don't get any errors).
For the full code, you can check the files in the ClothingGAN Space repo.
Congratulations! You managed to read all the way to this point and hopefully managed to do everything. For more of a challenge, you can try training your own StyleGAN model and applying semantic editing to it as well. For example, I've also applied the same method to character and fashion-model generation.