Autoencoders are an unsupervised deep learning technique for learning data encodings. They learn a representation from unlabelled data and then reconstruct the data from that representation as accurately as possible. The learned representation is also called the latent space or code. An Autoencoder has two parts. The encoder (first part) learns the important, representative features of a given image and maps them into the latent space. The decoder (second part) reconstructs the image from the latent space; because the latent space keeps only the important features, noise and unimportant details are left out of the reconstruction. The result is effectively a compressed version of the image. The compression can be lossy, since some features are lost along the way, and the reconstructed images can be blurry.
Autoencoders can be used for image de-noising, information retrieval, data compression, image colorization, and image enhancement.
The following picture shows the high-level architecture of an Autoencoder.
For reconstruction, the input and output layers must have the same number of nodes. The basic model consists of one input layer, one hidden layer, and one output layer; adding multiple hidden layers makes it a deep autoencoder.
The data is fed to the encoder through the input layer. After passing through the hidden layers, it is compressed into a code (latent space) in which similar data points lie closer to one another. The latent vector is then passed to the decoder, which uses the patterns and features captured in it to reconstruct the original image.
Mathematical Understanding of Autoencoders
The following image shows the mathematical representation. Here W and V represent the weights of the encoder and decoder, respectively. Z is the latent vector, obtained by multiplying the input by the encoder weights and passing the result through an activation function. In the decoder, the latent vector is multiplied by the decoder weights to obtain the reconstructed image. Mean Squared Error (MSE) between the input and the reconstruction is used as the reconstruction loss, and this loss is backpropagated during training to update the weights.
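Since the figure may not be visible, the relations described above can be written out roughly as follows. This is a sketch under assumptions: the symbols x (input), x̂ (reconstruction), f and g (activation functions), and N (number of pixel values) are notation introduced here, g may simply be the identity, and bias terms are omitted because the text does not mention them.

```latex
\begin{aligned}
Z &= f(Wx) && \text{(encoder)} \\
\hat{x} &= g(VZ) && \text{(decoder)} \\
\mathcal{L}_{\text{MSE}} &= \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \hat{x}_i\right)^2 && \text{(reconstruction loss)}
\end{aligned}
```

Training adjusts W and V so that the reconstruction loss becomes as small as possible.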
Facial Images Reconstruction example
Let’s build an Autoencoder for face images and reconstruct them as accurately as possible. We will take images from the Flickr-Faces-HQ (FFHQ) dataset and remove a portion of each image by intentionally placing a black box on it, which acts as noise. The original dataset has images of size 1024 by 1024, but we use 128 by 128 versions. Our Autoencoder will try to reconstruct the missing parts of the images.
Step 1: Importing Libraries…
Step 2: Data Generation and Preprocessing…
The dataset is quite large (70,000 images), which makes it impractical to load all of it into memory at once. Therefore, we will implement a custom generator function that loads the images in batches. Instead of returning images and their labels, the generator returns a tuple (corrupted_images_batch, original_images_batch), where the corrupted images are the same as the originals except that a small square has been removed from each.
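A minimal sketch of such a generator is shown here, assuming NumPy is available. The names batch_generator, fake_load_image, and fake_corrupt_image are hypothetical and only illustrate the idea; the stand-in loader fabricates an image instead of reading a real file, so this is not the article's original code.

```python
import numpy as np

def batch_generator(paths, batch_size, load_image, corrupt_image):
    """Yield (corrupted_images_batch, original_images_batch) tuples, batch by batch."""
    while True:  # loop forever; the training loop decides how many batches to draw
        for start in range(0, len(paths), batch_size):
            batch_paths = paths[start:start + batch_size]
            # Only the images of the current batch are held in memory.
            originals = np.stack([load_image(p) for p in batch_paths])
            corrupted = np.stack([corrupt_image(img) for img in originals])
            yield corrupted, originals

# Stand-in helpers for illustration only (assumptions, not real file loading).
def fake_load_image(path):
    # Pretend every file decodes to a 128x128 RGB image scaled to [0, 1].
    return np.ones((128, 128, 3), dtype=np.float32)

def fake_corrupt_image(img):
    out = img.copy()
    out[:28, :28, :] = 0.0  # fixed-position black square, just for the demo
    return out

paths = [f"img_{i}.png" for i in range(10)]  # hypothetical file names
gen = batch_generator(paths, batch_size=4,
                      load_image=fake_load_image,
                      corrupt_image=fake_corrupt_image)
corrupted_batch, original_batch = next(gen)
print(corrupted_batch.shape, original_batch.shape)
```

Passing the loader and the corruption step in as functions keeps the batching logic separate from how images are read and damaged, so either can be swapped out independently.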
Now we will create a function that removes a portion of an image by drawing a black square on it. The square is 28×28 pixels and is drawn at a random location within the image.
We use the ImageDraw module of the PIL library to draw the box.
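One way this could look is sketched below, assuming Pillow is installed. The function name add_black_box and the constant BOX_SIZE are hypothetical, and a plain white test image stands in for a real face photo.

```python
import random
from PIL import Image, ImageDraw

BOX_SIZE = 28  # side length of the black square, as described in the text

def add_black_box(image, box_size=BOX_SIZE, rng=random):
    """Return a copy of `image` with a black square drawn at a random location."""
    corrupted = image.copy()
    width, height = corrupted.size
    # Pick the top-left corner so that the whole square stays inside the image.
    x0 = rng.randint(0, width - box_size)
    y0 = rng.randint(0, height - box_size)
    # In current Pillow releases both corners of the rectangle are inclusive,
    # hence the "- 1" to get exactly box_size x box_size pixels.
    ImageDraw.Draw(corrupted).rectangle(
        [x0, y0, x0 + box_size - 1, y0 + box_size - 1], fill=(0, 0, 0)
    )
    return corrupted

# White 128x128 placeholder image (assumption: real data would be a face image).
original = Image.new("RGB", (128, 128), color=(255, 255, 255))
corrupted = add_black_box(original)
```

Working on a copy leaves the original image untouched, which matters because the original is needed later as the reconstruction target.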
Now, we will generate batches of data which will be used for training.
After processing the data and getting the images in batches, our corrupted data will look something like this.
Step 3: Model Building…
Moving forward, we will build our model by first creating the architectures of the encoder and the decoder. Afterwards, we will merge them to get the full-fledged Autoencoder.
The Encoder consists of a stack of convolutional layers followed by a dense (fully connected) layer that outputs a vector of size Z_DIM (the latent space dimension). The whole image of size 128x128x3 is encoded into this latent vector of size Z_DIM.
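A possible encoder along these lines is sketched below with tf.keras. The number of convolutional layers, the filter counts, the kernel size, the stride, and the LeakyReLU activations are all assumptions chosen for illustration; only the 128x128x3 input, the conv-then-dense structure, and the Z_DIM output come from the text.

```python
from tensorflow.keras import layers, Model

INPUT_DIM = (128, 128, 3)
Z_DIM = 300  # latent size reported later in the article

def build_encoder(input_dim=INPUT_DIM, z_dim=Z_DIM):
    encoder_input = layers.Input(shape=input_dim, name="encoder_input")
    x = encoder_input
    # Assumed stack: four stride-2 convolutions halve the size each time,
    # 128 -> 64 -> 32 -> 16 -> 8.
    for filters in (32, 64, 64, 64):
        x = layers.Conv2D(filters, kernel_size=3, strides=2, padding="same")(x)
        x = layers.LeakyReLU()(x)
    shape_before_flatten = tuple(x.shape[1:])  # needed to mirror the decoder
    x = layers.Flatten()(x)
    # Dense layer without activation produces the latent vector.
    encoder_output = layers.Dense(z_dim, name="encoder_output")(x)
    encoder = Model(encoder_input, encoder_output, name="encoder")
    return encoder, shape_before_flatten

encoder, shape_before_flatten = build_encoder()
encoder.summary()
```

Returning the shape just before flattening is a convenience: a decoder that mirrors this encoder can start from the same spatial size.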
You can refer to this link to see how to create Autoencoder model in Keras.
The Decoder is usually the mirror image of the Encoder, although this is not mandatory. Since the Encoder uses convolutional layers to compress the image, the Decoder uses Conv2DTranspose layers for the reverse effect. With a stride of 2 and "same" padding, this layer produces an output tensor with double the height and width of its input tensor. The input to the decoder is a vector of size Z_DIM and the output is an image of size INPUT_DIM (128x128x3).
Unlike the encoder, the decoder has an activation function on its output, because it outputs the image and we want the pixel values to lie between zero and one. Here we use sigmoid as the activation function.
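The decoder described above might be sketched as follows with tf.keras. The starting shape of 8x8x64, the filter counts, the kernel size, and the LeakyReLU activations are assumptions for illustration; the Z_DIM input, the Conv2DTranspose layers, the sigmoid output, and the 128x128x3 result follow the text.

```python
from tensorflow.keras import layers, Model

Z_DIM = 300
SHAPE_BEFORE_FLATTEN = (8, 8, 64)  # assumed feature-map shape to start upsampling from

def build_decoder(z_dim=Z_DIM, start_shape=SHAPE_BEFORE_FLATTEN):
    decoder_input = layers.Input(shape=(z_dim,), name="decoder_input")
    units = start_shape[0] * start_shape[1] * start_shape[2]
    # Expand the latent vector and reshape it into a small feature map.
    x = layers.Dense(units)(decoder_input)
    x = layers.Reshape(start_shape)(x)
    # Each stride-2 transposed convolution doubles height and width:
    # 8 -> 16 -> 32 -> 64.
    for filters in (64, 64, 32):
        x = layers.Conv2DTranspose(filters, kernel_size=3, strides=2, padding="same")(x)
        x = layers.LeakyReLU()(x)
    # Final doubling to 128x128 with 3 channels (RGB).
    x = layers.Conv2DTranspose(3, kernel_size=3, strides=2, padding="same")(x)
    # Sigmoid keeps every output pixel between zero and one.
    decoder_output = layers.Activation("sigmoid", name="decoder_output")(x)
    return Model(decoder_input, decoder_output, name="decoder")

decoder = build_decoder()
decoder.summary()
```

Four doublings take the 8x8 feature map back to 128x128, which is why the number of upsampling layers has to match the amount of downsampling assumed on the encoder side.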
Merging Encoder and Decoder
As the final step, we attach the encoder to the decoder to get the final version of our Autoencoder. The code is as follows:
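In case the original listing is not available, here is a sketch of how the two parts could be joined with the tf.keras functional API. The encoder and decoder below are deliberately tiny stand-ins (assumptions, not the article's layers) so that the example is self-contained; the part that matters is the last three lines.

```python
from tensorflow.keras import layers, Model

INPUT_DIM = (128, 128, 3)
Z_DIM = 300

# Tiny stand-in encoder: one strided convolution, then a dense latent layer.
enc_in = layers.Input(shape=INPUT_DIM)
x = layers.Conv2D(8, kernel_size=4, strides=4, padding="same")(enc_in)  # 128 -> 32
enc_out = layers.Dense(Z_DIM)(layers.Flatten()(x))
encoder = Model(enc_in, enc_out, name="encoder")

# Tiny stand-in decoder: dense, reshape, one transposed convolution with sigmoid.
dec_in = layers.Input(shape=(Z_DIM,))
y = layers.Reshape((32, 32, 8))(layers.Dense(32 * 32 * 8)(dec_in))
dec_out = layers.Conv2DTranspose(
    3, kernel_size=4, strides=4, padding="same", activation="sigmoid"
)(y)  # 32 -> 128
decoder = Model(dec_in, dec_out, name="decoder")

# The Autoencoder feeds the encoder's latent vector straight into the decoder.
autoencoder_input = layers.Input(shape=INPUT_DIM)
autoencoder_output = decoder(encoder(autoencoder_input))
autoencoder = Model(autoencoder_input, autoencoder_output, name="autoencoder")
```

Because the encoder and decoder remain separate models, each can still be used on its own after training, for example to inspect latent vectors.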
Step 4: Training the Model…
Almost done! All that is left is training. We can vary several parameters to find the combination that gives the best results. The parameters include:
- Learning Rate
- Training Epochs
- Batch size
- Latent vector size
- Error function
I trained the model with learning rates = [0.01, 0.001], optimizers = [Adam, SGD], loss = mse, batch size = 64, epochs = 15, and latent space size = 300.
The training code is as follows.
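The snippet below is a rough sketch of the compile-and-fit step with tf.keras, not the original training script. The helper compile_autoencoder is hypothetical, and a toy model with random 16x16 data replaces the real Autoencoder and the FFHQ generator so that it runs in seconds; the article's settings (batch size 64, 15 epochs) appear only in comments.

```python
import numpy as np
from tensorflow.keras import layers, Model, optimizers

def compile_autoencoder(model, optimizer_name="adam", learning_rate=0.001):
    """Compile a model with MSE loss and the requested optimizer."""
    optimizer_classes = {"adam": optimizers.Adam, "sgd": optimizers.SGD}
    optimizer = optimizer_classes[optimizer_name](learning_rate=learning_rate)
    model.compile(optimizer=optimizer, loss="mse")
    return model

# Toy placeholder model (assumption): far smaller than the real Autoencoder.
inputs = layers.Input(shape=(16, 16, 3))
latent = layers.Dense(8)(layers.Flatten()(inputs))
outputs = layers.Reshape((16, 16, 3))(
    layers.Dense(16 * 16 * 3, activation="sigmoid")(latent)
)
toy_autoencoder = compile_autoencoder(Model(inputs, outputs), "adam", 0.001)

# Random stand-in data: the corrupted copy has a black square, the target does not.
rng = np.random.default_rng(0)
originals = rng.random((64, 16, 16, 3)).astype("float32")
corrupted = originals.copy()
corrupted[:, 4:8, 4:8, :] = 0.0

# The article uses batch size 64 and 15 epochs; 2 epochs keep this demo short.
history = toy_autoencoder.fit(
    corrupted, originals, batch_size=64, epochs=2, verbose=0
)
print(history.history["loss"])
```

The input to fit is the corrupted batch and the target is the original batch, which is what pushes the model to fill in the missing square.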
Step 5: Results…
At last, time for results! Training took around 10–12 hours for each parameter combination discussed above on Google Colab. The parameters that gave the best results, with training loss 0.0111 and accuracy 85.10%, are as follows:
Latent space: 300, Epochs: 50, Learning rate: 0.001, Batch size: 64, Optimizer: Adam, Loss: mse
The resulting images were blurry. This is because the MSE loss function averages out the pixel values, which results in blurriness.
Some other results that I got with different combinations of parameters are as follows:
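A small worked example can make the averaging effect concrete. It is a toy illustration with made-up four-pixel patches, not data from the experiment: when two sharp completions are equally plausible, the pixel-wise mean (a blurry patch) has a lower expected MSE than either sharp guess.

```python
def mse(prediction, target):
    """Mean squared error between two equally long pixel lists."""
    return sum((p - t) ** 2 for p, t in zip(prediction, target)) / len(target)

# Two equally plausible "sharp" completions of a missing 4-pixel patch.
sharp_a = [0.0, 1.0, 0.0, 1.0]
sharp_b = [1.0, 0.0, 1.0, 0.0]
# The pixel-wise mean of the two, which looks like a flat grey (blurry) patch.
blurry = [0.5, 0.5, 0.5, 0.5]

def expected_mse(prediction):
    # Average the error over both plausible targets.
    return (mse(prediction, sharp_a) + mse(prediction, sharp_b)) / 2

print(expected_mse(sharp_a))  # 0.5
print(expected_mse(blurry))   # 0.25
```

A model trained to minimise MSE is therefore rewarded for predicting the grey average rather than committing to one sharp option.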
Parameters: epochs: 15, learning rate: 0.01, batch size: 64, optimizer: Adam, loss: mse. Results: loss: 0.0121, accuracy: 0.8392
Parameters: epochs: 15, learning rate: 0.01, batch size: 64, optimizer: SGD, loss: mse. Results: loss: 0.0118, accuracy: 0.8433
Parameters: epochs: 15, learning rate: 0.001, batch size: 64, optimizer: Adam, loss: mse. Results: loss: 0.0111, accuracy: 0.8475
Parameters: epochs: 15, learning rate: 0.001, batch size: 64, optimizer: SGD, loss: mse. Results: loss: 0.0111, accuracy: 0.8469
Parameters: epochs: 30, learning rate: 0.0001, batch size: 32, optimizer: Adam, loss: mse. Results: loss: 0.0125, accuracy: 0.8357
Parameters: epochs: 10, learning rate: 0.01, batch size: 64, optimizer: RMSprop, loss: binary_crossentropy. Results: loss: 0.5712, accuracy: 0.8022
Step 6: Conclusion…
Autoencoders are mainly used for denoising, image reconstruction, and anomaly detection, but they have some disadvantages as well. They are not very effective at generating images, as the resulting images are blurry. The biggest reason for this is that the latent space of an Autoencoder is not continuous; it learns only a single latent representation of the data.
Some of the references which I used while writing this article are listed as follows:
That was all about Autoencoders. If you found it useful and interesting, or want to share some feedback, feel free to email me at firstname.lastname@example.org. You can find the whole code on Google Colab here. I’ll try to write another article on Variational Autoencoders (a more advanced form of Autoencoders) and their comparison with Autoencoders very soon.