A Comprehensive Guide to Image Augmentation using Pytorch


A way to increase the amount of data and make the model more robust

Photo by Dan Gold on Unsplash

Lately, while working on my research project, I began to understand the importance of image augmentation techniques. The aim of this project is to train a robust generative model able to reconstruct the original images.

The problem addressed is anomaly detection, which is challenging because only a small amount of data is available, so the model alone cannot do all the work. A common scenario is to train a two-network model on the normal images available for training and then evaluate its performance on the test set, which contains both normal and anomalous images.

The initial hypothesis is that the generative model should capture the distribution of normal images well but, at the same time, fail to reconstruct the abnormal samples. How can we verify this hypothesis? We can look at the reconstruction error, which should be high for abnormal images and low for normal ones.
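As a minimal sketch of that check, assuming the model's reconstructions are available as a tensor with the same shape as the input batch, the per-image reconstruction error can be computed as a mean squared error. The function name `reconstruction_error` and the choice of MSE are illustrative assumptions, not necessarily the metric used in the project.

```python
import torch


def reconstruction_error(x: torch.Tensor, x_hat: torch.Tensor) -> torch.Tensor:
    """Per-image mean squared error between inputs and reconstructions.

    x, x_hat: tensors of shape (N, C, H, W). Returns a tensor of shape (N,).
    """
    # Average the squared difference over channels, height and width,
    # leaving one error value per image in the batch.
    return ((x - x_hat) ** 2).mean(dim=(1, 2, 3))
```

Images whose error exceeds a threshold (for example, one chosen on held-out normal images, which is an assumption here) could then be flagged as anomalous.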

In this post, I am going to list some effective techniques for increasing the variety of images in a dataset. We are going to explore simple transformations, like rotation, cropping and Gaussian blur, and more sophisticated techniques, such as Gaussian noise and random blocks.

Image Augmentation techniques:

1. Simple transformations

  • Resize
  • Gray Scale
  • Normalize
  • Random Rotation
  • Center Crop
  • Random Crop
  • Gaussian Blur

2. More advanced techniques

  • Gaussian Noise
  • Random Blocks
  • Central Region

1. Introduction to the Surface Crack Dataset

Surface Crack Dataset. Illustration by Author.

In this tutorial, we are going to use the Surface Crack Detection Dataset. You can download the dataset here or from Kaggle. As the name suggests, it provides images of surfaces with and without cracks, so it can be used for anomaly detection: the anomalous class is represented by the images with cracks, while the normal class consists of the surfaces without cracks. It contains 4000 color images of surfaces with and without defects, and both classes are available in both the training and test sets. Each image has a resolution of 227×227 pixels.

2. Simple transformations

This section covers several of the transformations available in the torchvision.transforms module. Before going deeper, we import the modules and load an image without defects from the training set.

Let’s display the dimension of the image:

np.asarray(orig_img).shape  #(227, 227, 3)

It means that we have a 227×227 image with 3 channels.


Resize

Depending on the network and the available compute, 227×227 inputs can be larger than needed, so it can be useful to reduce the image size before passing it to a neural network. For example, we can resize the 227×227 image to 32×32 and 128×128.

Resized Images. Illustration by Author

It’s worth noticing that the 32×32 image loses a lot of detail, while the 128×128 version seems to preserve most of the sample's resolution.

Gray Scale

Three-channel RGB images are not always necessary, and handling them adds complexity, so it can be useful to convert an image to grayscale.

Original image vs Grayscale image. Illustration by Author


Normalize

Normalization can help a neural network train faster and more stably. There are two steps to normalize an image:

  • we subtract the channel mean from each input channel
  • then, we divide the result by the channel standard deviation.

We can display the original image together with its normalized version:

Original Image vs Normalized Image. Illustration by Author

Random Rotation

T.RandomRotation rotates the image by a random angle drawn from a given range of degrees.

Different rotated images. Illustration by Author

Center Crop

We crop the central portion of the image using T.CenterCrop, which requires the crop size to be specified.

Center crop. Illustration by Author

This transformation can be useful when the image has a large background near the borders that isn’t needed for the classification task.

Random Crop

Instead of cropping the central part of the image, we crop a random portion of it with T.RandomCrop, which takes the output size of the crop as a parameter.

Random Crop. Illustration by Author

Gaussian Blur

We apply a Gaussian blur to the image using a Gaussian kernel. Blurring makes the image less sharp and distinct; feeding such images to a neural network during training can make it more robust when learning the patterns of the samples.

Gaussian Blur. Illustration by Author

3. More advanced techniques

The previous examples showed simple transformations provided by PyTorch. Now we’ll focus on more sophisticated techniques implemented from scratch.

Gaussian Noise

Gaussian noise is a popular way to add noise to images, forcing the model to learn the most important information contained in the data. It adds a noise matrix drawn from a Gaussian distribution with mean 0 and variance 1, scaled by a noise factor, and then clips the values to the range [0, 1]. The higher the noise factor, the noisier the image.

Gaussian Noise. Illustration by Author

Random Blocks

Square patches are placed at random positions in the image as masks. The more patches there are, the more challenging the problem becomes for the neural network.

Random Blocks. Illustration by Author.

Central Region

This is a very simple technique to help the model generalize better: it masks the image with a square block placed in its central region.

Central Region. Illustration by Author.

Final thoughts:

I hope you found this tutorial useful. The intention was to give an overview of image augmentation approaches that address the generalization problem of neural network models. Feel free to comment if you know other effective techniques. In the next post, I am going to explain how to exploit these techniques with autoencoders. Thanks for reading. Have a nice day!

Disclaimer: This data set is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) by Çağlar Fırat Özgenel.

