Augmenting Images for Deep Learning




Data collection can be time-consuming, expensive and, honestly, boring. When our ability to collect data is limited, data augmentation can play an important role. It can help us to build a more robust dataset, reduce overfitting and increase the amount of training data.

We will discuss data augmentation and its benefits. We will also keep things practical by walking through the Python code used to augment images. Specifically, we will cover these augmentations:

  • Flipping images
  • Adjusting brightness
  • Random colour jitter
  • Random noise (Gaussian, salt and pepper, and deletion)

To end, we will discuss best practices when it comes to augmenting images. Specifically, how to best validate our model and test it in production.

Use case — automated car

To keep things interesting, we’ll be augmenting images for an automated car. You can see what we mean in Figure 1. There is a camera on the front of the car and a model uses the images to make predictions. The predictions are then used to direct the car.

Figure 1: automated car with camera sensor (source: author)

The goal is for the car to go around the track while staying within the orange lines. Along the way, we will discuss which augmentations make sense for this particular application. This drives home the point that choices around data augmentation require some critical thinking.

Collecting a robust dataset

Before we get into that, it is worth discussing data collection. This is because data augmentation is an alternative or supplementary approach to collecting a robust dataset.

What is a robust dataset?

A robust dataset is one that reflects all the conditions under which a model is expected to perform. The conditions are determined by variables such as lighting conditions, the angle of a camera, the colour of a room, or objects in the background. Training on such a dataset will produce a model that is resilient to changes in these variables.

Robust dataset = robust model

A good example comes from our experience with the automated car. We collected data, trained a model, and deployed it. It worked perfectly! That is until we opened up the blinds…

Figure 2: model struggling to perform under different conditions

The sun rays reflected off the track and “confused” the model. It was not able to make accurate predictions under these conditions. In other words, the model was not robust to changes in the lighting conditions.

How to collect a robust dataset

Building a robust dataset starts with a good data collection strategy. You need to think about all the variables that will impact the conditions. Then you need to collect data that captures variation in those variables. For example, for different lighting conditions we can:

  • Turn lights on and off
  • Open and close window blinds
  • Collect data during different times of the day

Other variables are different aspects of the image background. This includes the colour of walls and carpets and the different objects in the background. To account for these we could:

  • Collect data in different rooms
  • Move different objects into the background

With these changes, what we are doing is adding noise to our dataset. The hope is the model will learn to ignore this noise and only use the track to make predictions. In other words, we want the model to use true causes and not associations.

The benefits of data augmentation

Data augmentation is when we systematically or randomly alter images using code. This allows us to artificially introduce noise and increase the size of our dataset. Really, the aim is the same as data collection and it follows that the benefits are similar.

Building a robust dataset

Often we are limited by how much data can be collected. In this case, data augmentation can help improve the robustness of our dataset. Even if you’ve managed to collect a lot of data, augmentation can provide an additional layer of robustness.

To do this, though, we need to think critically about the types of augmentations we apply. That is, they should simulate conditions we expect to see in the real world. For example, later we will see how to adjust the brightness of images to simulate different lighting conditions.

Reduce overfitting to a set of conditions

With good augmentations, we can reduce overfitting. To be clear, this is not the same as overfitting to a training set. Take Figure 3 for example. Suppose we only collected data in one room. As a result, the model associated an object in the background with the prediction to turn left.

Figure 3: overfitting to objects in a room (source: author)

The object will be in all the images we collected. This means it will be in the training, validation, and even the test set. The model can perform well on all of these sets but still perform badly in production. For example, if we removed the object it may become “confused” and not turn left. In other words, the model overfitted to the conditions reflected in our dataset.

Data augmentation can help with this type of overfitting. Later, we will see how deleting pixels could help with the example above. That is, we can artificially remove objects from the background.

Model convergence

We can augment the same image in multiple different ways. This can artificially inflate the size of our dataset. Considering that deep learning needs large datasets, this can help with the convergence of model parameters.

Augmenting Data with Python

Okay, with all that in mind, let’s move on to actually augmenting data. We’ll go over the code and you can also find the project on GitHub.

To start, we’ll use the imports below. We have some standard packages (lines 2–3). Glob is used to handle file paths (line 5). We also have some packages used to work with images (lines 8–11).

#Imports 
import numpy as np
import matplotlib.pyplot as plt

import glob
import random

import torchvision.transforms as transforms
import matplotlib.image as mpimg
from PIL import Image, ImageEnhance
import cv2

As mentioned, we’ll be augmenting images used to power an automated car. You can find examples of these on Kaggle. These images are all 224 x 224 pixels. We display one of them with the code below.

Take note of the image name (line 3). The first two numbers are x and y coordinates within the 224 x 224 frame. In Figure 4, you can see we have displayed these coordinates using a green circle (line 11).

read_path = "../../data/images/"

name = "32_50_c78164b4-40d2-11ed-a47b-a46bb6070c92.jpg"

#Get x,y coordinates from name
x = int(name.split("_")[0])
y = int(name.split("_")[1])

#Load image and add circle for coordinates
img = mpimg.imread(read_path + name)
cv2.circle(img, (x, y), 8, (0, 255, 0), 3)

plt.imshow(img)
Figure 4: example image (source: author)

These coordinates are the target variable. The model uses the images to try to predict them. This prediction is then used to direct the car. In this case, you can see the car is coming up to a left turn. The ideal direction is to go towards the coordinates given by the green circle.

Flipping images

Suppose we collected a bunch of images in the anti-clockwise direction (i.e. left turns only). If we want the car to make right turns, we’d have to collect a bunch more data. Alternatively, as our track is symmetrical, we could flip the images on the x-axis.

Figure 5: flipping a symmetrical track (source: author)

We do this using the flip_img function. Keep in mind that, when flipping the image horizontally, the x coordinate of the target will also need to be adjusted. We do this in line 9 by subtracting the current coordinate from 224 (the image width). You can see the result of this function in Figure 6.

def flip_img(name, img):
    """Invert image and target on x axis"""

    # flip image
    img = cv2.flip(img, 1)

    # flip target variable
    s = name.split("_")
    s[0] = str(224 - int(s[0]))
    name = "_".join(s)

    return name, img
Figure 6: horizontal flip (source: author)

Even if you have collected data in both directions, it would still make sense to flip the images. This allows us to double the size of the dataset, as sketched below.
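As a rough sketch, doubling the dataset could look like the loop below. The write_path output folder is an assumption, and we rely on the file naming convention shown earlier, since flip_img adjusts the coordinates stored in the name.

write_path = "../../data/images_flipped/" #assumed output folder

for path in glob.glob(read_path + "*.jpg"):
    name = path.split("/")[-1]
    img = mpimg.imread(path)

    # Flip the image and adjust the target coordinates in the file name
    flip_name, flip_im = flip_img(name, img)
    Image.fromarray(flip_im).save(write_path + flip_name)

But what about vertical flipping?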

For some applications, it may make sense. For our automated car… not so much. Take a look at Figure 7. Vertical flipping implies the car will be driving on the ceiling. Unless we’re driving in space, this is not a condition we would expect in production.

Figure 7: vertical flip (source: author)
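For completeness, a vertical flip is a one-line change in OpenCV. A minimal sketch (presumably how Figure 7 was produced), using flip code 0 instead of 1:

# Vertical flip -- not a condition our car would see in production
img_vflip = cv2.flip(img, 0)
plt.imshow(img_vflip)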

Adjust brightness

With adjust_brightness, we can use the factor parameter to change the brightness of an image. Looking at Figure 8, if we increase the factor (1.5) the image will be brighter. Similarly, with a factor less than 1 the image will be darker.

def adjust_brightness(img, factor=1):
    """
    Adjust image brightness
    factor: <1 will decrease brightness and >1 will increase brightness
    """

    img = Image.fromarray(img)

    enhancer = ImageEnhance.Brightness(img)
    img = enhancer.enhance(factor)

    return img
Figure 8: adjusting brightness (source: author)

This function can help us simulate different lighting conditions. We can see how we could get similar results if we had turned the lights on and off during data collection.
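As a sketch, the variations in Figure 8 could be reproduced like this. The exact factor values here are assumptions:

# Simulate darker (lights off) and brighter (blinds open) conditions
img_dark = adjust_brightness(img, factor=0.5)
img_bright = adjust_brightness(img, factor=1.5)

fig, axs = plt.subplots(1, 2)
axs[0].imshow(img_dark)
axs[1].imshow(img_bright)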

Colour Jitter

We can take these types of augmentations further using the jitter function. This will randomly vary the brightness, contrast, saturation and hue of the image. Using the parameters, we can define the degree to which these aspects can vary. You can see some examples in Figure 9. These were created using the default parameter values.

def jitter(img, b=0.2, c=0.2, s=0.2, h=0.1):
    """
    Randomly alter brightness, contrast, saturation, hue within given range
    """

    img = Image.fromarray(img)

    transform = transforms.ColorJitter(
        brightness=b, contrast=c, saturation=s, hue=h)

    # apply transform
    img = transform(img)

    return img
Figure 9: jitter augmentation (source: author)

Again, you need to think about whether these augmentations make sense for your application. You can see that by default we have set the hue factor to 0.1 (i.e., h=0.1). As seen in Figure 10, a higher hue factor would return images with different colour tracks. Yet, in production, our track will always be orange.

Figure 10: jitter with hue=0.5 (source: author)

We should also consider the limitations of these types of transformations. They adjust the color in the entire image. In reality, lighting conditions are more complicated. Sunlight can reflect off the track at different angles. Some parts of the track can be darker than others. If you really want to capture this noise you will have to do it with good data collection.

Input noise

A less systematic approach is to randomly introduce noise. You can see some examples in Figure 11. In each case, we are able to adjust the amount of noise introduced.

Figure 11: noise (source: author)

When doing these augmentations, remember that each pixel in our 224 x 224 image has 3 channels (R, G, B). Each channel can take on a value between 0 and 255. Together, these values determine the colour of the pixel.
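We can confirm this by inspecting one of the image arrays:

# Each image is a 224 x 224 grid of pixels with 3 channels (R, G, B)
print(np.shape(img)) # (224, 224, 3)
print(img.dtype) # uint8: each channel is a value from 0 to 255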

The first row in Figure 11 was created using the gaussian_noise function. We create a random noise array with the same dimensions (224 x 224 x 3) as the image (lines 4–5). Each element in this array is sampled from a normal distribution with mean 0 and a standard deviation set by var. Adding this to the image shifts the R, G, B channels by a random amount; the result is then clipped back to the valid 0–255 range.

def gaussian_noise(img, var=5):
    """Add Gaussian noise to image"""

    dims = np.shape(img)
    noise = np.random.normal(0, var, size=dims)

    # Add noise and clip back to the valid [0, 255] pixel range
    img = np.clip(img + noise, 0, 255).astype("uint8")

    return img

The sp_noise function works in a similar way, except now we randomly change pixels, with the given probability (prob), to either black or white. You can see this in the second row of Figure 11.

def sp_noise(img, prob=0.1):
    """Add salt and pepper noise to image"""

    height, width, channels = np.shape(img)
    img = np.array(img)

    #Iterate over all pixels
    for i in range(height):
        for j in range(width):
            #Randomly change pixel values
            if random.random() < prob:
                if random.random() < 0.5:
                    img[i][j] = np.array([255, 255, 255]) #white
                else:
                    img[i][j] = np.array([0, 0, 0]) #black

    img = Image.fromarray(img)

    return img
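The nested loops above are easy to follow but slow. For larger datasets, a vectorized sketch of the same idea, using NumPy boolean masks, would be faster:

def sp_noise_fast(img, prob=0.1):
    """Vectorized salt and pepper noise using random masks"""

    img = np.array(img)
    height, width, channels = np.shape(img)

    # Randomly select pixels to corrupt, then split them into white/black
    noisy = np.random.random((height, width)) < prob
    salt = np.random.random((height, width)) < 0.5

    img[noisy & salt] = np.array([255, 255, 255]) #white
    img[noisy & ~salt] = np.array([0, 0, 0]) #black

    return Image.fromarray(img)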

Gaussian and salt and pepper noise have the effect of reducing the quality of the images. In production, a model could be expected to make predictions using images of varying quality. These augmentations can help create a model that is robust against these changes.

The delete_square function takes a different approach to adding noise. It works by deleting large chunks of the image. More specifically, it turns a random square with a given dimension (pixels) black. Examples are given in the last row of Figure 11.

def delete_square(img, pixels=20):
    """Delete random square from image"""

    img = np.array(img)
    h, w, channels = np.shape(img)

    #Random center pixel
    rh = random.randint(0, h)
    rw = random.randint(0, w)

    sub = round(pixels/2)
    add = pixels - sub

    #Boundaries for square
    hmin = max(rh - sub, 0)
    hmax = min(rh + add, h - 1)
    vmin = max(rw - sub, 0)
    vmax = min(rw + add, w - 1)

    # Turn pixels within range black
    img[hmin:hmax, vmin:vmax] = np.array([0, 0, 0])

    img = Image.fromarray(img)
    return img

Deletion can also help build a more robust model. When making predictions, a model may focus on a particular feature. For example, our model may only use the outer orange lane. Deleting parts of the image will force the model to use multiple features. As a result, if something happens to one of the features, the model may still be able to make accurate predictions.

With deletion, you may also want to take a more systematic approach, although this is time-consuming. That is, to exclude specific parts of the image. You can see this in Figure 12. Here we have deleted the chair from the background. This is so the model does not associate it with turning right.

Figure 12: systematic deletion (source: author)
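A minimal sketch of such a systematic deletion, assuming we have inspected the images to find the object's bounding box (the coordinates below are hypothetical):

def delete_region(img, x1, y1, x2, y2):
    """Black out a fixed region, e.g. an object in the background"""

    img = np.array(img)

    # Rows are the y axis and columns are the x axis
    img[y1:y2, x1:x2] = np.array([0, 0, 0])

    return Image.fromarray(img)

# Hypothetical bounding box for the chair in Figure 12
img_del = delete_region(img, 150, 20, 200, 80)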

Measuring the effect of augmentation

So we’ve seen different types of augmentations. We’ve also seen how we can vary the level of some of them. Really, the types and levels can be treated as hyperparameters. When tuning these, it is important to keep some things in mind.

Do not augment the test set

Suppose we augment the entire dataset and then do a train and test split. This may lead to an overestimation of the model’s performance. The test set will contain augmented copies of training images, so overfitting to the training set will not necessarily result in poor performance on the test set.

Take the brightness augmentations we saw in Figure 8. Some of these could end up in the training set and others in the test set. Now consider that the same thing will happen with the other augmentations. You can see how the test set would end up looking very similar to the training set.

Figure 13: adjusting brightness (source: author)

In fact, it is best practice to not augment the test set at all. This is because the test set is used to estimate the model’s performance in production. Here, the model would not be expected to make predictions on augmented data.
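In practice, this means splitting before augmenting. A minimal sketch, assuming scikit-learn is available for the split:

from sklearn.model_selection import train_test_split

paths = glob.glob(read_path + "*.jpg")

# Split first so augmented copies of a test image cannot leak into training
train_paths, test_paths = train_test_split(
    paths, test_size=0.2, random_state=42)

# Augment the training images only; the test set stays untouched
train_imgs = []
for path in train_paths:
    img = mpimg.imread(path)
    train_imgs.append(img)
    train_imgs.append(np.array(jitter(img))) #augmented copy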

Not all conditions are reflected in the test set

At the same time, you need to consider that your test set may not be robust. As a result, good test performance may not mean good performance in production. This is because your model may face conditions that are not captured in the test set. So, to really understand the impact of augmentations, we will need to test them in production.

This puts us in a tricky situation. Testing in production is time-consuming and you will not be able to test in all environments. This means it can be impossible to measure the effect of augmentations. Ultimately, you will need to think critically about what augmentations are right for your application. Domain knowledge and experience may be better indications of what will work.
