W&B Fastbook Sessions Week 4 Summary

Original Source Here

Ravi Mashru’s Suggestion

Image by Vinayak

There was a very nice observation made by Ravi Mashru. If we observe a three as typed on a computer/keyboard, it is indeed the case that a 3 would be symmetric along the horizontal axis and a 7 wouldn’t.

Imagine if you had a three on a piece of paper very nicely written in a box, then if you fold this paper along the horizontal axis, the two curves in the digit three would completely overlap over one another. However if you do the same with the number seven, you would get a grad sign or an inverted triangle symbol.

If we use this concept to take an image, break it in two halves, flip the bottom half and then take an element wise subtraction of all the boxes in the image and reduce this difference tensor down to a single element, this quantity should be very low for a three and very high for a seven. i.e. we can map an image to a scalar estimate and study the distribution of this estimate for the number three and for the number seven to come up with a univariate classifier model or a simple histogram.

There’s so much to unpack in the above few lines; let’s approach everything step by step:

  1. Breaking the image into halves

In the MNIST dataset, every image is a tensor of dimension 28 X 28. We want to cut an image into half horizontally. So the top half should contain 14 rows and the bottom half should contain 14 rows. Since we’re not cutting vertically, we want all the columns to be intact. We could use slicing to accomplish this separation of an image into two components as follows

# Open the image and convert it into a tensorimg = tensorify_image(img_pth)# Slice the image into top and bottomtop = img[:14, :]bottom = img[14:, :]

2. Flipping the bottom image

Once we’ve gotten the two halves of image separately, we need to flip the bottom half so that we can compute an elementwise difference between the two to get our univariate independant variable. torch provides a method called flip(tensor, dimensions) which flips a tensor/matrix along the given dimensions i.e. width/height in our case.

Since we want to flip along the horizontal axis which is the zeroth axis, we can do it as follows

bottom = img[14:, :]flipped_bottom = torch.flip(bottom, [0])

3. Visualize the components

Now, we can have a look at the top, bottom, flipped_bottom images separately and also overlay the bottom part on the top to see the difference between a three image and a seven image. This we can do using matplotlib as follows.

Using this function if we randomly plot a three and a seven, they look as follows

Image by Vinayak

As we can see the components of threes tend to overlap over one another in the same positions but the components of seven have little to no overlap. We can measure the extent of this overlap by computing a box to box distance between the corresponding boxes of both the top half and the bottom flipped half images.

4. Distance Computation

We will take the l2_norm i.e. squared distance computed elementwise for every box in the top and bottom_flipped image components for every image. The distance computation is straightforward. We difference the tensors, square them and take a mean across all the dimensions and take a square root of the same. Finally we retrieve a scalar from this aggregated result using the .item() method.

Distance computation

After doing this for both train_3s and train_7s, we can compute a distribution of this variable for both of them and we obtain something as follows

Image by Vinayak

Whoa! This is unbelievable right? Both the distributions almost overlap each other. How in the world could this happen? What can we do to better this? As a last try, we can try binarization on our training data and do a similarity comparison instead of a distance comparison.

Binarization & Similarity Computation

Since we’re typically interested in regions with high illumination and not interested in blank regions of the image, with the help of torch.where which we used in the surface area illumination idea, we can binarize our image. We could replace all boxes which are illuminated by a single illumination of 255 and all others to 0s.

Then in the flipped and top halves of the images, we can compute the similarity in regions i.e. we will count the number of boxes which are illuminated in the same box location for both the halves and that fraction for 3 should hopefully be bigger than for 7. So, let’s get started.

First, let’s visualize the binarized images to understand how they’re different from the original ones

Image by Vinayak

As you can see in the original there’s a different illumination associated with each number in a box, however in the binarized version it’s all the same i.e. uniform illumination throughout the number.

It is extremely simple to binarize an image

Simply use the torch.where function to check the illumination and then fill in the new illuminations accordingly.

# Read the image as a tensorimg = tensorify_image(img_pth)# If binarize, we shall convert the image to a binary imageif binarize: img = torch.where(img > 0, 255, 0)

However, there will be a difference in how we compute the univariate representation of this image. It will be as follows

Binarized Similarity Computation

The computation of similarity is involved. We first add the two binarized sections. Since each box is a 0/1, the resulting tensor addition can have a 0 (0,0), a 1 (1, 0)/(0, 1) or a 2(1, 1). We are interested to count those boxes where both the boxes are illuminated i.e. 2; hence we again use torch.where on the combined image and do a sum of all those boxes where the match happens based on illumination. Eventually since we’ve reduced it to a single item, we retrieve a python scalar by using the .item() method.

If we do a similar exercise as the above two cases and plot a distribution of the two classes, we get something as follows

Image by Vinayak

As we hypothesized, the mean of threes is higher than that of sevens which means our intuition was right. However, threes also has a larger standard deviation and has a flatter distribution. This means that the threes’ images we’re encountering are more diverse in nature than the sevens’ images.

If we adopt a similar prediction logic as the one described in surface area illumination computation, we get accuracy of

  1. 26.64% on the first variant of this horizontal symmetric insight.
  2. 34.79% on the second (binarized) variant of the horizontal symmetric insight.


Trending AI/ML Article Identified & Digested via Granola by Ramsey Elbasheer; a Machine-Driven RSS Bot

%d bloggers like this: