Make Your Neural Net Confuse Dogs with Pelicans


A few years ago, one of the first things I did when learning about neural networks was to train a simple image classifier. Neural nets can do a marvelous job of telling what’s in an image. However, one thing I did not ask myself back then was: “What are these nets actually learning?”

Let me explain what I mean by that with an example: How do we humans recognize that a dog is a dog? I’d say we look for distinctive features like pointy ears, a snout, a tail, four legs, and similar things. For neural networks, however, other things might be essential. What exactly they look for in, say, a dog can be quite a mystery. This becomes apparent with what I would call optical illusions for neural nets. The fascinating thing is that they are not illusions for humans.

The following image contains a beautiful German shepherd. A neural net correctly classifies it not only as a dog but even as the correct breed. Moreover, it is over 93% sure of its decision, so everything is fine here.

Now comes the catch. Let’s take a look at the following picture:

The image looks like a perfect copy of the previous one. Spoiler: It is not! You have to look very closely to see a difference. The background is not as green, and the picture seems to be just a bit noisier. But the difference is almost unnoticeable for humans. I think we agree that every person who accepts that the first image shows a German shepherd would say the same for this picture. So our neural net should also still state that it sees a German shepherd with about 93% certainty, right? Wrong. Our neural net is now 100% sure that it sees a pelican:

Ok, so what is happening here? The classifier we used is a residual neural net (ResNet) with 50 layers. It was pre-trained on the ImageNet dataset, a huge collection of images with 1000 different object classes. Next, we used a picture of a German shepherd that our ResNet classifies correctly.

To create the optical illusion for our ResNet, we take the original image and slightly change its pixels. We change just the right pixels: the ones the ResNet is most sensitive to with regard to our cute dog.

The following illustration shows which pixels we need to change, and by how much, to fool our classifier. Because the pixel changes are so tiny, I scaled them by a factor of 10, so we can see them more clearly:
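To make the scaling concrete, here is a minimal sketch of how such an amplified difference could be computed. The function name, the sample values, and the use of the absolute difference are my assumptions for illustration; the exact rendering behind the illustration may differ.

```python
def amplify_difference(original, adversarial, factor=10, low=0, high=255):
    """Scale the absolute per-value change by `factor`, clipped to the displayable range."""
    return [max(low, min(high, factor * abs(a - o)))
            for o, a in zip(original, adversarial)]

# Made-up channel values of three pixels before and after the attack.
original = [120, 64, 200]
adversarial = [123, 64, 195]
amplified = amplify_difference(original, adversarial)
print(amplified)  # [30, 0, 50]
```

A change of 3 points is invisible in the image itself, but after scaling it shows up as a value of 30 in the difference picture.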

So we change the original image to be slightly more bluish and reddish. Also, some pixels are brightened a little bit. To our eyes, the changes are almost non-existent. But for our ResNet … oh boy … it changes everything. It is now completely convinced that it sees a pelican. And the best part is, it doesn’t need to be a pelican. This trick works for basically any class. We can fool the neural net into thinking our dog is a broccoli, a printer, or anything else. Moreover, practically every decent image classifier is susceptible to these optical illusions, or adversarial images, as they are called in the literature.

Now, the question is, how do we find these pixel changes that confuse our classifier? How can we automatically create these optical illusions?

Automatically Fooling Neural Nets

The changes we made to our dog image were very tiny but also extremely specific. It is infeasible to find these perturbations manually. Luckily, there are various techniques, called attacks, to automatically create these optical illusions. One of these attacks is called Projected Gradient Descent (PGD). To understand PGD, we first need to quickly remind ourselves how neural networks learn by using gradient descent.

Gradient Descent

A neural net contains weights. Changing the weights influences the output of the neural net. To train a neural network we need to quantify how off its answers are from the correct ones. This is done by a loss function. Once a loss function is defined, we can compute the gradient of the loss function with respect to the network’s weights. Intuitively, the (negative) gradient tells us how to change the network’s weights such that the loss decreases as fast as possible, meaning its answers become more correct as fast as possible.

Illustration of gradient-descent-based learning. (Image by author)

Doing this over and over for all images in the training set leads to a trained network. On a high level, this summarizes gradient-descent-based learning.
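As a tiny illustration of this loop, here is a sketch that fits a single weight with gradient descent. The data, the learning rate, and all names are made up for the example; a real network has millions of weights and computes the gradient via backpropagation, but the update rule is the same idea.

```python
def loss(w, xs, ys):
    """Mean squared error of the model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grad(w, xs, ys):
    """Derivative of the mean squared error with respect to the weight w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def train(xs, ys, w=0.0, learning_rate=0.05, steps=200):
    for _ in range(steps):
        # Step along the negative gradient so that the loss decreases.
        w -= learning_rate * grad(w, xs, ys)
    return w

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # generated by y = 2 * x
w_trained = train(xs, ys)
```

Starting from `w = 0.0`, the weight moves toward 2, the value that generated the data, and the loss shrinks accordingly.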

PGD Attack

Congratulations: if you understand gradient descent, you already understand the PGD attack. For the PGD attack, we take the trained network and define a new loss function that measures how far the network’s answer is from the class we want to see. For our German shepherd image, we want it to be classified as a pelican (or whatever you fancy). Now, similar to gradient-descent-based training, we compute a gradient.

However, we don’t compute the gradient with respect to the network’s weights (those are fixed) but with respect to our image’s pixels. Intuitively, this gradient tells us how to change the pixels of our image such that the network thinks it’s a pelican as quickly as possible.

Illustration of PGD attack (without final projection). (Image by author)

And that’s almost it. We apply the changes and continue until the neural net actually classifies our image as a pelican.
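The loop described above can be sketched on a toy model. This is not the ResNet from the article: the 3-class linear classifier, its weights, and the 4-pixel image are invented so that the example stays self-contained. I also assume sign-of-gradient steps, a common choice for this kind of attack, and leave out the projection.

```python
import math

# Hypothetical 3-class linear "network" on a 4-pixel "image"; the weights are made up.
CLASSES = ["dog", "cat", "pelican"]
WEIGHTS = [
    [0.02, 0.01, -0.01, 0.00],   # dog
    [0.00, 0.01, 0.01, -0.01],   # cat
    [-0.01, 0.00, 0.02, 0.01],   # pelican
]

def logits(pixels):
    return [sum(w * p for w, p in zip(row, pixels)) for row in WEIGHTS]

def softmax(scores):
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(pixels):
    scores = logits(pixels)
    return CLASSES[scores.index(max(scores))]

def pixel_gradient(pixels, target):
    """Gradient of the cross-entropy loss for class `target` w.r.t. the pixels."""
    probs = softmax(logits(pixels))
    errors = [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]
    return [sum(errors[k] * WEIGHTS[k][j] for k in range(len(WEIGHTS)))
            for j in range(len(pixels))]

def targeted_attack(pixels, target_name, step_size=1.0, max_steps=500):
    target = CLASSES.index(target_name)
    adversarial = list(pixels)
    for _ in range(max_steps):
        if predict(adversarial) == target_name:
            break  # the classifier is fooled, stop changing pixels
        grad = pixel_gradient(adversarial, target)
        # The weights stay fixed; only the pixels move, each by one small
        # step in the direction that lowers the loss for the target class.
        adversarial = [p - step_size * math.copysign(1.0, g)
                       for p, g in zip(adversarial, grad)]
    return adversarial

image = [200.0, 150.0, 50.0, 100.0]
adversarial_image = targeted_attack(image, "pelican")
print(predict(image), "->", predict(adversarial_image))
```

Because this toy classifier is so small, the pixels have to move quite far before the prediction flips. A deep network like a ResNet is sensitive enough that far smaller changes suffice.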

You may be asking where the P, the projection, of PGD comes into play. The PGD attack allows us to define a change limit for every pixel. All color channels of every pixel take values between 0 and 255. For the PGD attack, we can specify that no pixel should be changed by more than 10 points. Whenever a pixel would be perturbed by more than 10 points, it is projected back onto its allowed set of values, that is, clipped to within 10 points of its original value.
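For a per-pixel limit like this, the projection boils down to clipping. Here is a minimal sketch; the function name and the sample values are mine, and the limit of 10 points follows the example above.

```python
def project(original, perturbed, epsilon=10, low=0, high=255):
    """Keep every value within `epsilon` of the original and inside [low, high]."""
    projected = []
    for o, p in zip(original, perturbed):
        p = max(o - epsilon, min(o + epsilon, p))  # per-pixel change limit
        p = max(low, min(high, p))                 # valid pixel range
        projected.append(p)
    return projected

original = [200, 150, 50, 250]
perturbed = [185, 155, 75, 262]
projected = project(original, perturbed)
print(projected)  # [190, 155, 60, 255]
```

The first and third values moved too far and are pulled back to the 10-point limit, the second is left alone, and the last one is additionally capped at 255.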

Python Package

Fortunately, we don’t need to understand all the details and reimplement the PGD attack from scratch. There is an amazing Python package called Foolbox, which supports the PGD attack (and many more) and is compatible with TensorFlow and PyTorch. With Foolbox, generating our pelican adversarial image from the German shepherd picture is short and simple:

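from foolbox.attacks import PGD
from foolbox.criteria import TargetedMisclassification
# Assumption: `model` is the ResNet wrapped as a Foolbox model (e.g. foolbox.PyTorchModel).
criterion = TargetedMisclassification(pelican_label)
attack = PGD()
adv_input = attack.run(model, german_shepherd, criterion, epsilon=10)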

If you want to generate the adversarial image yourself and run and tweak the code, I highly recommend checking out the Jupyter notebook:


From a picture that our classifier correctly labeled as a German shepherd, we created an adversarial image, an optical illusion for neural nets, that showed the same dog but was misclassified by our network as a pelican. Creating these optical illusions can be a fun exercise. However, in real life, adversarial images can have serious consequences. Just think of a self-driving car mislabeling a stop sign as a speed limit sign. There are techniques to defend against adversarial images, but that’s for another story.

