Image Recognition Algorithm using Transfer Learning


Training a neural network from scratch takes a great deal of time and substantial computational power. One way to overcome both of these obstacles is transfer learning.


Insufficient data, time, or computational resources are critical obstacles when building an effective image classification network. In this article, I present a straightforward implementation that works around all of these lack-of-resource constraints. We will see what transfer learning is and why it is so effective, and then I will walk step by step through building an image classification model.

The model I will develop is an alpaca vs. not-alpaca classifier, i.e. a neural network capable of recognizing whether or not an input image contains an alpaca. I chose this narrow task for several reasons:

  • The original pre-trained model doesn’t know an “alpaca” class. I want to explore the potential of transfer learning with unseen classes.
  • I don’t possess many data examples of alpaca vs. not-alpaca instances. I want to assess what transfer learning can do with just a few data points.

Finally, I will test the algorithm on some alpaca pictures I took myself during a recent hike. These pictures were shot under different lighting conditions, and the alpaca isn’t always in close-up.

The code and the notebooks are available in this GitHub repository:

The dataset used for the second step of training will be fetched from the Google Open Images V6 dataset.

Transfer Learning

Suppose you want to build an image classifier, but you can’t afford to train your learning model for weeks, nor do you have top-notch GPUs available for the task. Instead of developing a neural network from scratch, you can download an existing model and continue training it. This technique is called transfer learning: it consists of taking a well-established neural network (or just part of it) and adapting it to your specific computer vision project.

Over the years, academic researchers and companies developed very deep convolutional neural networks, achieving state-of-the-art levels of accuracy in image recognition tasks. These networks are tens or hundreds of layers deep and were trained on millions of images (typically on the ImageNet database) for extended periods of time. Examples of open-source pre-trained networks are ResNet-50, Inceptionv3, MobileNetV2, and many more.

Schematic architecture of Inception v3 network. Source:

In convolutional neural networks (CNNs), the first convolutional layers detect simple features such as edges or shapes, the middle layers recognize parts of objects (e.g. eyes or a mouth in face recognition), and the final convolutional layers identify more complex features such as whole faces. For this reason, the initial layers of a CNN fulfill more general tasks, while the final layers are more specialized. This property of convolutional neural networks makes it possible to take an existing pre-trained network, freeze the parameters (weights and biases) of all layers except the last few, and train the network for a few additional epochs. As a result, we can take advantage of a deep network trained on an enormous dataset while, at the same time, specializing it for a more specific image recognition project. Depending on how specialized the convolutional layers are for their original task, we can choose to freeze a bigger or a smaller portion of the network.

Transfer learning plays an important role in developing computer vision algorithms under different data availability conditions. If I had only a small amount of data to train the network with, I would freeze all of the pre-trained network’s weights except the output layer: only the output (softmax) layer would be retrained with new instances. If, instead, I had a larger training set available, I would freeze fewer layers and retrain more of them. Finally, if I could feed the network a huge training set, I would use the pre-trained weights simply as an initialization point for my network, which speeds up convergence.
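The three regimes above can be sketched in Keras as follows. The number of layers to unfreeze (20) is an arbitrary illustration, and I pass weights=None here just to keep the sketch download-free; in practice you would load weights="imagenet":

```python
import tensorflow as tf

# weights=None keeps this sketch light; in practice, load "imagenet".
base = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False, weights=None)

# Few data points: freeze the whole convolutional base and retrain
# only a brand-new output layer on top.
base.trainable = False

# Larger training set: unfreeze the last few layers (20 is an
# arbitrary choice) and retrain them together with the new head.
base.trainable = True
for layer in base.layers[:-20]:
    layer.trainable = False

# Huge training set: leave everything trainable and use the
# pre-trained weights only as the initialization point.
# base.trainable = True
```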

Create the TensorFlow dataset

After importing the required libraries, the next step is to generate two TensorFlow Datasets, one for training and one for validation. 20% of the images are used for validation. Both the training and validation sets are combined into batches of size 32.

Please, check the Jupyter notebook on how to download the images from Google Open Images.

To generate the datasets I use the image_dataset_from_directory function. I provide the path to the directory that contains the sub-directories for each class: in my case, “alpaca” and “not_alpaca”. Having set the validation_split parameter, I also have to specify which subset is for training and which is for validation. Finally, I set a seed to avoid overlap between the two datasets.
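A sketch of those two calls is below. The directory path and seed value are illustrative, and the synthetic-image step exists only so the snippet runs end-to-end without the real data; with the downloaded images, skip it and point DATA_DIR at the folder holding the class sub-directories:

```python
import os
import tensorflow as tf

IMG_SIZE = (160, 160)
BATCH_SIZE = 32
DATA_DIR = "alpaca_dataset"  # hypothetical path to the downloaded images

# Demonstration only: write a few random PNGs so the snippet is
# runnable. With the real data, remove this loop.
for cls in ("alpaca", "not_alpaca"):
    os.makedirs(os.path.join(DATA_DIR, cls), exist_ok=True)
    for i in range(8):
        pixels = tf.random.uniform(IMG_SIZE + (3,), maxval=256, dtype=tf.int32)
        tf.io.write_file(os.path.join(DATA_DIR, cls, f"{i}.png"),
                         tf.io.encode_png(tf.cast(pixels, tf.uint8)))

# 20% of the images go to validation; the shared seed keeps the two
# subsets disjoint.
train_ds = tf.keras.utils.image_dataset_from_directory(
    DATA_DIR, validation_split=0.2, subset="training", seed=42,
    image_size=IMG_SIZE, batch_size=BATCH_SIZE)
val_ds = tf.keras.utils.image_dataset_from_directory(
    DATA_DIR, validation_split=0.2, subset="validation", seed=42,
    image_size=IMG_SIZE, batch_size=BATCH_SIZE)

# The class labels are read automatically from the sub-directory names.
print(train_ds.class_names)
```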

One of the great advantages of the TensorFlow API is that it automatically reads the class labels from the sub-folder names. We can see that by accessing the class_names attribute of the dataset object:

We can see what the training images look like by printing some of them. Alpaca instances appear in different poses and sizes, while non-alpaca instances are mainly other animals or animal-shaped toys.

Preview of some training examples. Source: author.

Import the pre-trained model

The TensorFlow API makes it easy to import pre-trained models. For this application, I will use the MobileNetV2 network: its architecture is based on inverted residual connections and depthwise separable convolutions, resulting in a fast network that can run even on low-power devices like smartphones.

The first thing to do is to import the network by calling the tf.keras.applications.MobileNetV2 function:

It’s necessary to provide:

  • the input shape, which is obtained by adding the color channel dimension to the image shape
  • whether or not to include the final layers. In this case, I won’t import the final layers because I want to train a brand new output layer for my specific task
  • whether or not to import the pre-trained weights. In this case, I import the weights resulting from training on the ImageNet dataset
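With the settings listed above, the call might look like this (the variable names are mine, not from the original notebook):

```python
import tensorflow as tf

IMG_SHAPE = (160, 160) + (3,)  # image size plus the color dimension

base_model = tf.keras.applications.MobileNetV2(
    input_shape=IMG_SHAPE,
    include_top=False,   # drop the original ImageNet classification head
    weights="imagenet",  # load the weights pre-trained on ImageNet
)

base_model.summary()  # prints the full convolutional architecture
```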

By printing the network’s summary we can see what it looks like:

The image above shows only the first 4 layers, since the whole network (156 layers, excluding the final ones) wouldn’t fit in a single image. You can find the description of all the layers in the Jupyter Notebook that I uploaded to my GitHub repository.

Modify the network and train the model

As mentioned above, I will add to the network a new output layer that will be trained from scratch.

I will now explain each line of the code.

  1. MobileNetV2 comes pre-trained on inputs normalized to the range [-1, 1]. For this reason, I replicate the same input normalization step
  2. Set the weights of the pre-trained model to non-trainable
  3. Define the input layer of shape (160, 160, 3)
  4. Apply the input normalization step
  5. Add the pre-trained model
  6. Apply an average pooling layer to reduce the dimensions of the convolutional feature maps
  7. Add a dropout layer to apply some regularization (thus reducing overfitting)
  8. Add the output layer, which consists of a single unit with a sigmoid activation function. A single unit is sufficient for a binary classification problem
  9. Finally, combine the model by specifying the inputs and outputs

Once the model is defined, it’s time to compile and train it. I’m using the Adam optimizer and binary cross-entropy as the loss function. As the evaluation metric, I use accuracy.
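The nine construction steps plus compilation can be sketched as follows. The dropout rate of 0.2 and the default Adam learning rate are my own choices, not values stated in the article:

```python
import tensorflow as tf

IMG_SHAPE = (160, 160, 3)

# 1.-2. Load the pre-trained convolutional base and freeze its weights.
base_model = tf.keras.applications.MobileNetV2(
    input_shape=IMG_SHAPE, include_top=False, weights="imagenet")
base_model.trainable = False

# 3. Input layer with the expected image shape.
inputs = tf.keras.Input(shape=IMG_SHAPE)
# 4. MobileNetV2 expects pixel values normalized to [-1, 1].
x = tf.keras.applications.mobilenet_v2.preprocess_input(inputs)
# 5. Run the frozen convolutional base (training=False keeps its
#    batch-norm layers in inference mode).
x = base_model(x, training=False)
# 6. Collapse each feature map into a single value.
x = tf.keras.layers.GlobalAveragePooling2D()(x)
# 7. Dropout for regularization (the 0.2 rate is an assumption).
x = tf.keras.layers.Dropout(0.2)(x)
# 8. A single sigmoid unit for binary classification.
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
# 9. Combine the model by specifying its inputs and outputs.
model = tf.keras.Model(inputs, outputs)

# Adam optimizer, binary cross-entropy loss, accuracy metric.
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss=tf.keras.losses.BinaryCrossentropy(),
              metrics=["accuracy"])

# Training would then be (train_ds/val_ds from the dataset step):
# history = model.fit(train_ds, validation_data=val_ds, epochs=20)
```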

Learning curves without data augmentation. Source: author.

The accuracy score rises steadily up to a plateau at about 95%. Training and validation accuracy track each other closely, implying that the algorithm is not overfitting the training data.

I want to test the algorithm on a batch of alpaca images that I took during a hike. I added some random non-alpaca images to the test set (like a goldfish or a chocolate cake) to check for possible false positives. Given the limited number of test images, I use this simple snippet for testing:
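The original snippet is not reproduced here; a minimal stand-in (the function name, threshold convention, and label mapping are my own) could look like:

```python
import tensorflow as tf

IMG_SIZE = (160, 160)

def predict_image(model, path, threshold=0.5):
    """Classify a single image file with the trained binary model."""
    img = tf.keras.utils.load_img(path, target_size=IMG_SIZE)
    arr = tf.keras.utils.img_to_array(img)
    arr = tf.expand_dims(arr, 0)  # add the batch dimension
    prob = float(model.predict(arr, verbose=0)[0][0])
    # With alphabetically ordered classes, the sigmoid output is the
    # probability of the second class ("not_alpaca").
    label = "not_alpaca" if prob > threshold else "alpaca"
    return label, prob
```

Calling predict_image on each test file then prints one label per image.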

No false positives are reported, however, some of my alpaca pictures have been mislabeled:

Misclassified images. Source: author.

The picture on the left is actually very dissimilar from the training examples: the animal is in the background and partially covered by the fence. The image on the right, however, was mislabeled even though the animal is clearly visible and in focus.

I will try to make the neural network more robust by adding some data augmentation layers.

Data augmentation

I will skip the explanation about what data augmentation is and what its advantages are. For all the details and for a practical application I suggest reading this article about data augmentation.

To implement data augmentation, I’m adding a sequential block to the network, composed of two layers: one that randomly flips the images horizontally and one that applies a random rotation to them.
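A minimal sketch of that augmentation block is below (the rotation factor of 0.2 is my assumption). In the full model, it would be inserted between the input layer and the normalization step:

```python
import tensorflow as tf

# Two augmentation layers; they are active only when the model is
# called with training=True, so validation and inference images pass
# through unchanged.
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.2),  # rotation factor is an assumption
])
```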

After training the augmented model for 20 epochs, and reaching 97% accuracy on the validation set, both the above pictures were correctly labeled as alpacas.


The possibilities of transfer learning are countless. In this article, I showed how to take advantage of open-source pre-trained networks to easily build an image classification CNN. I reached a satisfactory level of accuracy on the validation set, but with some adjustments it could be improved even further. Possible improvements include adding more dense layers (with ReLU activation), performing more augmentation steps (shearing, mirroring, zooming), and retraining more of the final layers of the original MobileNetV2 network.


