Landmark Recognition — Final Project at Ironhack Amsterdam


In October 2020, I completed an intensive Data Analytics bootcamp at Ironhack Amsterdam. It is a coding bootcamp that will turn you into a real analyst in just 9 weeks. It’s a very intense program where you can learn data analytics from scratch and become industry-ready.

My final project was Landmark Recognition, where I used machine learning techniques to detect a landmark in a given image.


Have you ever tried Google Lens to identify landmarks? You point the camera at a building and it tells you the landmark. When I first saw this feature two years ago, I was amazed by it and thought it was magical, but I couldn’t understand how it worked. After going through this course, I thought I could replicate it. I took some old pictures of my husband from when he visited Rome and used the model that I trained to find the landmarks in them. My model was able to predict the right landmark in 7 out of 8 images.


The project consisted of the following steps:

  • Examine and understand the data.
  • Prepare the data.
  • Build the model.
  • Train the model.
  • Test the model.
  • Improve the model.

Exploring the data

For the project, I downloaded the image dataset from Kaggle – Google Landmarks Dataset. The dataset is divided into training data and testing data. Once I had the dataset, I started the exploratory analysis: I loaded the CSV files into a data frame, checked the datatypes of the columns, and removed invalid links and null data. I found more than 4 million images tagged with 14,946 unique landmarks.

Preparing the data

Each landmark had more than 10 thousand images. To limit the scope and run the training on my laptop with its limited processing power, I used only the 13 most frequent landmarks from the dataset, and only 2,000 images for each landmark. Of those, I picked 1,600 images for training and 400 for testing.
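This selection step can be sketched with pandas. The `id` and `landmark_id` column names are assumptions based on the Kaggle CSV, and the counts are scaled down here so the snippet runs standalone:

```python
import pandas as pd

# Stand-in for the Kaggle train.csv; column names are assumptions,
# and the sizes are scaled down for illustration.
df = pd.DataFrame({
    "id": [f"img_{i}" for i in range(100)],
    "landmark_id": [i % 5 for i in range(100)],
})

# Keep only the most frequent landmarks (13 in the project; 3 here).
top = df["landmark_id"].value_counts().nlargest(3).index
df_top = df[df["landmark_id"].isin(top)]

# Cap images per landmark (2,000 in the project; 10 here), then take the
# first 80% per landmark for training and the rest for testing.
capped = df_top.groupby("landmark_id", group_keys=False).head(10)
train = capped.groupby("landmark_id", group_keys=False).head(8)
test = capped[~capped["id"].isin(train["id"])]
```

In the real dataset the same pattern applies, just with 2,000 images capped per landmark and a 1,600/400 split.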

Here are some of the images from the dataset, tagged with their landmark labels.

Landmark Images with Labels

Creating the Model

I used Convolutional Neural Networks (CNNs) to build and train the models on the image dataset, using the TensorFlow and Keras libraries.

For building the model I used two approaches:

  • CNN without Transfer Learning.
  • CNN with Transfer Learning.

What is Transfer Learning?

Transfer learning is a machine learning technique where a model trained on one task is re-purposed for a second, related task. Transfer learning is an optimization: a shortcut that saves training time or yields better performance.

CNN without Transfer Learning

First, I rescaled the pixel values, since the convolutional neural network expects inputs between 0 and 1. The images are color images, so they have three channels (red, green, and blue, popularly known as “RGB”), with values between 0 and 255.
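As a minimal sketch, the rescaling is just a division by 255 (in Keras this is also commonly done with a `Rescaling(1./255)` layer or `ImageDataGenerator(rescale=1./255)`):

```python
import numpy as np

# A dummy 2x2 RGB "image" with uint8 channel values in [0, 255].
img = np.array([[[0, 128, 255], [255, 0, 0]],
                [[0, 255, 0], [0, 0, 255]]], dtype=np.uint8)

# Rescale so every channel value lies in [0, 1].
scaled = img.astype("float32") / 255.0
```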

The model consists of the following:

  • The input layer is an ‘RGB’ image.
  • The output layer is a multi-class label.
  • Hidden layers consisting of convolution layers, ReLU (rectified linear unit) activations, pooling layers, and a fully connected neural network.

Convolution layers are responsible for capturing low-level features such as edges, colors, and shapes. Pooling layers are responsible for reducing the size of the convolved features.

Once the pooling is done, the output is flattened into a one-dimensional vector (a step called image flattening) that an artificial neural network can use to perform the classification.

ReLU, or rectified linear unit, is an activation function applied to increase the non-linearity of the network without affecting the receptive fields of the convolution layers. ReLU allows faster training, whereas Leaky ReLU can be used to handle the “dying ReLU” problem.

The final output is calculated using ‘softmax’ which gives the probability of each class for the given features.

After the model is built, it is compiled and fit. This model achieved 61.29% accuracy on the training data after 10 epochs.
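The architecture described above can be sketched in Keras as follows. The filter counts and layer sizes here are my assumptions for illustration, not the exact values from the project:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 13  # the top 13 landmarks

# Convolution + ReLU blocks capture low-level features, pooling layers
# shrink the feature maps, Flatten turns them into a vector, and a fully
# connected head ends in a softmax over the landmark classes.
model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),           # RGB input
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),                            # image flattening
    layers.Dense(128, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=10) would then train it.
```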

CNN with Transfer Learning

There are top-performing pre-trained models for image recognition that can be downloaded and used as the basis for a new model. For this project, the pre-trained model I used is Residual Network (ResNet50), which serves as the base layers of the network.

What is ResNet50?

ResNet-50 is a convolutional neural network that is 50 layers deep. You can load a pre-trained version of the network trained on more than a million images from the ImageNet database. The pre-trained network can classify images into 1000 object categories, such as a keyboard, mouse, pencil, and many animals. As a result, the network has learned rich feature representations for a wide range of images. The network has an image input size of 224-by-224.

The first time a pre-trained model is loaded, Keras will download the required model weights. Weights are stored in the .keras/models/ directory under the home directory and will be loaded from this location the next time that they are used.

When loading the model, the “include_top” argument is set to False, so the fully connected output layers used to make predictions are not loaded, allowing a new output layer to be added and trained.

The “weights” argument is set to “imagenet” so that the weights from pre-training on ImageNet are used.

On top of the base, I added two hidden layers with ReLU (rectified linear unit) activations and a fully connected neural network.

The final output is calculated using ‘softmax’ which gives the probability of each class for the given features.

After the model is built, it is compiled and fit. This model achieved 97.50% accuracy on the training data after 10 epochs.
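Putting the pieces together, the transfer-learning model might look like this sketch. The sizes of the two added hidden layers are assumptions, and `pooling="avg"` stands in for an explicit flattening step:

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

NUM_CLASSES = 13

# ResNet50 base: include_top=False drops the original 1000-class head,
# weights="imagenet" loads the pre-trained ImageNet weights (downloaded
# to ~/.keras/models/ on first use), pooling="avg" yields a flat vector.
base = ResNet50(include_top=False, weights="imagenet",
                input_shape=(224, 224, 3), pooling="avg")
base.trainable = False  # keep the pre-trained features frozen

model = models.Sequential([
    base,
    layers.Dense(128, activation="relu"),  # two new hidden layers
    layers.Dense(64, activation="relu"),   # (sizes assumed)
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```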

Testing the model

For testing the model, I used the transfer learning model, as it gave the higher accuracy of 97.50%.

Testing data

I ran the model on 9 pictures, and it correctly identified the landmarks in 7 of them with a confidence of 18.5%. The images of a cat and a dungeon returned predictions with a much lower confidence of 11.1%. It looks like my model worked well for the landmark pictures.
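Inference on a single picture can be sketched with a small helper. The function and argument names here are mine, not from the project:

```python
import numpy as np
import tensorflow as tf

def predict_landmark(model, image_path, class_names):
    """Return the most likely landmark label and its softmax confidence."""
    # Load and resize the image to the network's 224x224 input size.
    img = tf.keras.utils.load_img(image_path, target_size=(224, 224))
    x = tf.keras.utils.img_to_array(img) / 255.0  # rescale to [0, 1]
    probs = model.predict(x[np.newaxis, ...])[0]  # batch of one
    idx = int(np.argmax(probs))
    return class_names[idx], float(probs[idx])
```

The confidence returned is just the highest softmax probability, which is why an out-of-scope image like a cat still produces a (low-confidence) landmark prediction.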

Improving the model

To improve upon the work done, here are some next steps.

  • The confidence levels for predictions are low. This needs to be investigated and fixed.
  • For testing the model, I used 20% validation data from the training set. In the future, I want to test the accuracy of the model using the entire test dataset.
  • For training the model, I only picked 13 landmarks due to limitations in processing power, but I want to expand the list.

The code is available on GitHub – Landmark Recognition.

Thank you for reading the entire story. This is my first article on Medium. Please let me know what you think in the comments below.

