Keras based Implementation of LeNet-5



LeNet-5 is one of the earliest Convolutional Neural Network models. It was developed by Yann LeCun et al., building on their convolutional network work from 1989, and was described in the 1998 paper “Gradient-Based Learning Applied to Document Recognition,” where the authors used it to recognize handwritten digits. We won’t go into the details of the paper; instead, we will focus on implementing the architecture in Keras.

The LeNet-5 architecture, as described in the paper, is shown below:

LeNet-5 Architecture

Architecture Overview

The input image fed to the network is 32 x 32. It is passed to the first convolutional layer, which applies 6 filters (also called kernels), each of size 5 x 5 with stride = 1, to produce the output C1. Each kernel acts as a feature extractor, and the output it produces as it slides over the image is called a feature map. Here each feature map is 28 x 28, and 6 such feature maps are generated. Let’s see how the 28 x 28 dimensions are obtained; this calculation is key to understanding the rest of the architecture.
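To make the sliding-window idea concrete, here is a minimal pure-Python sketch of a “valid” (no padding) convolution producing the six C1 feature maps. The function name `conv2d_valid` and the random image and kernels are illustrative assumptions, not code from the original article. Like Keras’ `Conv2D`, it computes a cross-correlation, i.e. the kernel is not flipped.

```python
import random

def conv2d_valid(image, kernel, stride=1):
    """Slide a k x k kernel over an n x n image with no padding; return the feature map."""
    n, k = len(image), len(kernel)
    out_size = (n - k) // stride + 1
    feature_map = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            r, c = i * stride, j * stride
            # Weighted sum of the k x k patch under the kernel
            total = 0.0
            for a in range(k):
                for b in range(k):
                    total += image[r + a][c + b] * kernel[a][b]
            row.append(total)
        feature_map.append(row)
    return feature_map

# Hypothetical example: one 32 x 32 image and six random 5 x 5 kernels -> six 28 x 28 maps
random.seed(0)
image = [[random.random() for _ in range(32)] for _ in range(32)]
kernels = [[[random.uniform(-1, 1) for _ in range(5)] for _ in range(5)] for _ in range(6)]
c1 = [conv2d_valid(image, k) for k in kernels]
print(len(c1), len(c1[0]), len(c1[0][0]))  # 6 28 28
```

In a real network the kernel values are learned during training rather than drawn at random, and an activation function is applied to each feature map.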

Let the input image be n x n, the kernel k x k, and the stride s, with no padding.

The output size of each feature map is then (⌊(n − k) / s⌋ + 1) x (⌊(n − k) / s⌋ + 1), where ⌊ ⌋ means rounding down to the nearest integer.

In our case n = 32, k = 5, and s = 1, so each feature map is ((32 − 5) / 1 + 1) x ((32 − 5) / 1 + 1) = 28 x 28.
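The same formula can be applied layer by layer. This short sketch (the helper name `conv_output_size` is our own) assumes no padding anywhere and reuses the formula for the 2 x 2 pooling layers, since a pooling window slides over its input just like a kernel:

```python
def conv_output_size(n, k, s=1):
    """Side length of the output for an n x n input, a k x k window, stride s, no padding."""
    return (n - k) // s + 1

size = 32  # LeNet-5 input is 32 x 32
for name, k, s in [("C1", 5, 1), ("S2", 2, 2), ("C3", 5, 1), ("S4", 2, 2), ("C5", 5, 1)]:
    size = conv_output_size(size, k, s)
    print(f"{name}: {size} x {size}")
# C1: 28 x 28, S2: 14 x 14, C3: 10 x 10, S4: 5 x 5, C5: 1 x 1
```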

The sizes of all the remaining layers follow from this same calculation.

The second layer is the pooling (subsampling) layer. Subsampling downsamples each feature map to reduce its size, which cuts the overall computation; this works because neighbouring pixels carry more or less similar information. The subsampling window is 2 x 2 with stride = 2, so the output S2 consists of 6 feature maps of size 14 x 14 (using the same formula: (28 − 2) / 2 + 1 = 14).
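This sketch shows plain 2 x 2 average pooling with stride 2, the operation Keras’ `AveragePooling2D` performs. In the original paper, each subsampling unit additionally multiplies the sum of its four inputs by a trainable coefficient, adds a trainable bias, and passes the result through a sigmoid; those extras are omitted here, and the helper name `avg_pool_2x2` is hypothetical.

```python
def avg_pool_2x2(fmap):
    """2 x 2 average pooling with stride 2 on a list-of-lists feature map."""
    rows, cols = len(fmap), len(fmap[0])
    pooled = []
    for i in range(0, rows - 1, 2):
        row = []
        for j in range(0, cols - 1, 2):
            # Average the four neighbouring values in this non-overlapping window
            window = [fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1]]
            row.append(sum(window) / 4)
        pooled.append(row)
    return pooled

example = [
    [1, 3, 2, 4],
    [5, 7, 6, 8],
    [0, 2, 1, 1],
    [4, 2, 3, 3],
]
print(avg_pool_2x2(example))  # [[4.0, 5.0], [2.0, 2.0]]
```

Applied to a 28 x 28 C1 feature map, the same function returns a 14 x 14 map, matching S2.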

The third layer is again a convolutional layer with filter size 5 x 5 and stride = 1. The only difference is the number of kernels: 16 instead of 6. The output C3 therefore consists of 16 feature maps of size 10 x 10, since (14 − 5) / 1 + 1 = 10.

In the paper, however, C3 is not connected to every S2 feature map. As the authors put it: “Layer C3 is a convolutional layer with 16 feature maps. Each unit in each feature map is connected to several 5 x 5 neighbourhoods at identical locations in a subset of S2’s feature maps. The first six C3 feature maps take inputs from every contiguous subset of three feature maps in S2. The next six take input from every contiguous subset of four. The next three take input from some discontinuous subsets of four. Finally, the last one takes input from all S2 feature maps.”

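In the connection table from the paper, each column indicates which S2 feature maps are combined by the units of a particular C3 feature map. A standard Keras Conv2D layer, by contrast, connects every C3 feature map to all six S2 maps.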
Mapping of each S2 on C3
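The connection scheme can be written down as data. This sketch encodes the S2 → C3 table as we read it from the paper (contiguous subsets wrap around, e.g. {4, 5, 0}); the three discontinuous subsets are transcribed by hand, so it is worth checking them against the paper’s table. The names `C3_CONNECTIONS` and `c3_trainable_params` are our own. Counting one 5 x 5 kernel per connected S2 map plus one bias per C3 map reproduces the 1,516 trainable parameters the paper reports for C3.

```python
# Index i = C3 feature map; tuple = the S2 feature maps it takes input from
C3_CONNECTIONS = (
    [tuple((i + j) % 6 for j in range(3)) for i in range(6)]    # maps 0-5: 3 contiguous S2 maps
    + [tuple((i + j) % 6 for j in range(4)) for i in range(6)]  # maps 6-11: 4 contiguous S2 maps
    + [(0, 1, 3, 4), (1, 2, 4, 5), (0, 2, 3, 5)]                # maps 12-14: discontinuous subsets of 4
    + [tuple(range(6))]                                         # map 15: all six S2 maps
)

def c3_trainable_params(connections, k=5):
    # One k x k kernel per connected S2 map, plus one bias per C3 feature map
    return sum(len(inputs) * k * k + 1 for inputs in connections)

# Print the table with S2 maps as rows and C3 maps as columns, as in the paper
for s2_map in range(6):
    print(s2_map, " ".join("X" if s2_map in inputs else "." for inputs in C3_CONNECTIONS))

print(c3_trainable_params(C3_CONNECTIONS))  # 1516
```

For comparison, a fully connected C3 (what a plain Conv2D layer with 16 filters gives) has 16 x (6 x 25 + 1) = 2,416 parameters.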

The resulting C3 consists of 16 feature maps, each 10 x 10. As before, a 2 x 2 subsampling layer with stride = 2 follows, producing S4: 16 feature maps of size 5 x 5. S4 is then convolved with 120 kernels of size 5 x 5, which yields 120 feature maps of size 1 x 1; a flattening layer turns these into the 120-unit vector C5. A fully connected layer F6 with 84 units follows C5. Finally, an output layer with 10 units is connected to F6, one unit for each of the 10 digit classes. Now let’s look at the implementation of the same architecture in Keras.
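The layer sizes also determine how many trainable parameters each layer has. This pure-Python sketch counts them under assumptions that match a typical Keras implementation rather than the paper: C3 is connected to all six S2 maps, the pooling layers have no trainable parameters, and the output is an ordinary 84 → 10 dense layer. The helper names are our own.

```python
def conv_params(in_maps, out_maps, k):
    # Each output map has one k x k kernel per input map, plus one bias
    return out_maps * (in_maps * k * k + 1)

def dense_params(in_units, out_units):
    # One weight per input-output pair, plus one bias per output unit
    return out_units * (in_units + 1)

layer_params = [
    ("C1", conv_params(1, 6, 5)),       # 32x32x1 -> 28x28x6
    ("S2", 0),                          # pooling: assumed to have no trainable parameters
    ("C3", conv_params(6, 16, 5)),      # assumed fully connected to all 6 S2 maps
    ("S4", 0),
    ("C5", conv_params(16, 120, 5)),    # 5x5x16 -> 1x1x120
    ("F6", dense_params(120, 84)),
    ("Output", dense_params(84, 10)),
]

for name, count in layer_params:
    print(f"{name:>6}: {count:>6,}")

total = sum(count for _, count in layer_params)
print(f"Total: {total:,}")  # Total: 61,706
```

Most of the parameters sit in C5 (48,120), because each of its 120 kernels spans all 16 S4 maps.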


Below is the implementation of the entire architecture in Keras using the Functional API.
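The article’s original code was not preserved here, so the following is only a minimal sketch of how the architecture above can be expressed with the Keras Functional API. The choices of `tanh` activations, `AveragePooling2D` for subsampling, a softmax output, and the Adam optimizer are our assumptions; the paper itself used a scaled tanh, trainable subsampling, a partially connected C3, and RBF output units. The function name `build_lenet5` and the layer names are hypothetical.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_lenet5(input_shape=(32, 32, 1), num_classes=10):
    inputs = keras.Input(shape=input_shape)
    # C1: 6 kernels of 5 x 5, stride 1 -> 28 x 28 x 6
    x = layers.Conv2D(6, kernel_size=5, strides=1, activation="tanh", name="C1")(inputs)
    # S2: 2 x 2 average pooling, stride 2 -> 14 x 14 x 6
    x = layers.AveragePooling2D(pool_size=2, strides=2, name="S2")(x)
    # C3: 16 kernels of 5 x 5 (fully connected to all S2 maps) -> 10 x 10 x 16
    x = layers.Conv2D(16, kernel_size=5, strides=1, activation="tanh", name="C3")(x)
    # S4: 2 x 2 average pooling, stride 2 -> 5 x 5 x 16
    x = layers.AveragePooling2D(pool_size=2, strides=2, name="S4")(x)
    # C5: 120 kernels of 5 x 5 -> 1 x 1 x 120, then flattened to 120 units
    x = layers.Conv2D(120, kernel_size=5, strides=1, activation="tanh", name="C5")(x)
    x = layers.Flatten(name="flatten")(x)
    # F6: 84 fully connected units
    x = layers.Dense(84, activation="tanh", name="F6")(x)
    # Output: one unit per digit class
    outputs = layers.Dense(num_classes, activation="softmax", name="output")(x)
    return keras.Model(inputs=inputs, outputs=outputs, name="lenet5")

model = build_lenet5()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()  # 61,706 trainable parameters with this configuration
```

If the training data are 28 x 28 digit images (as in MNIST), one option is to pad them to 32 x 32 before training, for example with `np.pad(x, ((0, 0), (2, 2), (2, 2)))`, and add a channel axis with `x[..., None]`; this is an assumption about your data pipeline, not a step taken from the article.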

Model Summary
Variations of train and validation loss on different epochs
Error Plots
Confusion Matrix of all the classes

The entire Python code, along with the Jupyter notebook, can be accessed here:


