Understanding Convolutional Neural Networks (CNNs)


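Drawbacks of using an ANN for images:

A fully connected ANN leads to a parameter explosion: every pixel is connected to every neuron in the first hidden layer, so the number of parameters to train becomes very large. An ANN also loses spatial information when the image is flattened into a vector. It works well mainly when the images are very similar to each other, for example when the object always sits at the centre of the image, whereas a CNN can pick up a feature regardless of its position.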

A CNN uses convolutional layers to help alleviate these issues. A convolutional layer is created when we apply multiple image filters to the input images. The layer is then trained to find the best filter weight values. A CNN also reduces the number of parameters by relying on local connectivity: in a convolutional layer, not all neurons are fully connected. Instead, each neuron is connected only to a small local subset of neurons in the adjacent layer, and these local connections end up being the filters.
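To make the parameter saving concrete, here is a rough count for one fully connected layer versus one convolutional layer. The image size, the 1000 hidden units, and the 32 filters of size 3 x 3 are assumptions chosen only for illustration; they do not come from any particular model.

```python
# Hypothetical sizes, chosen only to illustrate the arithmetic.
height, width, channels = 1280, 720, 3
hidden_units = 1000          # assumed width of the first dense layer
num_filters, kernel = 32, 3  # assumed number and size of conv filters

# Fully connected: every input value connects to every hidden unit.
inputs = height * width * channels
dense_params = inputs * hidden_units + hidden_units  # weights + biases

# Convolutional: each filter has kernel * kernel * channels weights plus one
# bias, and the same filter is reused at every position in the image.
conv_params = num_filters * (kernel * kernel * channels + 1)

print(f"dense layer parameters: {dense_params:,}")
print(f"conv layer parameters:  {conv_params:,}")
```

Under these assumptions the dense layer needs billions of weights, while the convolutional layer needs fewer than a thousand, because its weights do not depend on the image size.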

Convolution relies on local filters, where different filters learn to identify different parts of the image. Stacking filters together results in a convolutional layer. For colour images we have intensity values for red, green and blue, so an image is represented with a shape such as (1280, 720, 3), that is (Height, Width, Colour). For colour images we therefore end up with 3D filters. Convolutional layers are often fed into further convolutional layers, which allows the network to discover patterns within patterns, usually with more complexity in the later layers.
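A minimal NumPy sketch of what a single 3D filter does to a colour image is shown below. It is a naive loop, not how a deep learning library implements it, and the function name `conv2d_single_filter`, the tiny image size, and the random filter values are all assumptions for illustration. As in most deep learning code, the operation is strictly a cross-correlation (the filter is not flipped).

```python
import numpy as np

def conv2d_single_filter(image, kernel):
    """Slide a (kh, kw, C) filter over an (H, W, C) image without padding."""
    h, w, c = image.shape
    kh, kw, kc = kernel.shape
    assert c == kc, "filter depth must match the number of image channels"
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Each output value only sees a small local patch of the image.
            patch = image[i:i + kh, j:j + kw, :]
            out[i, j] = np.sum(patch * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.random((8, 6, 3))                 # small stand-in for (Height, Width, Colour)
filters = rng.standard_normal((4, 3, 3, 3))   # 4 hypothetical 3 x 3 x 3 filters

# Stacking the output of each filter gives the output of a convolutional layer.
feature_maps = np.stack(
    [conv2d_single_filter(image, f) for f in filters], axis=-1
)
print(feature_maps.shape)  # (6, 4, 4)
```

Each filter produces one 2D feature map, and the stack of feature maps is what the next convolutional layer would receive as its input.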

Pooling Layers:

Why do we need pooling?

Suppose we have images of a cheetah whose face appears at a different position in each image. Pooling keeps the most important features of the image, which helps to identify the object regardless of its position.

Even with local connectivity, when dealing with colour images and possibly tens or hundreds of filters, we will have a large number of parameters. We can use pooling layers to reduce this. A pooling layer accepts the output of a convolutional layer as its input. Neurons in a pooling layer have no weights or biases; the layer simply applies an aggregation function to the inputs in each window.

[Figure: CNN architecture]

There are several types of pooling, such as max pooling, average pooling and sum pooling.

Max pooling takes the maximum value inside a window that slides over the matrix, here with a filter size of (2 x 2) and a stride of 2.

[Figure: max pooling]

You can see from the above image that after applying the pooling layer the important information is still preserved, while the 16 elements are reduced to 4. This helps the neural network to recognise features independently of their location (location invariance). Average pooling simply takes the average of the values in each window.
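The reduction from 16 elements to 4 can be sketched in a few lines of NumPy. The helper `pool2d` and the values in the 4 x 4 matrix are made up for illustration; they are not the numbers from the figure.

```python
import numpy as np

def pool2d(x, size=2, stride=2, agg=np.max):
    """Apply an aggregation function to each size x size window of a 2D array."""
    h, w = x.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.empty((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + size,
                       j * stride:j * stride + size]
            out[i, j] = agg(window)
    return out

# Hypothetical 4 x 4 feature map (16 elements).
feature_map = np.array([
    [1, 3, 2, 1],
    [4, 6, 5, 0],
    [7, 2, 9, 8],
    [1, 0, 3, 4],
])

max_pooled = pool2d(feature_map, agg=np.max)   # [[6, 5], [7, 9]]
avg_pooled = pool2d(feature_map, agg=np.mean)  # [[3.5, 2.0], [2.5, 6.0]]
print(max_pooled)
print(avg_pooled)
```

Swapping `np.max` for `np.mean` or `np.sum` gives average or sum pooling with the same window logic, and no weights or biases are involved in any of them.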

Pooling greatly reduces the number of values passed to later layers, and with it the number of parameters those layers need. A pooling layer removes a lot of information: even a small pooling ‘kernel’ of (2×2) with a stride of 2 removes 75% of the input data. However, the general trends survive the pooling layer, which makes for a more generalised model and helps to mitigate overfitting.
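The 75% figure follows from the output having half the height and half the width of the input, so only a quarter of the values remain. A quick check, using a hypothetical helper `fraction_removed` and assuming no padding:

```python
def fraction_removed(height, width, size=2, stride=2):
    """Fraction of values dropped by one pooling layer (no padding assumed)."""
    out_h = (height - size) // stride + 1
    out_w = (width - size) // stride + 1
    return 1 - (out_h * out_w) / (height * width)

print(fraction_removed(4, 4))        # 0.75
print(fraction_removed(1280, 720))   # 0.75
```

For input sizes that are not divisible by the stride the fraction is slightly different, since the leftover border is dropped in this sketch.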


The pooled feature map is flattened into a column vector before it is fed to the densely connected artificial neural network.
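As a small illustration of the flattening step, the snippet below reshapes a pooled feature map into a column vector with NumPy. The 2 x 2 x 3 shape is an assumption chosen to keep the output readable.

```python
import numpy as np

# Hypothetical pooled output: 2 x 2 spatial positions, 3 feature maps.
pooled = np.arange(2 * 2 * 3).reshape(2, 2, 3)

# Flatten into a single column; -1 lets NumPy infer the number of rows.
column = pooled.reshape(-1, 1)
print(column.shape)  # (12, 1)
```

The values are unchanged; only their arrangement differs, so the dense layer sees one long vector instead of a grid.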

CNN Architecture

The input image is processed by a convolutional layer, which consists of kernels followed by an activation layer that introduces non-linearity. The result then passes through a pooling layer, which reduces the size of the feature map. Finally, it is flattened before being passed to the fully connected layer.

A CNN can take many kinds of architecture. There is no rule of thumb for designing the architecture; it all depends on the error metrics and the use case.
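The pipeline just described can be sketched end to end in NumPy. This is only a forward pass with random, untrained weights; the layer sizes, the choice of ReLU as the activation, and all function names are assumptions for illustration, not a recommended architecture.

```python
import numpy as np

rng = np.random.default_rng(42)

def conv2d(image, kernels):
    """image: (H, W, C); kernels: (F, kh, kw, C) -> (H-kh+1, W-kw+1, F)."""
    f, kh, kw, _ = kernels.shape
    h, w, _ = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1, f))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw, :]
            # Sum over kernel height, width and channels for every filter.
            out[i, j, :] = np.tensordot(kernels, patch,
                                        axes=([1, 2, 3], [0, 1, 2]))
    return out

def relu(x):
    """Activation that introduces non-linearity (ReLU assumed here)."""
    return np.maximum(x, 0)

def max_pool(x, size=2, stride=2):
    """Max pooling applied independently to each feature map."""
    h, w, f = x.shape
    out_h, out_w = (h - size) // stride + 1, (w - size) // stride + 1
    out = np.zeros((out_h, out_w, f))
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + size,
                       j * stride:j * stride + size, :]
            out[i, j, :] = window.max(axis=(0, 1))
    return out

image = rng.random((10, 10, 3))               # tiny stand-in for a colour image
kernels = rng.standard_normal((4, 3, 3, 3))   # 4 hypothetical filters

conv_out = relu(conv2d(image, kernels))       # shape (8, 8, 4)
pooled = max_pool(conv_out)                   # shape (4, 4, 4)
flat = pooled.reshape(-1)                     # shape (64,)

# Fully connected layer with 5 hypothetical output units.
dense_w = rng.standard_normal((flat.size, 5))
dense_b = np.zeros(5)
logits = flat @ dense_w + dense_b             # shape (5,)
print(conv_out.shape, pooled.shape, flat.shape, logits.shape)
```

In practice a library such as Keras or PyTorch would provide these layers and train the weights; the loop-based version here is only meant to show how the shapes change from stage to stage.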


