Computer Vision A-Z Briefly Explained Part 1


Image by Author


Computer Vision is an area of Artificial Intelligence in which computers try to extract information from visual data such as pictures and videos. Programming these ideas is of course important; however, before programming, it is essential to understand the logic behind each definition.

Whether you are at the beginning of your career or already a senior in your field, reviewing these terms will be useful.

In just 7 minutes, you will cover key Computer Vision terms, sometimes with mathematical functions, graphs, and real-life examples.

Now, let’s begin our journey to Computer Vision.


Padding

Adding one or more layers of pixels (usually zeros) around the border of an image so that convolution does not shrink its size.
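A minimal sketch with NumPy, assuming a small hypothetical 4 × 4 grayscale image, shows one layer of zero padding turning a 4 × 4 array into a 6 × 6 one:

```python
import numpy as np

# A hypothetical 4 x 4 grayscale "image".
image = np.arange(16).reshape(4, 4)

# Add one layer (p = 1) of zeros on every side: 4 x 4 becomes 6 x 6.
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)

print(padded.shape)  # (6, 6)
print(padded)
```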

Convolution Operation

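Sliding a small predefined array of numbers (a filter, or kernel) over the numeric form of a picture and, at each position, multiplying the overlapping values element-wise and adding them up.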
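As a rough illustration (a plain NumPy sketch, not how deep learning libraries implement it efficiently), the function below slides a filter over an image and sums the element-wise products. The 6 × 6 image and the vertical-edge filter are made-up examples:

```python
import numpy as np

def conv2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide `kernel` over `image` (no padding, stride 1); at each position,
    multiply the overlapping values element-wise and sum them.
    Like most deep learning libraries, this does not flip the kernel
    (strictly speaking, it computes a cross-correlation)."""
    n_h, n_w = image.shape
    f_h, f_w = kernel.shape
    out = np.zeros((n_h - f_h + 1, n_w - f_w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = image[i:i + f_h, j:j + f_w]
            out[i, j] = np.sum(window * kernel)
    return out

# Bright left half, dark right half: a vertical edge in the middle.
image = np.array([[10, 10, 10, 0, 0, 0]] * 6)

# A classic vertical-edge filter.
kernel = np.array([[1, 0, -1]] * 3)

result = conv2d(image, kernel)
print(result)  # every row of the 4 x 4 output is [0, 30, 30, 0]
```

The large values appear only in the middle columns, where the bright left half meets the dark right half.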

Valid Convolution

Convolution without any padding. The output is smaller than the input, and how much smaller depends on the filter size.
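For an n × n input and an f × f filter, a valid convolution produces an (n − f + 1) × (n − f + 1) output, because that is how many positions the filter can take in each direction. A small sketch that checks this formula against the windows NumPy's `sliding_window_view` actually produces:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def valid_output_size(n: int, f: int) -> int:
    """Side length of the output of a valid convolution
    (no padding, stride 1) of an n x n input with an f x f filter."""
    return n - f + 1

# Check the formula against the number of f x f windows that actually fit.
for n, f in [(6, 3), (32, 5), (7, 7)]:
    windows = sliding_window_view(np.zeros((n, n)), (f, f))
    assert windows.shape[:2] == (valid_output_size(n, f),) * 2

print(valid_output_size(6, 3))  # 4
```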

Same Convolution

Convolution with just enough padding that the output size is the same as the input size.
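For stride 1 and an odd filter size f, padding p = (f − 1) / 2 on each side keeps the output the same size as the input (3 × 3 filters need p = 1, 5 × 5 filters need p = 2). A minimal NumPy sketch, with a hypothetical random 6 × 6 image and a 3 × 3 averaging filter:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def same_padding(f: int) -> int:
    """Padding per side that keeps output size == input size for stride 1.
    Assumes an odd filter size f."""
    if f % 2 == 0:
        raise ValueError("symmetric same padding needs an odd filter size")
    return (f - 1) // 2

def conv2d_same(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    f = kernel.shape[0]
    p = same_padding(f)
    padded = np.pad(image, p)                    # zero padding on every side
    windows = sliding_window_view(padded, (f, f))
    # Multiply each f x f window element-wise with the kernel and sum.
    return np.einsum("ijkl,kl->ij", windows, kernel)

image = np.random.default_rng(0).random((6, 6))
kernel = np.ones((3, 3)) / 9                     # 3 x 3 mean filter
out = conv2d_same(image, kernel)
print(image.shape, out.shape)  # (6, 6) (6, 6)
```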

Strided Convolution

The filter moves over the input in steps larger than one pixel, for example two instead of one, so the output is smaller than with an ordinary (stride 1) convolution.
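With padding p and stride s, the output side length becomes ⌊(n + 2p − f) / s⌋ + 1. The sketch below (plain NumPy, hypothetical 7 × 7 input and 3 × 3 filter) keeps only every second filter position, which gives a 3 × 3 output:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv_output_size(n: int, f: int, p: int = 0, s: int = 1) -> int:
    """floor((n + 2p - f) / s) + 1 for an n x n input, f x f filter,
    padding p and stride s."""
    return (n + 2 * p - f) // s + 1

def conv2d_strided(image: np.ndarray, kernel: np.ndarray, stride: int = 2) -> np.ndarray:
    f = kernel.shape[0]
    windows = sliding_window_view(image, (f, f))
    # Keep only every `stride`-th window position in each direction.
    windows = windows[::stride, ::stride]
    return np.einsum("ijkl,kl->ij", windows, kernel)

image = np.arange(49, dtype=float).reshape(7, 7)
kernel = np.ones((3, 3))
out = conv2d_strided(image, kernel, stride=2)
print(out.shape, conv_output_size(7, 3, p=0, s=2))  # (3, 3) 3
```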


Pooling

The aim is to reduce the spatial size (height and width) of the input.

Average Pooling

Pooling in which each window is replaced by the average of its values.
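A minimal NumPy sketch of 2 × 2 average pooling with stride 2; the 4 × 4 input is a made-up example:

```python
import numpy as np

def avg_pool(x: np.ndarray, size: int = 2, stride: int = 2) -> np.ndarray:
    """Average pooling on a 2-D array: each size x size window
    is replaced by the mean of its values."""
    h = (x.shape[0] - size) // stride + 1
    w = (x.shape[1] - size) // stride + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            r, c = i * stride, j * stride
            out[i, j] = x[r:r + size, c:c + size].mean()
    return out

x = np.array([[1, 3, 2, 1],
              [2, 9, 1, 1],
              [1, 3, 2, 3],
              [5, 6, 1, 2]])
print(avg_pool(x))  # values: [[3.75, 1.25], [3.75, 2.0]]
```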

Max Pooling

Pooling in which each window is replaced by the maximum of its values.
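The same made-up 4 × 4 input with 2 × 2 max pooling. This sketch assumes non-overlapping windows (stride equal to the window size) and dimensions divisible by the window size, which lets a reshape do the work:

```python
import numpy as np

def max_pool(x: np.ndarray, size: int = 2) -> np.ndarray:
    """Non-overlapping max pooling (stride == size) on a 2-D array.
    Assumes both dimensions are divisible by `size`."""
    h, w = x.shape
    # blocks[a, b, c, d] == x[a * size + b, c * size + d]
    blocks = x.reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

x = np.array([[1, 3, 2, 1],
              [2, 9, 1, 1],
              [1, 3, 2, 3],
              [5, 6, 1, 2]])
print(max_pool(x))  # values: [[9, 2], [6, 3]]
```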

Famous Networks


LeNet

LeNet is a convolutional neural network structure proposed by LeCun et al. in 1998 [1]. In general, "LeNet" refers to LeNet-5, a simple convolutional neural network.

Layer Structure

  • First layer - Convolution layer
  • Second layer - Average Pooling layer
  • Third layer - Second Convolution layer
  • Fourth layer - Average Pooling layer
  • Fifth layer - Fully Connected layer (implemented as a convolution in the original paper)
  • Sixth layer - Fully Connected layer
  • Output layer - Fully Connected Softmax output layer
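The layer list can be checked with the output size formula ⌊(n + 2p − f) / s⌋ + 1. The sketch below assumes the configuration usually quoted for LeNet-5 (32 × 32 grayscale input, 5 × 5 convolutions with 6 and 16 filters, 2 × 2 average pooling with stride 2) and tracks the shape after each layer:

```python
def conv_out(n: int, f: int, p: int = 0, s: int = 1) -> int:
    """Output side length: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1

# (description, filter/window size, stride, channels after the layer or None to keep them)
LENET5_FEATURES = [
    ("conv 5x5, 6 filters",    5, 1, 6),
    ("avg pool 2x2, stride 2", 2, 2, None),
    ("conv 5x5, 16 filters",   5, 1, 16),
    ("avg pool 2x2, stride 2", 2, 2, None),
]

n, c = 32, 1  # assumed 32 x 32 grayscale input
shapes = []
for name, f, s, channels in LENET5_FEATURES:
    n = conv_out(n, f, s=s)
    c = channels if channels is not None else c
    shapes.append((n, n, c))
    print(f"{name:<24} -> {n} x {n} x {c}")

# 5 x 5 x 16 = 400 values feed the fully connected part: 120 -> 84 -> 10 (softmax).
print("flattened:", n * n * c)
```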


AlexNet

AlexNet is the name of a convolutional neural network (CNN) architecture, designed by Alex Krizhevsky in collaboration with Ilya Sutskever and Geoffrey Hinton, who was Krizhevsky's Ph.D. advisor.

Layer Structure

  • First layer - Convolution layer
  • Second layer - Second Convolution layer
  • Third layer - Third Convolution layer
  • Fourth layer - Fourth Convolution layer
  • Fifth layer - Fifth Convolution layer
  • Sixth layer - Fully Connected layer
  • Seventh layer - Second Fully Connected layer
  • Eighth layer - Third Fully Connected layer, which feeds the Softmax output
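The list above leaves out pooling: in the original AlexNet, max pooling (3 × 3 windows, stride 2) follows the first, second, and fifth convolution layers. The shape-tracking sketch below assumes a 227 × 227 RGB input (the size commonly used because it makes the arithmetic work out) and the usual filter counts of 96, 256, 384, 384, and 256:

```python
def conv_out(n: int, f: int, p: int = 0, s: int = 1) -> int:
    """Output side length: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1

# (description, filter/window size, stride, padding, channels or None to keep them)
ALEXNET_FEATURES = [
    ("conv1 11x11, stride 4",  11, 4, 0, 96),
    ("max pool 3x3, stride 2",  3, 2, 0, None),
    ("conv2 5x5, pad 2",        5, 1, 2, 256),
    ("max pool 3x3, stride 2",  3, 2, 0, None),
    ("conv3 3x3, pad 1",        3, 1, 1, 384),
    ("conv4 3x3, pad 1",        3, 1, 1, 384),
    ("conv5 3x3, pad 1",        3, 1, 1, 256),
    ("max pool 3x3, stride 2",  3, 2, 0, None),
]

n, c = 227, 3  # assumed 227 x 227 RGB input
shapes = []
for name, f, s, p, channels in ALEXNET_FEATURES:
    n = conv_out(n, f, p=p, s=s)
    c = channels if channels is not None else c
    shapes.append((n, n, c))
    print(f"{name:<24} -> {n} x {n} x {c}")

# 6 x 6 x 256 = 9216 values feed FC 4096 -> FC 4096 -> FC 1000 (softmax).
print("flattened:", n * n * c)
```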


VGG Net

VGG Net is a convolutional neural network (CNN) architecture proposed by Simonyan and Zisserman of the Visual Geometry Group (VGG) at the University of Oxford in 2014. Pre-trained versions of it are widely available.

Image by Author – VGG-16 Structure
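VGG-16 is built from one repeating idea: 3 × 3 convolutions with same padding, followed by 2 × 2 max pooling with stride 2. The sketch below encodes the commonly cited VGG-16 configuration (224 × 224 RGB input assumed) and counts its weight layers and parameters:

```python
# Numbers are 3x3 conv layers (same padding, stride 1) with that many filters;
# "M" is 2x2 max pooling with stride 2.
VGG16_CONV = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"]
VGG16_FC = [4096, 4096, 1000]  # two hidden FC layers + 1000-way softmax output

def summarize(conv_cfg, fc_cfg, input_size=224, in_channels=3):
    n, c = input_size, in_channels
    weight_layers, params = 0, 0
    for item in conv_cfg:
        if item == "M":
            n //= 2                            # pooling halves height and width
        else:
            params += 3 * 3 * c * item + item  # weights + biases
            c = item
            weight_layers += 1
    features = n * n * c                       # flattened size before the FC layers
    for units in fc_cfg:
        params += features * units + units
        features = units
        weight_layers += 1
    return weight_layers, (n, n, c), params

layers, final_shape, params = summarize(VGG16_CONV, VGG16_FC)
print(layers, final_shape, f"{params:,}")  # 16 (7, 7, 512) 138,357,544
```

The "16" in the name is the number of layers with weights (13 convolutions plus 3 fully connected); most of the parameters sit in the first fully connected layer.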

Inception Network

Photo by Christophe Hautier on Unsplash

If you cannot decide which filter size (for example 1 × 1, 3 × 3, or 5 × 5) or pooling layer to use, do not pick just one: apply all of them in parallel to the same input and stack their outputs together along the channel dimension.
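A minimal NumPy sketch of that idea, assuming a hypothetical 8 × 8 input with 16 channels and arbitrary random filters. Each branch uses same padding so every output keeps the 8 × 8 size, and the results are concatenated along the channel axis. The real Inception module also places 1 × 1 convolutions before the larger filters and after the pooling branch; they are left out here:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)

def conv_same(x, w):
    """'Same' convolution (stride 1) of x with shape (H, W, C_in) and
    filters w with shape (f, f, C_in, C_out); f is assumed odd."""
    f = w.shape[0]
    p = (f - 1) // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))           # zero padding on H and W only
    win = sliding_window_view(xp, (f, f), axis=(0, 1))  # (H, W, C_in, f, f)
    return np.einsum("hwcij,ijco->hwo", win, w)

def max_pool_same(x, f=3):
    """f x f max pooling with stride 1, padded so H and W stay the same."""
    p = (f - 1) // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)), constant_values=-np.inf)
    win = sliding_window_view(xp, (f, f), axis=(0, 1))
    return win.max(axis=(-2, -1))

x = rng.random((8, 8, 16))  # hypothetical input volume: 8 x 8 with 16 channels

branches = [
    conv_same(x, rng.standard_normal((1, 1, 16, 8))),   # 1 x 1 conv -> 8 channels
    conv_same(x, rng.standard_normal((3, 3, 16, 16))),  # 3 x 3 conv -> 16 channels
    conv_same(x, rng.standard_normal((5, 5, 16, 4))),   # 5 x 5 conv -> 4 channels
    max_pool_same(x),                                   # 3 x 3 max pool -> keeps 16 channels
]

# Every branch keeps the 8 x 8 spatial size, so they can be stacked along channels.
out = np.concatenate(branches, axis=-1)
print(out.shape)  # (8, 8, 44)
```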

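If this becomes too expensive computationally, use a bottleneck layer: a 1 × 1 convolution that reduces the number of channels before the expensive 3 × 3 and 5 × 5 convolutions.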
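The saving is easy to see by counting multiplications. The sizes below are an illustrative example, not taken from a specific network: a 28 × 28 × 192 input turned into a 28 × 28 × 32 output by a 5 × 5 convolution, with and without a 1 × 1 bottleneck down to 16 channels:

```python
def conv_mults(h: int, w: int, c_out: int, f: int, c_in: int) -> int:
    """Multiplications for a 'same' convolution producing an h x w x c_out
    output with f x f filters over c_in input channels."""
    return h * w * c_out * f * f * c_in

# Direct 5 x 5 convolution: 192 channels in, 32 channels out.
direct = conv_mults(28, 28, 32, 5, 192)

# Bottleneck: 1 x 1 conv down to 16 channels, then 5 x 5 conv up to 32 channels.
bottleneck = conv_mults(28, 28, 16, 1, 192) + conv_mults(28, 28, 32, 5, 16)

print(f"{direct:,} vs {bottleneck:,}")  # 120,422,400 vs 12,443,648
```

In this example the bottleneck cuts the cost by roughly a factor of ten, while the output shape stays the same.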

Transfer Learning

Photo by Toa Heftiba on Unsplash

Take a convolutional network that has already been trained on another problem and replace its last layer with a new softmax layer defined specifically for your task. For example, if you have 4 or 5 different classes to recognize, the new softmax layer has 4 or 5 outputs. You then train this new layer on your own data while the earlier layers stay frozen. This often gives good results, especially when you only have a small dataset.
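A toy sketch of the idea in NumPy. To stay self-contained it does not load a real pre-trained network: the "pre-trained body" is a hypothetical fixed random projection followed by ReLU, standing in for the frozen layers. Only the new 4-class softmax head is trained, with plain gradient descent on cross-entropy:

```python
import numpy as np

rng = np.random.default_rng(0)

# HYPOTHETICAL stand-in for a pre-trained body: a fixed random projection + ReLU.
# In real transfer learning these would be the frozen layers of a trained CNN.
W_body = rng.standard_normal((64, 32)) / np.sqrt(64)

def extract_features(x):
    """Frozen feature extractor: W_body is never updated."""
    return np.maximum(x @ W_body, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical toy task with 4 classes.
num_classes = 4
X = rng.standard_normal((200, 64))
feats = extract_features(X)
y = np.argmax(feats @ rng.standard_normal((32, num_classes)), axis=1)
Y = np.eye(num_classes)[y]  # one-hot labels

# New softmax head sized for OUR task: the only part that is trained.
W_head = np.zeros((32, num_classes))
b_head = np.zeros(num_classes)

losses = []
lr = 0.1
for step in range(500):
    probs = softmax(feats @ W_head + b_head)
    losses.append(-np.mean(np.sum(Y * np.log(probs + 1e-12), axis=1)))
    grad = (probs - Y) / len(X)  # d(mean cross-entropy) / d(logits)
    W_head -= lr * feats.T @ grad
    b_head -= lr * grad.sum(axis=0)

accuracy = np.mean(np.argmax(feats @ W_head + b_head, axis=1) == y)
print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}, train accuracy {accuracy:.2f}")
```

In a real project the body would be the frozen layers of a trained CNN from your framework of choice, and only the replaced last layer (and optionally a few layers before it) would be updated.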

Data Augmentation

Photo by Tommy Bond on Unsplash

This technique is used to improve the performance of a Computer Vision system.

It creates modified copies of your existing images (for example by mirroring, cropping, rotating, or color shifting) and adds them to your dataset, which usually helps your model generalize better.
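A minimal NumPy sketch of the four transformations mentioned above, applied to a hypothetical dataset of random 32 × 32 RGB images with values in [0, 1]. Rotation here is limited to 90 degrees via `np.rot90`; arbitrary angles need an image library that interpolates pixels:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image: np.ndarray) -> list:
    """Return simple augmented copies of an H x W x 3 image with values in [0, 1]."""
    h, w, _ = image.shape
    mirrored = image[:, ::-1]                                      # horizontal flip
    top = rng.integers(0, h // 4 + 1)
    left = rng.integers(0, w // 4 + 1)
    cropped = image[top:top + 3 * h // 4, left:left + 3 * w // 4]  # random crop (3/4 size)
    rotated = np.rot90(image)                                      # 90-degree rotation
    shift = rng.uniform(-0.1, 0.1, size=3)                         # per-channel color shift
    color_shifted = np.clip(image + shift, 0.0, 1.0)
    return [mirrored, cropped, rotated, color_shifted]

images = [rng.random((32, 32, 3)) for _ in range(10)]  # hypothetical dataset
augmented = [aug for img in images for aug in augment(img)]
dataset = images + augmented
print(len(images), "->", len(dataset))  # 10 -> 50
```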


Thank you for reading my article so far.

There are many other terms in Computer Vision; as I always do, I will leave them for the second part of this article.

