My aim in this article is to walk you through the basic steps that go into building a neural network to solve a task we are interested in. Rest assured, there will be no heavy mathematics involved!
It all starts with data, and lots and lots of it...
Deep Learning requires us to train a Neural Network, which is an algorithm that learns from data (usually, the more data, the better the learning). The principle is to provide lots of labelled data. Hey, wait a minute! I will explain what labelled data means, but let me first familiarise you with the analogy that we are going to use to demystify the working of Neural Networks.
Here, we will consider the scenario of a math class in school. I know math is not very exciting for most people; that’s why I picked the math class, to make math more interesting.
- The student here is the Neural Network which is trying to learn, just like a student in school.
- The teacher is the Machine Learning Engineer who is trying his/her best so that the student can achieve good grades.
- Think of good grades as accuracy, which generally every Neural Network tries to maximise.
- The only difference from a real school is that the teacher will not teach the pupil any lessons, but will only provide questions with answers and occasionally step in to improve the student’s grades.
Training the Neural Network :
As I mentioned earlier, a Neural Network learns from data. In our math class analogy, data simply means the math problems. Learning from them is known as training the NN (for convenience I will use NN for Neural Network). In the training phase the NN learns the patterns from the labelled data (i.e. math problems with their correct solutions). Suppose the teacher wants the student to learn multiplication; the student will be given something like:
1 x 1 = 1
2 x 2 = 4
3 x 3 = 9
4 x 4 = 16
Above is an example of labelled data; if we had not provided the answers 1, 4, 9, 16, it would not be considered labelled data. The labelled examples used for learning are also known as the training set. Now the student (NN) will try to learn on its own how 2 x 2 = 4 or how 4 x 4 = 16. Later on, it will be tested on a problem which it hasn’t seen before, like 5 x 5 = ?. If the student (NN) gives correct answers he will get good grades (good accuracy), meaning the student has learned to solve the problems. Yay!
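To make the idea of labelled data concrete, here is a minimal sketch in Python. The names `training_set`, `test_set` and `accuracy` are made up for this illustration; they simply pair each question (the features) with its answer (the label) and compute the student's grade.

```python
# Labelled data: each example pairs the input (features) with the correct answer (label).
training_set = [((1, 1), 1), ((2, 2), 4), ((3, 3), 9), ((4, 4), 16)]

# A problem the student has never seen, held back for the test.
test_set = [((5, 5), 25)]


def accuracy(student, examples):
    """Fraction of examples for which the student's answer equals the label."""
    correct = sum(1 for features, label in examples if student(*features) == label)
    return correct / len(examples)


# A hypothetical student that has truly learned multiplication scores full marks.
print(accuracy(lambda a, b: a * b, test_set))  # 1.0
```

Without the second element of each pair (the answers), the same list would be unlabelled data.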
Time to perform predictions (examination time) and the grading criteria (underfit, overfit or the ideal fit) :
After the student has learned the patterns from the labelled data, we hold the test by giving the student some math problems (features or input) and asking the student to come up with the correct solutions (labels or output).
There may be three different grades based on the performance of the student :
i) Underfit Grade :
Suppose the student is not able to learn the patterns from the data and tries to come up with a strategy in the examination. The student thinks that if he gives the same answer to every question he might get some of them right, so he applies this strategy and answers 7 for every problem:
1 x 1 = 7
6 x 6 = 7
5 x 5 = 7
4 x 4 = 7
Hmmm, the teacher sees that the student has failed to grasp multiplication and is performing very poorly. This is a case of “underfit”, as the student is clearly not able to answer correctly even the problems that he learnt from during training.
Hence, the student (NN) receives “underfit” grade.
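The underfit student can be sketched in a few lines of Python. This is only an illustration of the analogy, not a real neural network; `constant_student` is a made-up name for a model that ignores its input.

```python
def constant_student(a, b):
    # Ignores the question entirely and always answers 7.
    return 7


# The same problems the student was given during training.
training_set = [((1, 1), 1), ((2, 2), 4), ((3, 3), 9), ((4, 4), 16)]

train_correct = sum(constant_student(*question) == answer
                    for question, answer in training_set)
train_accuracy = train_correct / len(training_set)
print(train_accuracy)  # 0.0: wrong even on the problems seen in training
```

Poor accuracy on the training set itself is the usual tell-tale sign of underfitting.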
ii) Overfit Grade :
Suppose the student finds it easy to memorise the solutions to the problems that he encountered while training. The student thinks that now that he has memorised the answers there is no need to worry and he will get top grades. Well, let’s test the student.
1 x 1 = 1
6 x 6 = 66
9 x 9 = 99
4 x 4 = 16
8 x 8 = 33
That’s interesting: the student correctly answered the problems that he learned during training, but clearly failed to solve the problems he hadn’t encountered before. This is a case of “overfit”, as the student learned patterns that completely represent the training examples but fail to generalise to a universal solution for all problems.
Hence, the student receives the “overfit” grade.
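Memorisation is easy to mimic with a lookup table. The sketch below is a hypothetical illustration: `memorising_student` recalls stored answers and, for anything unseen, falls back on an arbitrary guess (gluing the two digits together), which is an assumption chosen only to resemble the wrong answers above.

```python
training_set = {(1, 1): 1, (2, 2): 4, (3, 3): 9, (4, 4): 16}
unseen_set = {(6, 6): 36, (8, 8): 64, (9, 9): 81}


def memorising_student(a, b):
    # Recalls the stored answer if this exact problem was seen in training.
    if (a, b) in training_set:
        return training_set[(a, b)]
    # Otherwise guesses by gluing the digits together (arbitrary fallback).
    return int(f"{a}{b}")


def accuracy(student, examples):
    """Fraction of problems answered correctly."""
    return sum(student(*q) == ans for q, ans in examples.items()) / len(examples)


print(accuracy(memorising_student, training_set))  # 1.0
print(accuracy(memorising_student, unseen_set))    # 0.0
```

A large gap between the score on training problems and the score on unseen problems is what gives overfitting away.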
iii) Ideal fit Grade :
Now, the student decides to learn patterns that capture the essence of the problems he encountered during training and that also generalise to unseen problems. For example, the student learns that if we add 2 twice (2 + 2) we get 4; similarly, 3 x 3 = 9 is the result of 3 being added thrice, and so on. Let’s test the student (NN) now:
3 x 3 = 9
11 x 11 = 121
5 x 5 = 27
1 x 1 = 2
10 x 10 = 100
7 x 7 = 49
9 x 9 = 81
Although there are some examples where the NN predicted incorrectly (5 x 5 and 1 x 1), we can say that it did a pretty great job. Beautiful! This is what is called an “ideal fit”, as the student has learned the general algorithm for multiplication.
Hence, the student receives the perfect “ideal” grade.
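The rule the ideal student discovers, multiplication as repeated addition, can be written down directly. This is a hand-written sketch of the rule (assuming whole, non-negative numbers), not something a network literally stores; a real NN only approximates such a rule, which is why it can still slip up as in the test above.

```python
def generalising_student(a, b):
    # Multiplication as repeated addition: add `a` to a running total `b` times.
    total = 0
    for _ in range(b):
        total += a
    return total


# Works on problems that were never part of the training set.
print(generalising_student(11, 11))  # 121
print(generalising_student(10, 10))  # 100
```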
Let’s look at these visually. The blue points in the graphs represent the problems that were provided to the student in the training set to learn patterns. The red line represents the patterns that the student learnt from those points.
[Figure panel: 2. Ideal Fit]
Underfit or Overfit ?? Teacher to the rescue!!!!
For underfit, the teacher tries to provide some more examples so that the student can learn the mapping and come up with a function that solves the problem by producing the correct output. As a result, the student will try to learn the patterns that he failed to learn from limited data.
Similarly, there are several other ways to reduce “underfitting” in an actual Neural Network :
1. Increase model complexity.
2. Increase the number of features by performing feature engineering.
3. Remove noise from the data.
4. Increase the number of epochs or increase the duration of training to get better results.
(Source for above mentioned steps to reduce underfitting : https://www.geeksforgeeks.org/underfitting-and-overfitting-in-machine-learning/ )
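As a rough illustration of the last point (more epochs), here is a small pure-Python sketch. It assumes a toy one-weight model, prediction = w * a * a, trained by plain gradient descent on the squares from our math class; the function names and the learning rate are choices made for this example only.

```python
def train(sides, answers, epochs, learning_rate=0.001):
    """Fit answer ~ w * side**2 by gradient descent on mean squared error."""
    w = 0.0
    n = len(sides)
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (a * a) * (w * a * a - y) for a, y in zip(sides, answers)) / n
        w -= learning_rate * grad
    return w


def mse(w, sides, answers):
    """Mean squared error of the one-weight model on the given problems."""
    return sum((w * a * a - y) ** 2 for a, y in zip(sides, answers)) / len(sides)


sides = [1.0, 2.0, 3.0, 4.0]
answers = [1.0, 4.0, 9.0, 16.0]

w_short = train(sides, answers, epochs=2)    # stopped far too early: underfit
w_long = train(sides, answers, epochs=100)   # trained long enough: w close to 1

print(mse(w_short, sides, answers), mse(w_long, sides, answers))
```

With only two epochs the weight is still far from 1 and the error stays large; with a hundred epochs the error on the training problems becomes negligible.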
Regularization : To reduce overfitting, the teacher might punish the student for every solution memorised instead of learning the underlying function (the mathematical steps in our multiplication example) that gives the solution to a problem. The greater the regularization, the harsher the punishment. Eventually, the student will capture the general patterns rather than memorising the training set solutions.
There are other ways to handle “overfitting” in a Neural Network as well :
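In practice the "punishment" is a penalty term added to the loss. The sketch below assumes the simplest possible setting, a one-weight model y = w * x with an L2 (ridge) penalty, where the best weight has a known closed form; the data and the name `ridge_weight` are invented for this illustration.

```python
def ridge_weight(xs, ys, strength):
    """Weight w minimising sum((w*x - y)**2) + strength * w**2.

    Setting the derivative to zero gives w = sum(x*y) / (sum(x*x) + strength).
    """
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    return sum_xy / (sum_xx + strength)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # made-up data following y = 2x exactly

print(ridge_weight(xs, ys, strength=0.0))   # 2.0 (no punishment)
print(ridge_weight(xs, ys, strength=10.0))  # 1.5
print(ridge_weight(xs, ys, strength=30.0))  # 1.0 (harsh punishment)
```

The larger the penalty strength, the more the weight is pulled towards zero, which limits how closely the model can bend itself around individual training examples.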
1. Increase training data.
2. Reduce model complexity.
3. Early stopping during the training phase (keep an eye on the loss during training and stop as soon as the loss on held-out validation data begins to increase).
4. Ridge Regularization and Lasso Regularization.
5. Use dropout for neural networks to tackle overfitting.
(Source for above mentioned steps to reduce overfitting : https://www.geeksforgeeks.org/underfitting-and-overfitting-in-machine-learning/)
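Early stopping from the list above can be sketched as a small helper. It assumes we already have one validation loss per epoch (the numbers below are invented), and `patience`, the number of epochs without improvement we tolerate before stopping, is a parameter introduced for this example.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the epoch with the best validation loss,
    scanning only until `patience` epochs pass without improvement."""
    best_epoch = 0
    best_loss = float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            break  # the loss has stopped improving, so stop training
    return best_epoch


# Hypothetical validation losses: improving at first, then getting worse.
losses = [0.9, 0.6, 0.45, 0.5, 0.55, 0.7]
print(early_stopping_epoch(losses))  # 2
```

The weights saved at the returned epoch are the ones to keep, since everything after that point was the student starting to memorise.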
Hurray! We just learnt how a Neural Network is trained, what problems can occur (underfit or overfit) and how to solve them.
Self-driving car high level overview :
A similar process occurs in building a self-driving car with the help of a Neural Network. First, the network is trained with the help of a human. The human drives the car and the network learns the patterns it identifies by observing the human driving. It learns that the car must stop at a red light or at a zebra crossing, must not exceed the speed limit, must drive in the correct lane, et cetera. Eventually, after lots and lots of training, it can learn to drive by learning the patterns of driving a vehicle.
Similarly, the Neural Network can learn how to play music if it is trained on lots of musical compositions, how to converse when trained on lots of conversations, and many such tasks.
Note that building a self-driving car is not so easy; a lot more goes into building a Neural Network, including complex things one needs to take care of like hyperparameter tuning, setting the learning rate, and a lot more...
Thank you for considering this article worth a read! If you liked it, please give it a clap so I can come up with some more articles that are simple to understand.