Let’s build your 1st Artificial Neural Network Model using Python (Part 2)
How Does an Artificial Neural Network Work?
If you have read my last post, Let’s build your 1st Artificial Neural Network Model using Python (Part 1), you should be familiar with the structure of an artificial neural network. In this article, we will dive deeper to understand the mathematics behind the neural network.
Afraid of math? Don’t worry! I’ll use a very simple example to walk you through the whole calculation.
Typical Neural Network Workflow
In the last post, we learned that by modifying the weights, we can gradually tune the network to behave more in the manner we want. That’s exactly what a neural network does during training.
To understand the workflow, I found this online article written by Matt Mazur very helpful. Below, I will add some of my own understanding and thoughts on top of Matt’s excellent work and build a concrete example to explain what each step does.
Here we go. As the chart below shows, each time we feed in an input, a neural network takes 7 steps in total to generate the output and update its weights.
Step 1–2 Random Initialization and Forward Propagation
The first 2 steps are Random Initialization and Forward Propagation: we randomly assign the weights and biases (w1, w2 … w8, b1 and b2 below) and run the whole calculation chain by feeding in 1 record from our training data (i1 = 0.05, i2 = 0.10).
Finally, we get the predicted result (o1 = 0.75136507, o2 = 0.772928465), which is different from the expected result (o1 = 0.01, o2 = 0.99).
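To make the forward pass concrete, here is a minimal Python sketch. The inputs come from the text above. The weight and bias values, the sigmoid activation, and the 2-2-2 layout (two inputs, two hidden neurons, two outputs, with b1 shared by the hidden layer and b2 by the output layer) are assumptions taken from Matt Mazur's worked example, since they are not spelled out in the text here.

```python
import math

def sigmoid(x):
    """Logistic activation, assumed for every neuron in this sketch."""
    return 1.0 / (1.0 + math.exp(-x))

# Inputs from the article
i1, i2 = 0.05, 0.10

# Assumed "randomly initialized" weights and biases (from Matt Mazur's example)
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30   # input -> hidden
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55   # hidden -> output
b1, b2 = 0.35, 0.60                       # hidden bias, output bias

# Hidden layer: weighted sum of the inputs plus bias, then activation
out_h1 = sigmoid(w1 * i1 + w2 * i2 + b1)
out_h2 = sigmoid(w3 * i1 + w4 * i2 + b1)

# Output layer: weighted sum of the hidden outputs plus bias, then activation
out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)
out_o2 = sigmoid(w7 * out_h1 + w8 * out_h2 + b2)

print(out_o1, out_o2)
```

With these assumed starting values, the two printed numbers should agree with the predicted result quoted above.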
Does that mean the result is bad? Yes of course!
But failure teaches success, so let’s see how the neural network corrects itself over the next few steps.
Step 3 — Calculate the Loss
We notice that the actual output (the predicted result) deviates far from the expected result, so in step 3 the neural network calculates the loss, that is, the difference between the predicted value and the expected value. In this case, the total error is 0.298371109.
The loss function, or error function, used to calculate the total error is similar to the mean squared error used in statistics: for each output, it takes one half of the squared difference between the expected and predicted values, and then sums the results. The factor of 1 over 2 is there only to cancel the exponent when calculating the derivative.
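As a quick check of that number, the total error can be reproduced in a few lines of Python. This is only a sketch: the helper name squared_error is my own, and the outputs are the rounded values quoted in this article.

```python
def squared_error(target, output):
    """Half of the squared difference for a single output neuron."""
    return 0.5 * (target - output) ** 2

targets = [0.01, 0.99]                 # expected o1, o2
outputs = [0.75136507, 0.772928465]    # predicted o1, o2 (rounded)

# Total error is the sum of the per-output errors
e_total = sum(squared_error(t, o) for t, o in zip(targets, outputs))
print(e_total)
```

Most of the error comes from o1, whose prediction is much further from its target than o2's.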
The neural network then updates the weights and biases to minimize the error function.
Calculus tells us that if we calculate the slope of the error function, namely the derivative of the error function with respect to a weight, we get the direction in which the error increases. Moving the weight in the opposite direction takes us towards a local minimum.
Now that we have found the direction in which to nudge the weight, we need to decide how much to nudge it. The learning rate (η, eta) can be thought of as the size of a “step in the right direction,” where the direction comes from the slope, that is, the derivative of the error function with respect to the weight.
Usually, we set the learning rate as a small number like 0.01–0.5.
This calculation method is called gradient descent.
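To see the update rule on its own, here is a toy sketch of gradient descent on a single weight. The error function (w - 3)^2 and the starting point are made up purely for illustration and are not part of the network in this article; the point is only the rule "new weight = old weight minus learning rate times slope".

```python
def error(w):
    """Hypothetical error curve with its minimum at w = 3."""
    return (w - 3.0) ** 2

def slope(w):
    """Derivative of the error with respect to w."""
    return 2.0 * (w - 3.0)

learning_rate = 0.1   # a small step size, within the range mentioned above
w = 0.0               # arbitrary starting weight

for step in range(100):
    # Step against the slope, scaled by the learning rate
    w = w - learning_rate * slope(w)

print(w, error(w))
```

Each step shrinks the distance to the minimum by a constant factor, so the weight settles very close to 3. A learning rate that is too large would overshoot instead.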
Step 4–6 Backpropagation Steps
Taking W5 as the example, we want to know how to update W5 to minimize the total error. As the learning rate is predefined, we only need to know the partial derivative of the total error with respect to W5 (the slope).
Pic 5 below shows the calculation details.
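The same calculation can be sketched in Python using the chain rule: the slope for W5 is the product of how the error changes with o1's output, how o1's output changes with its net input, and how that net input changes with W5. As before, the initial weights, the sigmoid activation, and the learning rate of 0.5 are assumptions taken from Matt Mazur's example rather than values stated in the text.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Assumed starting values (from Matt Mazur's example)
i1, i2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6 = 0.40, 0.45
b1, b2 = 0.35, 0.60
target_o1 = 0.01
learning_rate = 0.5   # assumed value

# Forward pass for the part of the network that W5 touches
out_h1 = sigmoid(w1 * i1 + w2 * i2 + b1)
out_h2 = sigmoid(w3 * i1 + w4 * i2 + b1)
out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)

# Chain rule: dE/dW5 = dE/dout_o1 * dout_o1/dnet_o1 * dnet_o1/dW5
dE_dout = out_o1 - target_o1          # derivative of 1/2 (target - out)^2
dout_dnet = out_o1 * (1.0 - out_o1)   # derivative of the sigmoid
dnet_dw5 = out_h1                     # net_o1 is linear in W5

grad_w5 = dE_dout * dout_dnet * dnet_dw5
w5_new = w5 - learning_rate * grad_w5

print(grad_w5, w5_new)
```

Note that W5 only affects o1, so the error from o2 does not enter this particular slope. The slope is positive here, so the update makes W5 slightly smaller.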
Besides W5, the neural network will do the same for all the other weight variables.
In fact, the neural network applies the same kind of update to the biases; for more details, you may refer to this Stack Overflow post.
Step 7 — update the weights and iterate through Backpropagation until convergence
Steps 1–6 make up only 1 iteration based on 1 data sample. We need to feed in additional data samples and repeat the updates until each weight converges.
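Putting the steps together, here is a rough sketch of the iteration loop. It makes several simplifying assumptions that are mine, not the article's: it reuses the same single sample on every iteration instead of feeding in new ones, it keeps the biases fixed for brevity, and it takes the starting weights, sigmoid activation, and learning rate of 0.5 from Matt Mazur's example. The function names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(w, b, x):
    """Forward pass; w holds w1..w8 in order, b holds b1, b2."""
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + b[0]),
         sigmoid(w[2] * x[0] + w[3] * x[1] + b[0])]
    o = [sigmoid(w[4] * h[0] + w[5] * h[1] + b[1]),
         sigmoid(w[6] * h[0] + w[7] * h[1] + b[1])]
    return h, o

def total_error(o, t):
    return sum(0.5 * (ti - oi) ** 2 for ti, oi in zip(t, o))

def train_step(w, b, x, t, lr):
    """One iteration: forward pass, slopes for all 8 weights, update."""
    h, o = forward(w, b, x)
    # Error signal at each output neuron (dE/dnet_o)
    d_o = [(o[k] - t[k]) * o[k] * (1.0 - o[k]) for k in range(2)]
    # Error signal at each hidden neuron, passed back through w5..w8
    d_h = [(d_o[0] * w[4] + d_o[1] * w[6]) * h[0] * (1.0 - h[0]),
           (d_o[0] * w[5] + d_o[1] * w[7]) * h[1] * (1.0 - h[1])]
    grads = [d_h[0] * x[0], d_h[0] * x[1], d_h[1] * x[0], d_h[1] * x[1],
             d_o[0] * h[0], d_o[0] * h[1], d_o[1] * h[0], d_o[1] * h[1]]
    # All slopes use the old weights; update every weight at once
    return [wi - lr * gi for wi, gi in zip(w, grads)]

w = [0.15, 0.20, 0.25, 0.30, 0.40, 0.45, 0.50, 0.55]  # assumed start
b = [0.35, 0.60]                                      # kept fixed here
x, t = [0.05, 0.10], [0.01, 0.99]

initial_error = total_error(forward(w, b, x)[1], t)
for _ in range(10000):
    w = train_step(w, b, x, t, lr=0.5)
final_error = total_error(forward(w, b, x)[1], t)

print(initial_error, final_error)
```

The error after many iterations should be far smaller than the starting error. In a real training run, each iteration would draw a different record (or batch) from the training data, and the biases would be updated as well.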
In this article, we used one example to go through the 7 steps that a neural network takes to generate the output and update its weights.
In the next chapter, we will use Python to build a Neural Network Model.
Please stay tuned!