# When Getting It Right Gets It Wrong


# Neural Networks and Gradient Descent

Neural Networks use Gradient Descent for the same reason Logistic Regression does — to find the point of minimum error so the model can make good predictions. With linear data this works really well, and the more variables / features you have, the more precisely you can position the line you draw. It works because your model can assume there is a single lowest-error point it needs to find. However, as we saw in the previous post, Neural Networks are often approximating functions that have many curves. Take this example:
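To make the easy case concrete, here is a minimal sketch of Gradient Descent on a convex loss with a single minimum. The quadratic loss, learning rate, and step count are illustrative choices, not something from a real model:

```python
# Gradient descent on a convex loss, f(w) = (w - 3)**2, which has exactly
# one minimum, at w = 3. With a single lowest-error point, the starting
# position doesn't matter: every start rolls down to the same answer.

def grad_descent(start, lr=0.1, steps=200):
    w = start
    for _ in range(steps):
        grad = 2 * (w - 3)  # derivative of (w - 3)**2
        w -= lr * grad
    return w

print(round(grad_descent(-10.0), 4))  # converges to 3.0
print(round(grad_descent(25.0), 4))   # converges to 3.0
```

Two wildly different starting points, same destination — that is the guarantee convex problems give you, and the one the curvy functions below take away.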

With Neural Networks, you start at a random point and then use Gradient Descent to find the lowest point (the lowest error) you can. Say for example, you start here:

Gradient Descent would flow down the side of the wall you are on to find the lowest **local** minimum, so you’d end up here:

However, the next time the model is built, you might start here:

…and thus end up here:

The first thing to note is that the local minimum you find depends almost entirely on your starting point. This function has roughly four local minima:
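You can see this start-point dependence with a toy one-dimensional loss that has more than one dip. The function below, f(x) = x⁴ − 3x² + x, is an illustrative stand-in for the curve in the figure, not the post's actual function:

```python
# A double-well loss: f(x) = x**4 - 3*x**2 + x has a deep minimum on the
# left (near x = -1.30) and a shallower one on the right (near x = 1.13).
# Plain gradient descent just rolls downhill from wherever it starts, so
# different starting points settle into different local minima.

def grad(x):
    return 4 * x ** 3 - 6 * x + 1  # derivative of x**4 - 3*x**2 + x

def descend(start, lr=0.01, steps=2000):
    x = start
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Start on the right-hand wall: roll into the shallower right minimum.
print(round(descend(2.0), 3))
# Start on the left-hand wall: roll into the deeper left minimum.
print(round(descend(-2.0), 3))
```

Same function, same algorithm, same learning rate — the only difference is where the random initialization happened to land.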

Leaving things up to random chance and landing in a really shallow minimum (like the first one from the left) is definitely a risk. Another risk, however, is finding a really deep, narrow minimum (like the second one from the left). Why is that a bad thing? After all, isn’t that technically the point with the lowest error?

This is a challenge because your test data never matches your training data exactly, so when you run it through the model you are effectively standing slightly to one side of the minimum you found. At a deep, narrow minimum, moving even a small percentage left or right changes the predicted value dramatically.

The goal, then, is to find minima at wide points in the overall function, where small movements don’t produce large changes in the prediction. The third and fourth minima from the left would both work nicely for this. But if the minimum you find depends entirely on where you happen to randomly start, that’s not going to provide satisfactory results a good percentage of the time.
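To see just how much the outcome is a lottery over starting points, you can run many random restarts and count which basin each one falls into. The double-well function, start range, and restart count here are all illustrative assumptions:

```python
import random

# How often does each local minimum "win"? On an illustrative double-well
# loss, f(x) = x**4 - 3*x**2 + x, the answer depends only on where the
# random starting point happens to fall relative to the hump between them.

def descend(start, lr=0.01, steps=2000):
    x = start
    for _ in range(steps):
        x -= lr * (4 * x ** 3 - 6 * x + 1)  # step down the gradient
    return x

random.seed(0)  # fixed seed so the tally is reproducible
counts = {"deep left minimum": 0, "shallow right minimum": 0}
for _ in range(1000):
    end = descend(random.uniform(-2.0, 2.0))
    if end < 0:
        counts["deep left minimum"] += 1
    else:
        counts["shallow right minimum"] += 1
print(counts)
```

Neither basin ever wins every time — the split simply mirrors how much of the start range drains into each minimum, which is exactly the "random chance" problem described above.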