What can we learn from gradient descent?

Imagine you climbed up a hill early in the morning and suddenly the weather changed! It is no longer sunny and clear; instead, a heavy fog has rolled in. You are scared, running out of water, and have to figure out a way to get back down. You have absolutely no idea where you are and can only see your immediate vicinity, maybe 3–4 feet around yourself. You have also lost your phone signal, which would have been useless anyway since you wouldn’t be able to tell anyone where you are. The one thing you know for sure is that the hill isn’t steep at any point, and you expect that the fog will probably subside and your phone signal will return once you get close to ground level. What would you do? How would you climb down and reach home safely?

A man trying to walk down a foggy hill? (he looks confused, and it ain’t that foggy!)

One obvious way to deal with this problem is to look around your immediate vicinity and try to find the direction that takes you downhill fastest. You won’t fall (the hill isn’t steep!), and even though there might be a few wrong turns and a few backtracks, you will eventually reach ground level. Once on the ground, you can probably get your GPS or phone working and call someone for help.

The method discussed above is essentially gradient descent. Gradient descent is an optimization algorithm used extensively in machine learning and deep learning. Technically speaking, if a multivariate function F is defined and differentiable at a point p, then F decreases fastest if one moves from p in the direction of the negative gradient of F at p. This is analogous to you trying to go in the steepest direction downhill in the example above. There is also an added parameter, usually called the learning rate, that determines how large a step the algorithm takes in each round, and the conventional wisdom is to not set this parameter too high or too low: too high and you may overshoot the bottom of the valley, too low and progress becomes painfully slow.
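To make the update rule concrete, here is a minimal sketch in Python. Each iteration applies p_next = p - learning_rate * grad_f(p). The quadratic example function, the starting point, and the learning rate are all illustrative choices of mine, not anything prescribed by the algorithm itself.

```python
import numpy as np

def gradient_descent(grad_f, p0, learning_rate=0.1, n_steps=100):
    """Repeatedly step against the gradient:
    p_next = p - learning_rate * grad_f(p)."""
    p = np.asarray(p0, dtype=float)
    for _ in range(n_steps):
        p = p - learning_rate * grad_f(p)
    return p

# Illustrative "hill": F(x, y) = x^2 + y^2, whose gradient at p is 2 * p.
# Its unique (and global) minimum is at the origin.
grad_f = lambda p: 2 * p

print(gradient_descent(grad_f, p0=[3.0, 4.0]))  # converges to ~[0. 0.]
```

Note that the walker never needs the full shape of F, only the local slope at the current point, which is exactly the fog-bound hiker’s situation.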

An example illustrating how a local minimum is reached when we apply gradient descent from a starting point on a 3D surface.

While gradient descent is in itself an amazing optimization algorithm, this article is more of a philosophical musing on what gradient descent can teach us about making decisions under uncertainty in general. One really interesting observation is that one can easily model most situations in life as trying to find a local optimum in a multidimensional landscape.

Essentially, think of any point in time in your life. You are at a state X and you probably want to improve that state. Maybe you want to improve your position financially, or learn a new skill, or get rid of your addiction to ice cream (or whatever). In other words, you can think of life as a scenario where you want to continuously improve your state, based on your personal definition of “improvement”.

Basically, you can define your current state X in terms of n different parameters (like the ones mentioned above, represented as an n-dimensional vector) and the next state Y as a similar n-dimensional vector. Finally, you would have some sort of “score” of how well you are doing based on these parameters (say S(X) and S(Y)), and typically you would want S(Y) to be strictly greater than S(X).

Now, I know some readers might cringe at the thought of trying to define a score function for something as abstract as a “personal life situation”. But the point is that one does not need the score to be an actual number; rather, one can map it to the intuitive understanding of “where do I stand” that most self-aware individuals tend to have. There need not be a numerical score at all, as long as there is an intuitive sense of whether one is doing better or worse than before.

And with this perspective, we can approach uncertain situations in life in very similar ways, especially in scenarios where one might not have a good understanding of the surrounding landscape (one difference being that it’s more of an ascent than a descent, but that’s just a technicality). Even though this might seem counter-intuitive to people who have a hard time making decisions with incomplete information, simply taking any step that locally improves your current state is often a very good way to proceed against uncertainty. Given an incomplete understanding of the landscape you are in, this is frequently the best way to make constructive progress without making major mistakes or taking large and unnecessary risks.
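To make the analogy concrete, here is a small sketch of that greedy local-improvement strategy, in the same spirit as gradient ascent but over discrete states. Everything here is hypothetical for illustration: the two-parameter “life state”, the score function S, and the neighbor moves are all made up by me, not drawn from any real model.

```python
def local_ascent(state, score, neighbors, max_steps=1000):
    """Greedy local search: look only at the immediate 'vicinity' of the
    current state and move to a strictly better neighbor. Stops when no
    nearby state improves the score, i.e. at a local optimum."""
    for _ in range(max_steps):
        best = max(neighbors(state), key=score)
        if score(best) <= score(state):
            break  # no local improvement available: a local optimum
        state = best
    return state

# Hypothetical two-parameter "life state": (savings, skill), each in 0-10.
# The score and its trade-off term are entirely made up for illustration.
def score(s):
    savings, skill = s
    return savings + 2 * skill - 0.1 * savings * skill

def neighbors(s):
    savings, skill = s
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    return [(min(10, max(0, savings + d1)), min(10, max(0, skill + d2)))
            for d1, d2 in moves]

print(local_ascent((2, 2), score, neighbors))  # climbs to a local optimum
```

Just like the hiker in the fog, the loop never needs a map of the whole landscape; it only inspects the immediate vicinity and keeps taking whichever step helps right now.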
