# Automatic Differentiation in Machine Learning



# Manual Differentiation

If you are familiar with calculus, you can compute the partial derivatives of `f` with respect to `x1` and `x2` straight away. For those who need a quick refresher, here are a few basic differentiation rules.

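• Linear function (i.e. a ∈ ℝ): $\frac{d}{dx}(ax) = a$
• Polynomial function (i.e. a, n ∈ ℝ): $\frac{d}{dx}(ax^n) = a\,n\,x^{n-1}$
• Linearity (where `f` and `g` are functions): $\frac{d}{dx}\big(f(x) + g(x)\big) = f'(x) + g'(x)$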

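So, given our function `f(x1, x2) = x1**2 + 2*x2` and the point `x1 = 2`, `x2 = 3`, its partial derivatives with respect to (w.r.t.) `x1` and `x2` are as follows:

$$\frac{\partial f}{\partial x_1} = 2x_1 = 4, \qquad \frac{\partial f}{\partial x_2} = 2$$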

This method works, but it takes many manual steps. As the function `f` becomes more complex, the derivation becomes long and error-prone. The good news is that it can be automated via symbolic differentiation.
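Before moving on, the hand-derived values can be sanity-checked numerically. The following is a minimal sketch using central finite differences, which is a numerical approximation and not one of the differentiation methods covered in this post. The helper name `central_difference` and the step size `h` are illustrative choices, not part of any library.

```python
def f(x1, x2):
    # Running example from this post: f(x1, x2) = x1^2 + 2*x2
    return x1**2 + 2 * x2


def central_difference(func, point, index, h=1e-6):
    """Approximate the partial derivative of func w.r.t. argument `index` at `point`."""
    up = list(point)
    down = list(point)
    up[index] += h
    down[index] -= h
    return (func(*up) - func(*down)) / (2 * h)


point = (2.0, 3.0)
approx_gradient = [central_difference(f, point, i) for i in range(len(point))]
print(approx_gradient)  # close to [4.0, 2.0], up to floating-point error
```

The approximation only agrees with the exact values up to rounding error, which is one reason exact approaches are preferred for computing gradients.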

# Symbolic Differentiation

In symbolic differentiation, the mathematical expression is parsed and converted into elementary nodes. These nodes correspond to basic functions whose derivatives are already known (constants, polynomials, exponentials, logarithms, trigonometric functions, etc.).

The derivatives of these elementary nodes are then assembled using the rules for combined functions (linearity, product, inverse and chain rules) to obtain the final form of f’(x).
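To make the idea concrete, here is a minimal sketch of a symbolic differentiator over a tiny expression tree. It is not how SymPy is implemented; the tuple-based node format and the function names `diff` and `evaluate` are assumptions made purely for illustration, and only constants, variables, sums, products and integer powers are supported.

```python
def diff(expr, var):
    """Differentiate a tiny expression tree with respect to the variable `var`."""
    kind = expr[0]
    if kind == "const":
        return ("const", 0)
    if kind == "var":
        return ("const", 1 if expr[1] == var else 0)
    if kind == "add":  # linearity: (u + v)' = u' + v'
        return ("add", diff(expr[1], var), diff(expr[2], var))
    if kind == "mul":  # product rule: (u * v)' = u' * v + u * v'
        u, v = expr[1], expr[2]
        return ("add", ("mul", diff(u, var), v), ("mul", u, diff(v, var)))
    if kind == "pow":  # power rule plus chain rule: (u^n)' = n * u^(n-1) * u'
        u, n = expr[1], expr[2]
        return ("mul", ("mul", ("const", n), ("pow", u, n - 1)), diff(u, var))
    raise ValueError(f"unknown node type: {kind}")


def evaluate(expr, env):
    """Evaluate an expression tree given a dict of variable values."""
    kind = expr[0]
    if kind == "const":
        return expr[1]
    if kind == "var":
        return env[expr[1]]
    if kind == "add":
        return evaluate(expr[1], env) + evaluate(expr[2], env)
    if kind == "mul":
        return evaluate(expr[1], env) * evaluate(expr[2], env)
    if kind == "pow":
        return evaluate(expr[1], env) ** expr[2]
    raise ValueError(f"unknown node type: {kind}")


# f(x1, x2) = x1^2 + 2*x2 written as an expression tree
f_expr = ("add", ("pow", ("var", "x1"), 2), ("mul", ("const", 2), ("var", "x2")))
env = {"x1": 2, "x2": 3}
gradient = [evaluate(diff(f_expr, v), env) for v in ("x1", "x2")]
print(gradient)  # [4, 2]
```

Note that the derivative tree is never simplified in this sketch, so it contains more nodes than the original expression even for such a small function.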

We can use the SymPy library to perform this procedure, first defining the `x1` and `x2` symbols.

```python
import sympy

x1 = sympy.symbols('x1')
x2 = sympy.symbols('x2')
```

and then our function `f`

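```python
def f(x1, x2):
    return x1**2 + 2*x2

# Calling f with the SymPy symbols builds a symbolic expression
func = f(x1, x2)
print(f'f(x1, x2) = {func}')
# Output: f(x1, x2) = x1**2 + 2*x2
```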

Now, we perform the symbolic differentiation by calling the `diff()` method.

```python
gradient_func = [func.diff(x1), func.diff(x2)]
print(f'gradient function = {gradient_func}')
# Output: gradient function = [2*x1, 2]
```

This is the same result we obtained with manual differentiation.

The final step is to substitute the values of `x1` and `x2`.

```python
gradient_val = [g.subs({x1: 2, x2: 3}) for g in gradient_func]
print(f'gradient_values = {gradient_val}')
# Output: gradient_values = [4, 2]
```

Nicely done!

However, for large functions, the expression graphs can become extremely large (a problem often called expression swell). This can result in slow performance despite our best efforts at pruning and simplification. This is where automatic differentiation comes into the picture!

# Automatic Differentiation

Autodiff is used by popular deep learning frameworks such as TensorFlow and PyTorch because it is a simple and highly efficient way of computing derivatives.

There are two modes in automatic differentiation: forward mode and reverse mode.

• Forward Mode (the forward pass): The goal of this mode is to create a computation graph, similar to the expression graph in our symbolic differentiation method above. We split the problem into its elementary nodes, consisting of arithmetic operations, unary operations and elementary functions (for example, trigonometric functions). The computation graph of our function is illustrated below:

In this forward pass, we insert the values of `x1` and `x2` down the steps, as follows. Notice also that we define `x3` and `x4` as our intermediate steps.
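As a rough sketch, the forward pass can be written out in Python. The exact decomposition used here, `x3 = x1**2` followed by `x4 = x3 + 2*x2`, is an assumption about how the intermediate steps are defined; other splits of the same function are equally valid.

```python
# Input values for the running example
x1, x2 = 2, 3

# Assumed decomposition of f(x1, x2) = x1^2 + 2*x2 into intermediate nodes
x3 = x1 ** 2       # 2^2 = 4
x4 = x3 + 2 * x2   # 4 + 6 = 10, the output f(x1, x2)

print(f"x3 = {x3}, x4 = {x4}")  # x3 = 4, x4 = 10
```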

As expected, our f(2,3) = 10. Now let’s calculate the gradient.

• Reverse Mode (the backward pass): This is where we calculate the derivative of the function at each of the steps. We use the chain rule to compute the derivative of `f` with respect to each input.

Let’s calculate the partial derivatives of each step with respect to its immediate input:

Now for the hard part: calculating the partial derivatives of `f` w.r.t. `x1` and `x2`. As hinted earlier, we use the chain rule by traversing the path from the terminal node back to the `x1` node (red line) and to the `x2` node (blue line), as follows:
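This traversal can be sketched in plain Python, again assuming the intermediate steps `x3 = x1**2` and `x4 = x3 + 2*x2` (so `x4` is the output `f`). The variable names for the partial derivatives are illustrative only.

```python
x1, x2 = 2, 3

# Forward pass (assumed decomposition of f(x1, x2) = x1^2 + 2*x2)
x3 = x1 ** 2
x4 = x3 + 2 * x2

# Local partial derivatives of each step w.r.t. its immediate inputs
dx3_dx1 = 2 * x1   # d(x1^2)/dx1 = 2*x1 = 4
dx4_dx3 = 1        # d(x3 + 2*x2)/dx3 = 1
dx4_dx2 = 2        # d(x3 + 2*x2)/dx2 = 2

# Chain rule: multiply the local partials along each path back from the output
df_dx1 = dx4_dx3 * dx3_dx1   # path x4 -> x3 -> x1
df_dx2 = dx4_dx2             # path x4 -> x2

print(df_dx1, df_dx2)  # 4 2
```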

We got the same results as those in manual and symbolic differentiation!

The chain rule has an intuitive interpretation: the sensitivity of `f` with respect to an input `x1` is the product of the sensitivities of each intermediate step between `x1` and `x4`. In other words, sensitivities "propagate" along the computational graph.
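The same propagation can be automated. The following is a minimal sketch of a reverse-mode autodiff for scalars, in which each node remembers its parents together with the local sensitivity to each of them. The `Var` class and its methods are hypothetical helpers written for this example; they are not the API of TensorFlow, PyTorch or any other library, and only addition, multiplication and constant powers are supported.

```python
class Var:
    """A scalar graph node that records how it was computed (illustrative only)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent node, local partial derivative)
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    __rmul__ = __mul__

    def __pow__(self, n):
        # n is assumed to be a plain number, not a Var
        return Var(self.value ** n, ((self, n * self.value ** (n - 1)),))

    def backward(self):
        """Propagate sensitivities from this node back to every input."""
        order, seen = [], set()

        def visit(node):
            # Depth-first traversal that lists parents before children
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # Walk from the output towards the inputs, applying the chain rule
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += local * node.grad


x1, x2 = Var(2.0), Var(3.0)
f = x1 ** 2 + 2 * x2
f.backward()
print(f.value, x1.grad, x2.grad)  # 10.0 4.0 2.0
```

Processing nodes in reverse topological order ensures that a node's sensitivity is complete before it is passed on to its parents, which matters when an input feeds into several intermediate steps.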

# Parting Thoughts

This post introduced the basic concept of autodiff with a simple running example. I hope you enjoyed learning this concept as much as I did!

