Gentle Introduction to TensorFlow Probability — Trainable Parameters


Introduction

This article belongs to the series “Probabilistic Deep Learning”. This weekly series covers probabilistic approaches to deep learning. The main goal is to extend deep learning models to quantify uncertainty, i.e. know what they do not know.

We develop our models using TensorFlow and TensorFlow Probability (TFP). TFP is a Python library built on top of TensorFlow. We are going to start with the basic objects in TFP and understand how we can manipulate them. We will increase complexity incrementally over the following weeks and combine our probabilistic models with deep learning on modern hardware (e.g. GPU).

Articles published so far:

Figure 1: We are combining two worlds in this series: Probabilistic Models and Deep Learning (source)

As usual, the code is available on my GitHub.

Distribution Objects

In the last article, we saw how to manipulate TFP distribution objects. Remember that distribution objects capture the essential operations on probability distributions. We started with univariate distributions, i.e. distributions with only one random variable. Then, we extended our understanding to multivariate distributions and how the distribution objects' properties represent them. We kept it simple: we defined a 2-dimensional Gaussian distribution and did not include any correlation between the two dimensions. The most important properties to recall are batch_shape and event_shape. If you are not comfortable with them yet, please check my previous article. We will make use of them extensively during this series.

We are going to go through one more concept regarding distribution objects before moving on to trainable distribution parameters.

Independent Distribution

There are cases where we want to interpret a batch of independent distributions over an event space as a single joint distribution over a product of event spaces. This impacts the way we handle the batch_shape and event_shape properties. The independent distribution will be very useful when we start building some well-known algorithms such as the Naive Bayes classifier. The reason is that in the case of Naive Bayes, the features are independent given a class label.

To illustrate, let’s define two normal distributions.

The first is a multivariate normal with independent dimensions, of the form:

X ∼ N(μ, Σ), with mean μ = [0, 1] and diagonal covariance Σ = diag(1, 4)

To define the first one, we are going to use MultivariateNormalDiag as before since, once again, the dimensions are not correlated with each other.
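The code below assumes the standard imports used throughout this series (this setup is my assumption; it matches the conventions of the previous article):

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions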

mv_normal = tfd.MultivariateNormalDiag(loc=[0, 1], scale_diag=[1, 2])
mv_normal

<tfp.distributions.MultivariateNormalDiag 'MultivariateNormalDiag' batch_shape=[] event_shape=[2] dtype=float32>

We are getting comfortable with the shape properties, hence it is no surprise that we have an event_shape of 2.

As usual, we can compute the log probability:

mv_normal.log_prob([0.2, 1.5])

<tf.Tensor: shape=(), dtype=float32, numpy=-2.5822742>

We get a single value since we have a single distribution even though it is multi-dimensional.

Let’s sample from our independent multivariate Gaussian distribution and plot the joint distribution. We did something similar before.

samples = mv_normal.sample(10000).numpy()
x1 = samples[:,0]
x2 = samples[:,1]
sns.jointplot(x = x1, y = x2, kind='kde', xlim=[-6, 7], ylim=[-6, 7]);
Figure 2: Plot of the approximate joint distribution of the multivariate Gaussian distribution defined above. Univariate plots are also shown for the individual dimensions.

As expected, there are no correlations between the dimensions of our multivariate Gaussian distribution.
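As a quick sanity check (not in the original notebook), we can estimate the covariance matrix from the samples drawn above; the off-diagonal entries should be close to zero and the diagonal close to the variances 1 and 4.

# With loc=[0, 1] and scale_diag=[1, 2] we expect roughly [[1, 0], [0, 4]].
print(np.cov(samples, rowvar=False))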

Time to represent the batched Gaussian distribution object.

locs = [0,1]
scales = [1, 2]

batched_normal = tfd.Normal(loc=locs, scale=scales)
batched_normal

<tfp.distributions.Normal 'Normal' batch_shape=[2] event_shape=[] dtype=float32>

Notice the batch_shape equal to 2.

batched_normal.log_prob([0.2, 1.5])

<tf.Tensor: shape=(2,), dtype=float32, numpy=array([-0.9389385, -1.6433357], dtype=float32)>

As we have two separate distributions stored in the same object, computing the log probability yields two values.

We can plot the Probability Density Function (PDF) of both univariate distributions.

x = np.linspace(-4, 4, 10000)
densities = batched_normal.prob(np.repeat(x[:, np.newaxis], 2, axis=1))

sns.lineplot(x=x, y=densities[:, 0], label=f'loc={locs[0]}, scale={scales[0]}')
sns.lineplot(x=x, y=densities[:, 1], label=f'loc={locs[1]}, scale={scales[1]}')
plt.ylabel('Probability Density')
plt.xlabel('Value')
plt.legend()
plt.show()
Figure 3: PDF of the two univariate Gaussian distributions batched as a single distribution object.

Let's wrap up the above so that we can introduce the independent distribution object. While the first distribution object returned a single log probability, the second returned two. The difference is that the array we pass to the first is interpreted as a single realization of a 2-dimensional random variable. In the second case, the array is interpreted as a different input for each of the batched random variables.
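We can check this numerically: since the two dimensions of the multivariate distribution are independent, summing the two log probabilities returned by the batched object recovers the single log probability returned by the multivariate object (a quick sanity check, not in the original notebook).

# -0.9389385 + (-1.6433357) ≈ -2.5822742, the value returned by mv_normal.log_prob([0.2, 1.5]).
tf.reduce_sum(batched_normal.log_prob([0.2, 1.5]))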

To help us grasp what the independent distribution is and how it is helpful, let's play with some probabilistic jargon:

  • The independent distribution is a simplified way for us to move from univariate distributions to a single multivariate distribution;
  • The independent distribution allows us to move from several distributions of a single random variable to a joint distribution of a set of random variables;
  • The independent distribution gives the capacity to move from several batched distributions to a single multidimensional distribution;
  • The independent distribution is an interface for absorbing whichever batch dimensions we want into the event dimension;
  • Finally, the more pragmatic and TFP way of describing it — the independent distribution is a way to move batch_shape dimensions of a distribution to the event_shape of a new distribution object.

Hopefully, describing it in so many different ways made these probabilistic concepts, and the way they translate into TFP abstractions, clearer.

Time to apply the theoretical concepts and see the practical implementation.

independent_normal = tfd.Independent(batched_normal, reinterpreted_batch_ndims=1)
independent_normal

<tfp.distributions.Independent 'IndependentNormal' batch_shape=[] event_shape=[2] dtype=float32>

The batched Gaussian distribution is now an IndependentNormal distribution object, i.e. an independent multivariate Gaussian like the one we defined above, as shown by its event_shape of 2. Similarly, the log probability now yields a single value.

independent_normal.log_prob([0.2, 1.5])

<tf.Tensor: shape=(), dtype=float32, numpy=-2.5822742>

Finally, let's compare the plot of the independent Gaussian distribution with the one we plotted above.

samples = independent_normal.sample(10000).numpy()
x1 = samples[:,0]
x2 = samples[:,1]
sns.jointplot(x = x1, y = x2, kind='kde', space=0, color='b', xlim=[-6, 7], ylim=[-6, 7]);
Figure 4: Plot of the approximate joint distribution of the independent Gaussian distribution object. Univariate plots are also shown for the individual dimensions.
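To connect this back to the Naive Bayes motivation mentioned at the start of this section, here is a minimal sketch (the number of classes and all parameter values are made up for illustration) of how Independent turns a batch of per-feature distributions into one joint distribution per class: the class dimension stays in the batch_shape while the feature dimension is absorbed into the event_shape.

# Hypothetical setup: 3 classes, 2 features, with made-up parameters.
# Each row holds the per-feature Normal parameters for one class.
class_conditional = tfd.Independent(
    tfd.Normal(loc=[[0., 1.], [2., 3.], [4., 5.]],
               scale=[[1., 1.], [1., 2.], [2., 2.]]),
    reinterpreted_batch_ndims=1)
class_conditional  # batch_shape=[3], event_shape=[2]

# One observation yields one log probability per class,
# which is exactly what a Naive Bayes classifier needs.
class_conditional.log_prob([0.2, 1.5])  # shape=(3,)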

Trainable Parameters

Variables

Now that we know what TensorFlow Probability objects are, it is time to understand how we can train parameters for these distributions. This is the connection that we are missing to start applying what we have learned and building algorithms.

In TensorFlow, Variable objects are what we use to capture the values of the parameters of our deep learning models. These objects are updated during training by, for example, applying gradients obtained from a loss function and data.

Let’s define one. Note that to create a new variable, we have to provide an initial value.

init_vals = tf.constant([[1.0, 2.0], [3.0, 4.0]])
new_variable = tf.Variable(init_vals)
new_variable

<tf.Variable 'Variable:0' shape=(2, 2) dtype=float32, numpy=
array([[1., 2.],
[3., 4.]], dtype=float32)>

A Variable is very similar to a tensor. They have similar properties, such as shape and dtype, and support similar operations, e.g. exporting to NumPy. There are some differences though; for example, a Variable cannot be reshaped.

print("shape: ", new_variable.shape)
print("dType: ", new_variable.dtype)
print("as NumPy: ", new_variable.numpy())
print("Index of highest value:", tf.math.argmax(new_variable))

shape: (2, 2)
dType: <dtype: 'float32'>
as NumPy: [[1. 2.]
[3. 4.]]
Index of highest value: tf.Tensor([1 1], shape=(2,), dtype=int64)
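To illustrate the differences mentioned above, here is a small sketch (not from the original notebook): a Variable can be updated in place with its assign methods, but it cannot be reshaped; reshaping produces a regular tensor instead.

# In-place updates are allowed as long as the shape and dtype are preserved.
new_variable.assign([[5., 6.], [7., 8.]])
new_variable.assign_add([[1., 1.], [1., 1.]])

# Reshaping does not modify the Variable; it returns a new tf.Tensor.
reshaped = tf.reshape(new_variable, [4])
print(type(reshaped))  # a tf.Tensor, not a tf.Variable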

Note that if for some reason you do not want a variable to be updated during training, you can define it with the argument trainable=False.

variable_not_diff = tf.Variable(1, trainable=False)

<tf.Variable 'Variable:0' shape=() dtype=int32, numpy=1>

Usually, though, we want our variables to be differentiable. TensorFlow supports automatic differentiation, which is the foundation of the backpropagation algorithm for training neural networks.

There is an API that we will use to accomplish the automatic differentiation — tf.GradientTape. Connecting back to the Variable object, this API gives us the ability to compute the gradient of an operation with respect to our inputs, i.e. one or more Variable objects.

Let’s do a quick example using tf.GradientTape API and the Variable object.

x = tf.Variable(3.0)

with tf.GradientTape() as tape:
    y = x**2

Once we have defined an operation inside the tf.GradientTape context, we can call the gradient method, passing the loss (the quantity to differentiate) and the input variables.

dy_dx = tape.gradient(y, x)
dy_dx.numpy()

6.0

Time to apply these concepts to our problem. Recall that we are interested in learning the parameters of a distribution.

normal = tfd.Normal(loc=tf.Variable(0., name='loc'), scale=5)
normal.trainable_variables

(<tf.Variable 'loc:0' shape=() dtype=float32, numpy=0.0>,)

In this case, the mean of the Gaussian distribution defined above is no longer a simple value but a Variable object that can be learned.
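Putting the two pieces together, here is a quick sketch (not part of the original code; the observed value 1.0 is made up for illustration) showing that we can differentiate the distribution's log probability with respect to its trainable variable, exactly as we did with y = x**2.

x_obs = 1.0  # hypothetical observation

with tf.GradientTape() as tape:
    loss = -normal.log_prob(x_obs)  # negative log probability of the observation

# Gradient with respect to the trainable 'loc' variable.
# Analytically it equals -(x_obs - loc) / scale**2, i.e. -0.04 here.
tape.gradient(loss, normal.trainable_variables)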

For the training procedure, Maximum Likelihood is the usual suspect in deep learning models. In a nutshell, we are looking for the parameters of our model that maximize the probability of the data.

The PDF of a continuous random variable indicates, roughly, how likely it is that a sample takes values near a particular point. We will denote this function 𝑃(𝑥|𝜃), where 𝑥 is the value of the sample and 𝜃 is the parameter describing the probability distribution:

tfd.Normal(0, 1).prob(0)

<tf.Tensor: shape=(), dtype=float32, numpy=0.3989423>

It may seem fancy, but in fact, we have been computing the PDF of Gaussian distributions for a while now, so nothing particularly new here.

To finalize this introduction to training parameters, let's connect this concept with the independent distribution objects discussed above. When more than one sample is drawn independently from the same distribution (which we usually assume), the PDF of the sample values 𝑥1,…,𝑥𝑛 is the product of the PDFs for each individual 𝑥𝑖. We can write it as:

𝑃(𝑥1,…,𝑥𝑛|𝜃) = 𝑃(𝑥1|𝜃) × ⋯ × 𝑃(𝑥𝑛|𝜃)

Hopefully, you see how both concepts overlap in the above definition.
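As a preview of next week's training procedure, here is a minimal sketch of how that product of PDFs becomes a sum of log probabilities that we can minimize by gradient descent. The synthetic data, the learning rate, and the number of steps are assumptions made purely for illustration.

# Synthetic data from a Normal with a "true" mean of 2 (made up for illustration).
data = tfd.Normal(loc=2., scale=5.).sample(1000)

loc_hat = tf.Variable(0., name='loc_hat')  # hypothetical trainable parameter
model = tfd.Normal(loc=loc_hat, scale=5.)

# The product of PDFs becomes a sum of log probabilities;
# maximizing it is the same as minimizing the negative log likelihood.
learning_rate = 0.01
for _ in range(100):
    with tf.GradientTape() as tape:
        nll = -tf.reduce_sum(model.log_prob(data))
    grad = tape.gradient(nll, loc_hat)
    loc_hat.assign_sub(learning_rate * grad)

loc_hat  # should end up close to the sample mean of the data, i.e. close to 2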

Conclusion

This article continued to explore distribution objects in TFP, this time connecting them with the Variable object from TensorFlow. We started by defining what an independent distribution is and how it can help us define independent joint probabilities. It allows us to move from univariate distributions to an independent multivariate distribution, absorbing whichever dimensions we want into the event dimension. Next, we introduced Variable objects and how we can differentiate them. With that knowledge, we used them in conjunction with a distribution object from TFP. Finally, we talked about the Maximum Likelihood procedure and how it relates to the independent joint distribution when we sample independently from the same distribution.

Next week, we will explore the training procedure for distributions in more detail. See you then!

