A comprehensive introduction to Tensorflow’s Sequential API and model for deep learning




Tensorflow is a framework created by Google that allows machine learning practitioners to create deep learning models, and it is often the first solution proposed to analysts approaching deep learning for the first time.

The reason lies in the simplicity and intuitiveness of Tensorflow's sequential API: it allows the analyst to create very complex and powerful neural networks while remaining user-friendly.

API stands for Application Programming Interface; it is the medium through which the user interacts with the application we want to use.

The Tensorflow interface is simple, direct and easily understandable even by those who have never done practical deep learning, but who only know the theory.

Tensorflow’s sequential API is very beginner-friendly and is recommended as a starting point in your deep learning journey.

In this article I propose an introduction to deep learning that takes advantage of the sequential API, showing examples and code to help the reader understand these concepts and serve as an introduction to the framework.

Intuition behind deep learning and neural networks

Before going into how Tensorflow and its sequential model work, it is good to have some background on what deep learning is and how neural networks work.

Neural networks are the primary tool used for deep learning. The learning of a phenomenon takes place through the activation and deactivation of the neurons present within a network — this activity allows a network to create a representation of the problem we want to solve through weights and biases.

Between each layer of neurons there is an activation function. It applies a transformation to the output of one layer before feeding it into the next, with various consequences for the network's ability to generalize the problem. The most common activation function is called ReLU.
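
For reference, ReLU simply replaces negative values with zero; a minimal NumPy sketch:

import numpy as np

def relu(x):
    # ReLU: keep positive values, zero out negatives
    return np.maximum(0, x)

print(relu(np.array([-2.0, 0.0, 3.0])))  # [0. 0. 3.]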

A neural network is able to understand whether or not it is improving its performance by comparing how close its predictions are to the real values. This behavior is described by a loss function. As machine learning practitioners, our goal is to reduce the loss as much as possible while avoiding overfitting.
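
As a concrete example, the mean squared error, a common loss for regression, measures exactly this distance between predictions and real values; a minimal sketch with hypothetical numbers:

import numpy as np

y_true = np.array([3.0, 5.0, 2.0])  # real values (hypothetical)
y_pred = np.array([2.5, 5.5, 2.0])  # model predictions

mse = np.mean((y_true - y_pred) ** 2)  # mean squared error
print(mse)  # ~0.167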

I won’t go into detail because I covered this topic in the article Introduction to Neural Networks — Weights, Bias and Activation. I recommend the reader who wants to expand the understanding of the basics of deep learning to read this article.

This bit of context, however, will be enough to help in understanding the following section.

What is the Tensorflow Sequential API and how does it work?

Tensorflow provides a plethora of features thanks to Keras: TensorFlow's APIs for defining and training neural networks are based on those of Keras.

The sequential model allows us to specify a neural network that is, precisely, sequential: data flows from the input to the output through a series of neural layers, one after the other.

Tensorflow also allows us to use the functional API for building deep learning models. This approach is used by more expert users and is not as user-friendly as the sequential API. The fundamental difference lies in the fact that the functional API allows us to create non-sequential neural networks (therefore multi-output and with integrations to other components).

In Python, a sequential model is coded in this way:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential(
    [...]
)

keras.Sequential accepts a list containing the layers that define the architecture of the neural network. Data flows sequentially through each layer until it reaches the final output layer.

How data flows through a sequential model. Image by author.

We can specify neural layers via keras.layers.

The input layer

The first layer to place in a neural network is the input layer. With Keras, this is very simple.

We create a layer of 256 neurons, with ReLU activation and an input size of 4.

# input layer
layers.Dense(256, input_dim=4, activation="relu", name="input")

This layer will be different from the others, as the others do not need the input_dim argument.

One of the most common problems in deep learning for beginners is to understand what the shape of the input is.

Finding the value of input_dim is not trivial, as it depends on the nature of the dataset we are working with. Since in deep learning we work with tensors (structures that contain multidimensional data), it can be difficult to work out the shape of the data entering the neural network.

In a tabular dataset, our input shape will be equal to the number of feature columns in the dataset. With Pandas and Numpy, just use .shape[-1] on the object to get this information.
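
For example, assuming a hypothetical data.csv that contains only the feature columns:

import pandas as pd

X = pd.read_csv("data.csv")  # hypothetical dataset of features only
input_dim = X.shape[-1]      # number of columns = input dimension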

In the case of images instead, we need to pass the total number of pixels of the image. For a 28 * 28 image, for instance, the input dimension will be 784.
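
A quick check for the image case:

import numpy as np

image = np.zeros((28, 28))             # hypothetical grayscale image
input_dim = int(np.prod(image.shape))  # 28 * 28 = 784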

For time series, the input has three dimensions: the batch size, the time window, and the number of features.
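
As a sketch, with a recurrent layer such as LSTM (hypothetical window of 30 time steps and 4 features), Keras leaves the batch dimension implicit:

from tensorflow.keras import layers

# Input is (batch_size, time_window, n_features); input_shape only
# specifies the last two dimensions
layers.LSTM(32, input_shape=(30, 4))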

Assuming that our dataset is tabular and has 4 columns, we just need to specify 4 as input dimension.

As for the number of neurons, the value of 256 is arbitrary. You have to experiment with these parameters and evaluate which architecture performs best.

The layers following the input

Let’s add layers to our sequential model.

model = keras.Sequential(
    [
        layers.Dense(256, input_dim=4, activation="relu", name="input"),
        layers.Dense(128, activation="relu", name="layer1"),
        layers.Dense(64, activation="relu", name="layer2"),
        # ...
    ]
)

Layers after the first one do not need to specify the input dimension: after the input layer, what passes from one layer to the next is the representation of the input in terms of weights and biases, and Keras infers each layer's input shape from the previous layer's output.

The output layer

The output layer differs from the other layers in that it must reflect the number of values we want to receive in output from the neural network.

For example, if we want to do a regression, and thus predict a single number, the number of units in the final layer must be one.

# output layer
layers.Dense(1, name="output")

In this case, we do not need to specify the activation function, since without one the final representation of the data stays linear (not transformed by a ReLU).

If instead we wanted to predict categories, for example to classify between cats and dogs in an image, we would need an activation function in the last layer called Softmax. Softmax maps the representation of the neural network to the classes present in our dataset, assigning a probability to the prediction of each class.

For the example mentioned, our output layer would be

# output layer: one unit per class
layers.Dense(2, activation="softmax", name="output")

The result would look like [[0.98, 0.02]] , where the first number indicates how confident the neural network is that the example belongs to class 0 (for instance, dog) and the second to class 1 (cat).

Print a summary of the model

Let’s put together the pieces of code we’ve seen so far, add a name to the model, and print a summary of our architecture with .summary().

model = keras.Sequential(
    layers=[
        layers.Dense(256, input_dim=4, activation="relu", name="input"),
        layers.Dense(128, activation="relu", name="layer1"),
        layers.Dense(64, activation="relu", name="layer2"),
        layers.Dense(2, activation="softmax", name="output"),
    ],
    name="sequential_model1",
)
model.summary()

The result:

A call of .summary() on our model. Image by author

This summary shows important information for understanding the architecture of our neural network and how data moves between layers.

The most important column is Output Shape. In the case of such a simple example it may not seem relevant, but this column shows how our data changes shape in the various layers of the neural network.

The summary becomes especially useful when we use convolutional neural networks or LSTMs, because the shape of the data changes in ways that are not always easy to anticipate. In case of errors and bugs, this information can help us debug the code.

The Param # column indicates the number of parameters that can be tuned by the neural network. In mathematical terms, it is the number of dimensions of our optimization problem. Recall that each neuron has one weight per input plus a bias, and therefore n_parameters = n_neurons * (n_inputs + 1).

In the first layer there are 4 inputs plus one bias for each of the 256 neurons, so 256 x 5 = 1,280 parameters.
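
We can verify the whole Param # column with a few lines of arithmetic:

print(256 * (4 + 1))    # input layer:  1280
print(128 * (256 + 1))  # layer1:      32896
print(64 * (128 + 1))   # layer2:       8256
print(2 * (64 + 1))     # output:        130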

Add layers incrementally

There is also an alternative, purely stylistic (and therefore arbitrary) way of adding layers to a sequential model.

Incrementally, you can use model.add() to add a layer to the model.

model = keras.Sequential()
model.add(layers.Dense(256, input_dim=4, activation="relu", name="input"))
model.add(layers.Dense(128, activation="relu", name="layer1"))
model.add(layers.Dense(64, activation="relu", name="layer2"))
model.add(layers.Dense(2, activation="softmax", name="output"))

The final result is the same as seen previously through the layers list, so you can use the approach you prefer.

Compile a sequential model

Now let’s compile the model — a necessary step for training the neural network.

To compile a model means to set up a loss function, an optimizer and performance evaluation metrics.

Once the network architecture is established, compiling requires only a small piece of code. Continuing the example of the classification between dogs and cats, we will use the categorical cross-entropy as the loss function, Adam as the optimizer and accuracy as the evaluation metric.

To read more about these parameters, I invite you to read the article on binary image classification done in Tensorflow.

model.compile(
    loss="categorical_crossentropy",
    optimizer="adam",
    metrics=["accuracy"],
)

Train the sequential model

To train the sequential model, we simply call model.fit() after compiling it, passing X and y, where X is our feature set and y is our target variable.

Other parameters can be passed to .fit() as well. Here are some of the most important:

  • batch_size: allows you to set the number of examples to evaluate at each training iteration before updating the model weights and biases
  • epochs: establishes the number of times that the model processes the entire dataset. An epoch is processed when all examples in the dataset have been used to update model weights
  • validation_data: here we pass the validation dataset on which the model is evaluated at the end of each epoch.

model = keras.Sequential(
    layers=[
        layers.Dense(256, input_dim=4, activation="relu", name="input"),
        layers.Dense(128, activation="relu", name="layer1"),
        layers.Dense(64, activation="relu", name="layer2"),
        layers.Dense(2, activation="softmax", name="output"),
    ],
    name="sequential_model1",
)
model.compile(
    loss="categorical_crossentropy",
    optimizer="adam",
    metrics=["accuracy"],
)
history = model.fit(
    X_train,
    y_train,
    batch_size=32,
    epochs=200,
    validation_data=(X_test, y_test),
)

From here the training process starts, showing its progress with the loss and performance metrics in the terminal.

Training of a neural network in Tensorflow. Image by author.

I wrote an article about Early Stopping with Tensorflow, a callback that can help a neural network improve its training performance.

Evaluation of the sequential model

The attentive reader will have noticed a small detail in the code snippet just above. I'm referring to history = model.fit(...). Why assign the training process to a variable? The reason is that model.fit(...) returns an object containing the training performance, which can be used to visualize and analyze it in detail.

We can access the values by exploring the history attribute of the returned object, which is a dictionary of per-epoch metrics.
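
With the compilation settings used in this article, the dictionary contains the loss and accuracy for both the training and validation sets:

print(history.history.keys())
# dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])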

Using this data we can visualize the training performance on the training and validation set.

import matplotlib.pyplot as plt

def plot_model(metric):
    # Plot the training and validation curves for the given metric
    plt.plot(history.history[metric])
    plt.plot(history.history[f"val_{metric}"])
    plt.title(f"model {metric}")
    plt.ylabel(metric)
    plt.xlabel("epoch")
    plt.legend(["train", "validation"], loc="upper left")
    plt.show()

plot_model("loss")
plot_model("accuracy")

Let’s check the loss

Loss curves. Image by author.

The same goes for the evaluation metric chosen, in this case accuracy

Accuracy curves. Image by author.

Question for the reader 🥸:

Why does the accuracy on the validation set increase, while the loss on the same set also increases? It's not a simple question: share your thoughts in a comment below

In case we have validation and test sets, we can evaluate the model using model.evaluate(X_test, y_test).
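
The call returns the loss followed by the metrics specified at compile time:

# Returns [loss, accuracy] with the compilation settings used above
loss, accuracy = model.evaluate(X_test, y_test)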

Making predictions with the sequential model

Once trained, it’s time to use the model to make predictions.

train_predictions = model.predict(X_train)

In this case, the API is similar to Sklearn’s and the neural network predictions are assigned to train_predictions.
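
Since the output layer uses Softmax, each row of train_predictions contains one probability per class; to recover the predicted class labels we can take the most likely class for each example:

import numpy as np

# Pick, for each example, the class with the highest probability
predicted_classes = np.argmax(train_predictions, axis=1)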

Saving and loading a Tensorflow model

The last step is usually to save the model we have trained. Tensorflow’s API allows you to do this simply with

model.save("./path/to/file")

A folder will be created at the specified disk location that will contain our neural network files.

To load the model later, just do

model = keras.models.load_model("./path/to/file")

From here it is possible to use the model to make predictions as we saw earlier.

When NOT to use a sequential model?

As already mentioned, Tensorflow allows you to create non-sequential neural networks through the use of the functional API.

In particular, the functional approach is worth considering when:

  • we need multiple outputs, that is, a multi-output neural network
  • a layer requires inputs from more than one previous layer
  • two neural networks need to communicate with each other
  • we need a custom neural network with an unusual architecture

In all of these cases, and more, we require a non-sequential model. The sequential approach is generally very flexible and allows us to solve many problems, such as binary image classification, but for more complex tasks such an architecture could prove too simple.
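
To give an idea of the difference, here is a minimal, hypothetical sketch of a multi-output model built with the functional API:

from tensorflow import keras
from tensorflow.keras import layers

# One input feeding a shared layer, then two separate output heads
inputs = keras.Input(shape=(4,))
x = layers.Dense(64, activation="relu")(inputs)
class_output = layers.Dense(2, activation="softmax", name="class_output")(x)
value_output = layers.Dense(1, name="value_output")(x)

model = keras.Model(inputs=inputs, outputs=[class_output, value_output])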

Conclusions

The Tensorflow API and sequential model are powerful and easy-to-use tools for the deep learning practitioner.

This guide aims to put the deep learning beginner in a position to experiment with these tools in his or her personal projects, without feeling lost and having to resort constantly to the official documentation.

If you want to support my content creation activity, feel free to follow my referral link below and join Medium’s membership program. I will receive a portion of your investment and you’ll be able to access Medium’s plethora of articles on data science and more in a seamless way.

I hope I have contributed to your education. Until next time! 👋
