Improve the Performance Easily in TensorFlow Using Graph Mode

Photo by Mathew Schwartz on Unsplash

Originally, TensorFlow only allowed you to code in Graph Mode, but since Eager Mode was introduced (and became the default in TensorFlow 2), most notebooks are written in Eager Mode.

So much so that it is now hard to find code written in Graph Mode, and many TensorFlow programmers, whether they are just starting out or have been building models for a while, have never written any.

The truth is that Graph Mode code is much harder to read and maintain, although it is usually more efficient. Luckily, TensorFlow offers a simple way to go from Eager Mode to Graph Mode: AutoGraph, which is applied when a function is decorated with @tf.function.

The best way to get an idea is to compare the code of a basic function in Eager and in Graph Mode.

```python
import tensorflow as tf

# Eager mode.
def func_eager(x):
    if x > 0:
        x = x + 1
    return x

# Same function in graph mode.
def func_graph(x):
    def if_true():
        return x + 1
    def if_false():
        return x
    x = tf.cond(tf.greater(x, 0), if_true, if_false)
    return x
```

This function is really simple: it just adds 1 to x if x is bigger than 0.

There is no doubt about what the Eager function does: anyone who knows a bit of programming can read it. The Graph function, however, is much harder to read and understand, not to mention write. Still, it is better code for the machine: because the whole computation is described as a graph before it runs, TensorFlow can optimize it and execute independent operations in parallel, which usually results in better performance.
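To check that both versions really compute the same thing, here is a minimal sketch that runs them side by side, eagerly and wrapped in tf.function. It uses lambdas instead of the named if_true/if_false helpers, and the traced_* names are just for illustration.

```python
import tensorflow as tf

def func_eager(x):
    if x > 0:
        x = x + 1
    return x

def func_graph(x):
    # Explicit graph-style conditional: each branch is a function.
    return tf.cond(tf.greater(x, 0), lambda: x + 1, lambda: x)

# tf.function traces each one into a graph. AutoGraph rewrites the Python `if`
# in func_eager into a graph conditional because `x > 0` is a tensor.
traced_eager = tf.function(func_eager)
traced_graph = tf.function(func_graph)

results = {}
for value in [-2, 0, 3]:
    x = tf.constant(value)
    results[value] = (int(func_eager(x)), int(traced_eager(x)), int(traced_graph(x)))
print(results)
```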

These performance gains make Graph Mode a very interesting technique for many notebooks, and it is worth taking advantage of them.

In the next section, we will look at some examples of how to use AutoGraph, based on a notebook available on Kaggle. It contains the functions in both Graph and Eager Mode, so you can compare the performance of the two.

Testing AutoGraph.

We will use the same function as above.

```python
import tensorflow as tf

@tf.function
def func(x):
    if x > 0:
        x = x + 1
    return x
```

The only change is the @tf.function decorator. Here we need nothing else, but we will soon see that this is not always the case: not all code can be moved to Graph Mode unchanged.

Besides converting the code, it is also interesting to look at the code AutoGraph generates. It takes a single line:

```python
print(tf.autograph.to_code(func.python_function))
```

The code displayed is:

```python
def tf__func(x):
    with ag__.FunctionScope('func', 'fscope', ag__.ConversionOptions(recursive=True, user_requested=True, optional_features=(), internal_convert_user_code=True)) as fscope:
        do_return = False
        retval_ = ag__.UndefinedReturnValue()

        def get_state():
            return (x,)

        def set_state(vars_):
            nonlocal x
            (x,) = vars_

        def if_body():
            nonlocal x
            x = (ag__.ld(x) + 1)

        def else_body():
            nonlocal x
            pass
        x = ag__.Undefined('x')
        ag__.if_stmt((ag__.ld(x) > 0), if_body, else_body, get_state, set_state, ('x',), 1)
        try:
            do_return = True
            retval_ = ag__.ld(x)
        except:
            do_return = False
            raise
        return fscope.ret(retval_, do_return)
```

It is much more complicated than the Graph Mode version I wrote by hand, but if we look closely, it has the same structure.

All the code to be executed is wrapped in functions defined before the line that does the actual work. The flow is controlled by ag__.if_stmt, which receives the condition and the functions to run when it is true or false, while get_state and set_state give it access to the variable the branches modify (x). In our case, the condition is that x is greater than 0: if it is met, if_body is called; otherwise, else_body is called.
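A way to confirm that the conversion really produced a graph conditional is to inspect the traced graph. The sketch below assumes TensorFlow 2.x, where a tensor-valued tf.cond is recorded as an If or StatelessIf op. It also shows the other side of AutoGraph: with a plain Python int, the `if` is evaluated once during tracing and no conditional op is emitted at all.

```python
import tensorflow as tf

@tf.function
def func(x):
    if x > 0:
        x = x + 1
    return x

# Tensor input: the condition is a tensor, so AutoGraph emits a graph conditional.
tensor_fn = func.get_concrete_function(tf.TensorSpec(shape=[], dtype=tf.int32))
tensor_ops = {op.type for op in tensor_fn.graph.get_operations()}

# Python int input: the condition is a plain Python bool, evaluated while tracing.
python_fn = func.get_concrete_function(5)
python_ops = {op.type for op in python_fn.graph.get_operations()}

print(sorted(tensor_ops))
print(sorted(python_ops))
```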

Let’s look at some cases in which we have to change the code if we want to use it in Graph Mode.

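For example, we cannot create a tf.Variable inside the function body: tf.function only supports creating variables on the first call, and this body would create a new one every time it runs.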

```python
import tensorflow as tf

# This function fails: it would create a new tf.Variable on every call,
# which tf.function does not support.
# If you want to run it and see the error, uncomment it.
# @tf.function
# def f(x):
#     v = tf.Variable(1.0)
#     return v.assign_add(x)

# For it to work, we just have to declare the variable outside the function.
v = tf.Variable(1.0)

@tf.function
def f(x):
    return v.assign_add(x)
```

The solution is as simple as declaring the variables outside the body of the function.
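If the variable really belongs with the function, TensorFlow's documentation describes another pattern: create it lazily, only on the first call, usually inside a tf.Module. A minimal sketch, where the Counter class is a hypothetical example:

```python
import tensorflow as tf

class Counter(tf.Module):
    def __init__(self):
        super().__init__()
        self.v = None  # the variable is created lazily, on the first call only

    @tf.function
    def __call__(self, x):
        if self.v is None:
            # Runs only during the first trace; later traces find the existing variable.
            self.v = tf.Variable(1.0)
        return self.v.assign_add(x)

counter = Counter()
print(float(counter(2.0)))  # 3.0
print(float(counter(2.0)))  # 5.0
```

The guard on `self.v is None` is what keeps the function from creating a second variable when it is traced again.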

Another case to consider is how print() works. print() is a Python call, so inside a tf.function it runs only while TensorFlow traces the function to build the graph, not every time the graph is executed. In practice, it prints once, even if the function is called in a loop. The solution is to replace it with tf.print(), which becomes an operation in the graph.

```python
import tensorflow as tf

@tf.function
def print_test():
    tf.print("with tf.print")
    print("with print")

for i in range(5):
    print_test()
```

The result of this function would be:

```
with print
with tf.print
with tf.print
with tf.print
with tf.print
with tf.print
```

That is, the function runs five times, but print() is executed only once, during tracing. Something similar applies to Python's assert statement: for checks on tensor values, the safer choice is the corresponding tf.debugging.assert_* function (for example, tf.debugging.assert_greater).
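The reason is that Python code inside a tf.function runs only while TensorFlow traces it, and tracing happens again whenever the function is called with a new input signature (for example, a new shape). A minimal sketch that makes this visible by recording a Python side effect; the traces list is a hypothetical helper that plays the same role as print():

```python
import tensorflow as tf

traces = []  # Python list used only to observe when tracing happens

@tf.function
def double(x):
    traces.append(x.shape)        # Python side effect: runs only while tracing
    tf.print("running with", x)   # graph op: runs on every call
    return x * 2

for _ in range(3):
    double(tf.constant(1.0))      # same dtype and shape: traced once, then reused
double(tf.constant([1.0, 2.0]))   # new shape: triggers a second trace
print("number of traces:", len(traces))
```

tf.print fires four times here, while the Python side effect fires only twice, once per trace.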

These three cases are only a sample, but they are usually the first ones we come across when moving code from Eager to Graph Mode.
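For the assert case, a minimal sketch using tf.debugging.assert_non_negative, one of the tf.debugging.assert_* family. The safe_sqrt function is hypothetical; the point is that the check is a graph op, so it runs on every call and not only during tracing:

```python
import tensorflow as tf

@tf.function
def safe_sqrt(x):
    # A graph op: checked on every call, not only while tracing.
    tf.debugging.assert_non_negative(x, message="x must be non-negative")
    return tf.sqrt(x)

print(float(safe_sqrt(tf.constant(4.0))))  # 2.0

failed = False
try:
    safe_sqrt(tf.constant(-1.0))  # same signature: no retrace, the op itself fails
except tf.errors.InvalidArgumentError:
    failed = True
print("assertion failed inside the graph:", failed)
```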

Comparing Eager with Graph Mode using a couple of Datasets.

The Kaggle notebook contains all the code. In this article, I will only show the parts directly related to the tests and to the Graph Mode code.

The notebook is ready to work with two very popular Datasets: Cats vs Dogs and Horses or Humans. On Kaggle, I used Horses or Humans because of the platform's memory limits. You can try Cats vs Dogs if you download the notebook and run it on a machine with more memory.

Both Datasets are part of TensorFlow Datasets (as cats_vs_dogs and horses_or_humans), so the notebook can be used in any environment without having to worry about downloading the data manually.

I have used a custom training loop, because it makes the improvements easier to see.

One area that can be accelerated is data preprocessing. In this case, the images are simple and need little processing. In fields such as Natural Language Processing, where preprocessing can be very heavy, the gains may be larger, depending on the size of the Dataset and the processing you want to perform.

```python
import tensorflow as tf

IMAGE_SIZE = 224  # consistent with the 221x221 output of the first Conv2D in the model summary

# Treat the image in eager mode.
def map_fn_eager(img, label):
    # resize the image
    img = tf.image.resize(img, size=[IMAGE_SIZE, IMAGE_SIZE])
    # normalize the image
    img /= 255.0
    return img, label

# Treat the image in graph mode.
@tf.function
def map_fn_graph(img, label):
    # resize the image
    img = tf.image.resize(img, size=[IMAGE_SIZE, IMAGE_SIZE])
    # normalize the image
    img /= 255.0
    return img, label

# Prepare the datasets: preprocess with map_fn (eager or graph version), shuffle and batch.
def prepare_dataset(train_examples, validation_examples, test_examples, num_examples, map_fn, batch_size):
    train_ds = train_examples.map(map_fn).shuffle(buffer_size=num_examples).batch(batch_size)
    valid_ds = validation_examples.map(map_fn).batch(batch_size)
    test_ds = test_examples.map(map_fn).batch(batch_size)

    return train_ds, valid_ds, test_ds
```

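As you can see, map_fn_eager and map_fn_graph are identical; only the @tf.function decorator differs. Keep in mind, though, that Dataset.map traces the function it receives and runs it as a graph whether or not it is decorated, so inside a tf.data pipeline both versions end up running in Graph Mode.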

The third function, prepare_dataset, is the one we call to prepare the Datasets. It receives the map function as a parameter, so when we call it, we choose whether to use the version prepared for Graph Mode or the one for Eager Mode.

```python
import time

start_time = time.time()
train_ds_eager, valid_ds_eager, test_ds_eager = prepare_dataset(train_examples,
                                                                validation_examples,
                                                                test_examples,
                                                                num_examples,
                                                                map_fn_eager, BATCH_SIZE)
end_time = time.time()
print("Eager time spent:", end_time - start_time)

start_time = time.time()
train_ds_graph, valid_ds_graph, test_ds_graph = prepare_dataset(train_examples,
                                                                validation_examples,
                                                                test_examples,
                                                                num_examples,
                                                                map_fn_graph, BATCH_SIZE)
end_time = time.time()
print("Graph time spent:", end_time - start_time)
```

```
Eager time spent: 0.061063528060913086
Graph time spent: 0.0561823844909668
```

Both times are very small, which is expected: Dataset.map is lazy, so prepare_dataset only builds the pipelines, and the images are actually processed later, while the model iterates over them. The difference is about 8%, and since tf.data runs both map functions as graphs, it is probably just noise. The most important improvement will be seen when training the model.
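To see where the preprocessing time actually goes, we can time building the pipeline separately from iterating over it. A minimal sketch on synthetic data: the random images, their 100x100 size, the batch size of 8 and IMAGE_SIZE = 224 are assumptions standing in for the notebook's TFDS examples.

```python
import time
import tensorflow as tf

IMAGE_SIZE = 224  # assumed, matching the model summary

def map_fn(img, label):
    img = tf.image.resize(img, size=[IMAGE_SIZE, IMAGE_SIZE])
    img /= 255.0
    return img, label

# Synthetic stand-in for the TFDS examples: 32 random uint8 images with binary labels.
images = tf.cast(tf.random.uniform([32, 100, 100, 3], maxval=256, dtype=tf.int32), tf.uint8)
labels = tf.random.uniform([32], maxval=2, dtype=tf.int64)
raw_ds = tf.data.Dataset.from_tensor_slices((images, labels))

start = time.time()
ds = raw_ds.map(map_fn).batch(8)  # only builds the pipeline (and traces map_fn)
build_time = time.time() - start

start = time.time()
batches = [batch for batch in ds]  # the images are actually resized here
run_time = time.time() - start
print(f"build: {build_time:.4f}s, iterate: {run_time:.4f}s")
```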

The Kaggle notebook is ready to work with three models. Two of them come from TensorFlow Hub: ResNet-50 and ResNet-v2-152. Here, however, we will look at the results of the third one, a much simpler model built with Keras' Sequential API. If you can run the notebook on a more powerful machine, don't hesitate to try the other two.

```python
import tensorflow as tf
# import tensorflow_hub as hub  # only needed for the two TF Hub models

#MODULE_HANDLE = 'https://tfhub.dev/tensorflow/resnet_50/feature_vector/1'
#MODULE_HANDLE = 'https://tfhub.dev/google/imagenet/resnet_v2_152/classification/5'
#model = tf.keras.Sequential([
#    hub.KerasLayer(MODULE_HANDLE, input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3)),
#    tf.keras.layers.Dense(num_classes, activation='softmax')
#])

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(16, (4, 4), activation="relu", input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Conv2D(32, (4, 4), activation="relu"),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Conv2D(64, (4, 4), activation="relu"),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(2, activation="softmax")])
model.summary()
```
```
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 221, 221, 16)      784
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 110, 110, 16)      0
_________________________________________________________________
dropout (Dropout)            (None, 110, 110, 16)      0
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 107, 107, 32)      8224
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 53, 53, 32)        0
_________________________________________________________________
dropout_1 (Dropout)          (None, 53, 53, 32)        0
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 50, 50, 64)        32832
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 25, 25, 64)        0
_________________________________________________________________
dropout_2 (Dropout)          (None, 25, 25, 64)        0
_________________________________________________________________
flatten (Flatten)            (None, 40000)             0
_________________________________________________________________
dense (Dense)                (None, 512)               20480512
_________________________________________________________________
dense_1 (Dense)              (None, 2)                 1026
=================================================================
Total params: 20,523,378
Trainable params: 20,523,378
Non-trainable params: 0
_________________________________________________________________
```
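The parameter counts in the summary can be checked by hand, which also shows where the model's size comes from. A small sketch in plain Python (no TensorFlow needed), assuming IMAGE_SIZE = 224 as implied by the 221x221 output of the first layer:

```python
def conv2d_params(kernel, in_ch, out_ch):
    # one kernel x kernel x in_ch filter per output channel, plus one bias each
    return (kernel * kernel * in_ch + 1) * out_ch

def dense_params(n_in, n_out):
    # one weight per input per unit, plus one bias per unit
    return (n_in + 1) * n_out

# Spatial size after each block: a 'valid' 4x4 conv shrinks by 3, 2x2 pooling halves (floor).
size = 224
for _ in range(3):
    size = (size - 3) // 2
flat = size * size * 64  # 25 * 25 * 64 = 40000

layers = {
    "conv2d": conv2d_params(4, 3, 16),
    "conv2d_1": conv2d_params(4, 16, 32),
    "conv2d_2": conv2d_params(4, 32, 64),
    "dense": dense_params(flat, 512),
    "dense_1": dense_params(512, 2),
}
total = sum(layers.values())
print(layers)
print(total)  # 20523378
```

Almost all of the 20.5 million parameters (about 99.8%) sit in the first Dense layer, fed by the 40,000 flattened features.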

Note that training uses a custom loop: I write both the function executed at each step and the one that controls training over the epochs. This second function is the one we will run in Eager and in Graph Mode.

```python
import tensorflow as tf

# Custom training step. This function is executed at each step of each epoch.
def train_one_step(model, optimizer, x, y, train_loss, train_accuracy):
    with tf.GradientTape() as tape:
        # Run the model on input x to get predictions
        predictions = model(x)
        # Compute the training loss using `train_loss`, passing in the true y and the predicted y
        loss = train_loss(y, predictions)

    # Using the tape and loss, compute the gradients on model variables using tape.gradient
    grads = tape.gradient(loss, model.trainable_variables)

    # Zip the gradients and model variables, and then apply the result on the optimizer
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

    # Call the train accuracy object on ground truth and predictions
    train_accuracy(y, predictions)
    return loss
```

The function above runs at every step, and there is nothing special about it: it makes a prediction, computes the loss, and calculates the gradients, which are passed to the optimizer to update the weights.

```python
def train_eager(model, optimizer, epochs, train_ds, train_loss, train_accuracy, valid_ds, val_loss, val_accuracy):
    step = 0
    loss = 0.0
    for epoch in range(epochs):
        for x, y in train_ds:
            # training step number increments at each iteration
            step += 1

            # Run one training step by passing appropriate model parameters
            # required by the function and finally get the loss to report the results
            loss = train_one_step(model, optimizer, x, y, train_loss, train_accuracy)

            # Print the training step number, loss and accuracy
            print('Step', step,
                  ': train loss', loss,
                  '; train accuracy', train_accuracy.result())

        for x, y in valid_ds:
            # Call the model on the batches of inputs x and get the predictions
            y_pred = model(x)
            loss = val_loss(y, y_pred)
            val_accuracy(y, y_pred)

        # Print the validation loss and accuracy
        tf.print('val loss', loss, '; val accuracy', val_accuracy.result())
```

```python
@tf.function
def train_graph(model, optimizer, epochs, train_ds, train_loss, train_accuracy, valid_ds, val_loss, val_accuracy):
    step = 0
    loss = 0.0
    for epoch in range(epochs):
        for x, y in train_ds:
            # training step number increments at each iteration
            step += 1

            # Run one training step by passing appropriate model parameters
            # required by the function and finally get the loss to report the results
            loss = train_one_step(model, optimizer, x, y, train_loss, train_accuracy)

            # Use tf.print to report your results.
            # Print the training step number, loss and accuracy
            tf.print('Step', step,
                     ': train loss', loss,
                     '; train accuracy', train_accuracy.result())

        for x, y in valid_ds:
            # Call the model on the batches of inputs x and get the predictions
            y_pred = model(x)
            loss = val_loss(y, y_pred)
            val_accuracy(y, y_pred)

        # Print the validation loss and accuracy
        tf.print('val loss', loss, '; val accuracy', val_accuracy.result())
```

As you can see, the two functions are almost identical. Apart from the @tf.function decorator, the only change needed was to replace print() with tf.print(). Both functions call train_one_step at each step, for the indicated number of epochs.
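One detail of train_graph is worth knowing: AutoGraph turns `for x, y in train_ds` over a tf.data.Dataset into a graph loop, but `for epoch in range(epochs)` with the Python int 6 is simply unrolled while tracing, so the graph contains six copies of the epoch body. A minimal sketch of the difference; the two toy functions and the op-count helper are hypothetical:

```python
import tensorflow as tf

@tf.function
def python_loop(n):
    total = tf.constant(0)
    for i in range(n):      # n is a Python int: the loop is unrolled while tracing
        total += i
    return total

@tf.function
def graph_loop(n):
    total = tf.constant(0)
    for i in tf.range(n):   # tensor range: AutoGraph builds a single while loop
        total += i
    return total

def num_ops(fn, n):
    # Number of operations in the graph traced for argument n.
    return len(fn.get_concrete_function(n).graph.get_operations())

print(int(python_loop(6)), int(graph_loop(6)))
print(num_ops(python_loop, 2), num_ops(python_loop, 6))  # grows with n
print(num_ops(graph_loop, 2), num_ops(graph_loop, 6))    # stays the same
```

For a handful of epochs the unrolling is harmless, but with many iterations tf.range keeps the graph small.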

Let’s see the results of executing the function in Eager Mode:

```python
# Train the model in Eager Mode and print the elapsed time.
st = time.time()
train_eager(model, optimizer, 6, train_ds_eager,
            train_loss, train_accuracy, valid_ds_eager,
            val_loss, val_accuracy)
et = time.time()
print('Eager mode spent time:', et - st)
```

```
...
Step 137 : train loss tf.Tensor(0.00049685716, shape=(), dtype=float32) ; train accuracy tf.Tensor(0.9509188, shape=(), dtype=float32)
Step 138 : train loss tf.Tensor(0.0001548939, shape=(), dtype=float32) ; train accuracy tf.Tensor(0.9510895, shape=(), dtype=float32)
val loss 4.05896135e-05 ; val accuracy 0.98780489
Eager mode spent time: 21.57599711418152
```

Graph Mode:

```python
# Train the model in Graph Mode and print the elapsed time.
st = time.time()
train_graph(model, optimizer, 6, train_ds_graph,
            train_loss, train_accuracy, valid_ds_graph,
            val_loss, val_accuracy)
et = time.time()
print('Graph mode spent time:', et - st)
```

```
...
Step 137 : train loss 0.000154053108 ; train accuracy 0.975502133
Step 138 : train loss 3.81469249e-07 ; train accuracy 0.975544751
val loss 2.89767604e-06 ; val accuracy 0.993902445
11.941826820373535
```

As you can see, the improvement in training is considerable: the time drops from about 21.6 to 11.9 seconds, roughly 45% less (a speedup of about 1.8×).

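Keep in mind that these numbers are only indicative: we are using a toy dataset, each mode is timed only once, and the Graph Mode time also includes tracing train_graph on its first call.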
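One way to make such comparisons more reliable is to call each version once before starting the clock, so that tracing is excluded, and then time many steps. A minimal sketch with a hypothetical toy model and random data (not the article's notebook); absolute numbers depend on the machine, and for a model this small the gap may be small or even reversed:

```python
import time
import tensorflow as tf

# Hypothetical toy setup standing in for the notebook's model and data.
tf.random.set_seed(0)
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(2, activation="softmax"),
])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
x = tf.random.normal([128, 32])
y = tf.random.uniform([128], maxval=2, dtype=tf.int64)

def step(x, y):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

graph_step = tf.function(step)

def benchmark(fn, n=50):
    fn(x, y)  # warm-up call: for graph_step, this is where tracing happens
    start = time.perf_counter()
    for _ in range(n):
        loss = fn(x, y)
    loss.numpy()  # make sure the last step has finished before stopping the clock
    return time.perf_counter() - start

eager_time = benchmark(step)
graph_time = benchmark(graph_step)
print(f"eager: {eager_time:.3f}s  graph: {graph_time:.3f}s")
```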

Imagine the potential savings on a large Dataset. Moreover, the improvement is not limited to training time: inference can also benefit.

Conclusions.

Honestly, I think it is imperative to make our models support Graph Mode. They don't have to be written natively in Graph Mode, but they should work with AutoGraph.

The performance improvement can be considerable: our model trained in about 45% less time in Graph Mode than in Eager Mode. Given how little code has to change to get this improvement, it is clearly worth trying in all our notebooks.

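We also see a difference in the accuracy obtained, not only in performance. One likely cause, unless the notebook recreates these objects between the two runs, is that train_graph receives the same model, optimizer and metric objects that train_eager has just used: the Graph Mode run continues training weights that were already trained in Eager Mode, and the metrics keep accumulating because they are never reset. In any case, it is something we have to control: if the metrics are not the same, we should re-evaluate the model every time we move a function from Eager to Graph Mode or vice versa.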
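For a fair comparison, each run should start from the same state. A minimal sketch, assuming Keras metrics like the ones passed as train_accuracy and val_accuracy; the toy model and values are hypothetical. It shows that a metric's result covers everything seen since its last reset, and how to get a freshly initialized copy of a model:

```python
import tensorflow as tf

acc = tf.keras.metrics.SparseCategoricalAccuracy()

acc.update_state([1, 0], [[0.1, 0.9], [0.8, 0.2]])  # both predictions correct
print(float(acc.result()))  # 1.0
acc.update_state([1, 0], [[0.9, 0.1], [0.2, 0.8]])  # both predictions wrong
print(float(acc.result()))  # 0.5: the result covers every batch since the last reset
acc.reset_state()           # clear the running totals before the next experiment
print(float(acc.result()))  # 0.0

# Same architecture with newly initialized weights, so the second run does not
# continue from weights trained in the first one (create a new optimizer too).
model = tf.keras.Sequential([tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(2)])
fresh_model = tf.keras.models.clone_model(model)
```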
