TensorFlow Callbacks — How to Monitor Neural Network Training Like a Pro




Declaring callbacks with TensorFlow

If you’ve read my previous article on optimizing the learning rate with TensorFlow, you already know how callbacks work. Basically, you pass them to the fit() function through its callbacks argument. Nothing stops you from declaring a list of callbacks beforehand, just to keep the training call extra clean.
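As a rough sketch of that pattern, the snippet below declares a callback list first and then hands it to fit(). The toy data, the tiny model, and the single EarlyStopping entry are all assumptions made up for illustration, not the setup from the article.

```python
import numpy as np
import tensorflow as tf

# Toy data and model: hypothetical, only here so the fit() call can run.
X = np.random.rand(64, 4).astype("float32")
y = (X.sum(axis=1) > 2.0).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Declare the callbacks up front to keep the training call short.
callbacks = [
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3),
]

# Pass the list to fit() through the callbacks argument.
history = model.fit(
    X, y,
    validation_split=0.25,  # needed so that val_loss exists for the callback
    epochs=2,
    callbacks=callbacks,
    verbose=0,
)
```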

TensorFlow has a bunch of callbacks built-in. You can also write custom callback functions, but that’s a topic for another time. I use only four built-in callbacks for most projects.

ModelCheckpoint

You can use this one to save the model locally at the current epoch if it beats the best performance obtained so far. Performance can be measured with any metric you want, such as loss or accuracy. I recommend monitoring performance on the validation set, as deep learning models tend to overfit the training data.

You can save the model either as a checkpoint folder or as an hdf5 file. I recommend the latter, as it looks much cleaner on your file system. Also, you can specify a much nicer file path that contains the epoch number and the value of the evaluation metric at that epoch.
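To make the file path idea concrete: the path may contain named placeholders that get filled with the epoch number and the logged metric values using ordinary Python string formatting. The sketch below mimics that step in plain Python. The template, the epoch number, and the metric values are made up for illustration.

```python
# Hypothetical path template with placeholders for the epoch and a metric.
template = "checkpoints/model-{epoch:02d}-{val_accuracy:.3f}.h5"

# Made-up metric values standing in for what gets logged at the end of an epoch.
logs = {"loss": 0.4213, "val_loss": 0.4567, "val_accuracy": 0.81234}

# Fill the placeholders; keys the template does not use are simply ignored.
path = template.format(epoch=7, **logs)
print(path)  # checkpoints/model-07-0.812.h5
```

Any metric used in the template has to be one that is actually tracked during training, otherwise the formatting step has nothing to fill in.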

Here’s how to declare the ModelCheckpoint callback:
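A minimal sketch of such a declaration, assuming the model is compiled with an accuracy metric and trained with validation data so that val_accuracy gets logged. The variable name, the checkpoints folder, and the file name pattern are hypothetical. The .h5 suffix is also an assumption: recent Keras versions validate the extension, so adjust it to whatever your installed version accepts.

```python
import tensorflow as tf

cb_checkpoint = tf.keras.callbacks.ModelCheckpoint(
    # Hypothetical path; placeholders are filled with the epoch and metric value.
    filepath="checkpoints/model-{epoch:02d}-{val_accuracy:.2f}.h5",
    monitor="val_accuracy",  # metric to watch (assumes validation data is used)
    mode="max",              # higher accuracy is better
    save_best_only=True,     # save only when the monitored value beats the best so far
    verbose=1,               # print a message whenever a checkpoint is written
)
```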

In a nutshell, the callback saves the model at the current epoch only if its accuracy on the validation set beats the best value seen so far.

ReduceLROnPlateau

If the monitored metric doesn’t improve for several epochs, ReduceLROnPlateau reduces the learning rate. For example, if validation loss hasn’t decreased for 10 epochs, this callback tells TensorFlow to reduce the learning rate.

The new learning rate is calculated as the old learning rate multiplied by a user-defined factor. So, if the old learning rate is 0.01, and the factor is 0.1, the new learning rate is 0.01 * 0.1 = 0.001.

Here’s how to declare it:
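A minimal sketch of one such declaration, using a factor of 0.1, a patience of 10 epochs, and a floor of 0.00001 on the validation loss. The variable name is hypothetical, and the callback only works if validation data is supplied during training.

```python
import tensorflow as tf

cb_reducelr = tf.keras.callbacks.ReduceLROnPlateau(
    monitor="val_loss",  # metric to watch (assumes validation data is used)
    mode="min",          # lower loss is better
    factor=0.1,          # new_lr = old_lr * factor
    patience=10,         # epochs without improvement before reducing
    min_lr=0.00001,      # lower bound on the learning rate
    verbose=1,           # print a message on every reduction
)
```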

To summarize, the above declaration instructs TensorFlow to reduce the learning rate by a factor of 0.1 if the validation loss didn’t decrease in the last 10 epochs. The learning rate will never go below 0.00001.
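To see how the factor and the lower bound interact, here is a small plain-Python sketch of repeated reductions starting from 0.01. The helper name reduce_lr is made up for illustration. It mirrors the arithmetic described above and is not the library's own code.

```python
def reduce_lr(old_lr, factor=0.1, min_lr=0.00001):
    """Multiply the learning rate by the factor, but never drop below min_lr."""
    return max(old_lr * factor, min_lr)

lr = 0.01
schedule = [lr]
for _ in range(4):  # pretend the validation loss plateaued four times
    lr = reduce_lr(lr)
    schedule.append(lr)

for step, value in enumerate(schedule):
    print(f"after {step} reductions: {value:.5f}")
```

After the third reduction the rate reaches the 0.00001 floor, so further plateaus no longer change it.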

EarlyStopping

If a metric doesn’t improve by at least a minimum delta within a given number of epochs, the EarlyStopping callback stops the training process. For example, if validation accuracy doesn’t increase by at least 0.001 in 10 epochs, this callback tells TensorFlow to stop the training.
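The stopping rule can be sketched in plain Python as a counter that resets whenever the metric improves by more than the minimum delta. The function name and the accuracy values below are made up for illustration. This is a simplified model of the idea, not the library's implementation.

```python
def early_stopping_epoch(val_accuracy, min_delta=0.001, patience=10):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("-inf")
    wait = 0  # epochs since the last real improvement
    for epoch, value in enumerate(val_accuracy, start=1):
        if value - min_delta > best:
            best = value  # improvement large enough: reset the counter
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Three improving epochs, then ten epochs that gain less than 0.001.
history = [0.70, 0.75, 0.80] + [0.8005] * 10
print(early_stopping_epoch(history))  # 13
```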

Here’s how to declare it:
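A minimal sketch of one such declaration, matching the example of a 0.001 minimum improvement in validation accuracy over 10 epochs. The variable name is hypothetical, and val_accuracy is only available if the model tracks accuracy and is trained with validation data.

```python
import tensorflow as tf

cb_earlystop = tf.keras.callbacks.EarlyStopping(
    monitor="val_accuracy",  # metric to watch (assumes validation data is used)
    mode="max",              # higher accuracy is better
    min_delta=0.001,         # smaller gains do not count as improvement
    patience=10,             # epochs without improvement before stopping
    verbose=1,               # print a message when training is stopped
)
```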

There’s not much to it — it’s simple but extremely useful.

CSVLogger

The CSVLogger callback captures model training history and dumps it into a CSV file. It’s useful for analyzing the performance later, and comparing multiple models. It saves data for all the metrics you’re tracking, such as loss, accuracy, precision, recall — both for training and validation sets.

Here’s how to declare it:
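A minimal sketch of one such declaration. The variable name and the file name training_log.csv are hypothetical, so pick whatever path suits your project.

```python
import tensorflow as tf

cb_csvlogger = tf.keras.callbacks.CSVLogger(
    "training_log.csv",  # hypothetical output file, one row per epoch
    separator=",",       # column delimiter
    append=False,        # overwrite an existing file instead of appending
)
```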

Easy, right? Definitely, but the best is yet to come. Let’s train the model with these callbacks next.
