
# 📦 Data

We will use the MNIST dataset of handwritten digits, one of the best-known introductory image datasets. The data is available under the Creative Commons Attribution-Share Alike 3.0 licence. We will load the necessary libraries and the data:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras import Sequential
from tensorflow.keras.layers import (Flatten, Dense,
                                     Conv2D, MaxPooling2D)

import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style='darkgrid', context='talk')

(train_data, train_labels), (test_data, test_labels) = mnist.load_data()
train_data, valid_data, train_labels, valid_labels = train_test_split(
    train_data, train_labels, test_size=10000, random_state=42
)

print("========== Training data ==========")
print(f"Data shape: {train_data.shape}")
print(f"Label shape: {train_labels.shape}")
print(f"Unique labels: {np.unique(train_labels)}")

print("\n========== Validation data ==========")
print(f"Data shape: {valid_data.shape}")
print(f"Label shape: {valid_labels.shape}")
print(f"Unique labels: {np.unique(valid_labels)}")

print("\n========== Test data ==========")
print(f"Data shape: {test_data.shape}")
print(f"Label shape: {test_labels.shape}")
print(f"Unique labels: {np.unique(test_labels)}")
```

We have 50K training, 10K validation and 10K test 28-by-28-pixel images. As expected, there are 10 classes of digits. Let's now check the class distribution for each partitioned dataset:

```python
n_classes = len(np.unique(train_labels))

(pd.concat([pd.Series(train_labels).value_counts(normalize=True)
              .sort_index(),
            pd.Series(valid_labels).value_counts(normalize=True)
              .sort_index(),
            pd.Series(test_labels).value_counts(normalize=True)
              .sort_index()],
           keys=['train', 'valid', 'test'], axis=1)
   .style.background_gradient('YlGn', axis='index').format("{:.2%}"))
```

The class distribution is quite balanced across datasets. If you want to learn how to prettify your pandas DataFrame like this, you may find this post useful.

Before we start building image classification models, let’s explore the data by inspecting a few sample images:

```python
def inspect_sample_images(data, labels, title, n_rows=2, n_cols=3,
                          seed=42):
    np.random.seed(seed)
    indices = np.random.choice(range(len(data)), n_rows*n_cols,
                               replace=False)
    plt.figure(figsize=(8, 5))
    for i, ind in enumerate(indices):
        ax = plt.subplot(n_rows, n_cols, i+1)
        plt.imshow(data[ind], cmap='binary')
        plt.axis('off')
        plt.title(f"Label: {labels[ind]}", fontsize=14)
    plt.suptitle(title, fontsize=20)
    plt.tight_layout()

inspect_sample_images(train_data, train_labels, 'Sample training images')
```

We can see that the images reflect different handwriting styles.

`inspect_sample_images(valid_data, valid_labels, 'Sample validation images')`

The digit 8 at the bottom left is slightly cut off, so some images may be cropped.

`inspect_sample_images(test_data, test_labels, 'Sample test images')`

There are a couple of 2s in this sample and they each have their own style.

# 🔨 Modelling

This is the exciting part! Since model building is an experimental and iterative process, we will build up a few models one step at a time.

## 🔧 Model 0

Currently, the labels are in a 1D array format. We need to one-hot-encode our labels.
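For example (a quick NumPy sketch, separate from the `tf.one_hot` call we use below), the label 3 becomes a 10-element vector with a 1 in position 3 and 0s elsewhere:

```python
import numpy as np

labels = np.array([3, 0, 9])
# np.eye(10) is the 10x10 identity matrix; indexing its rows by
# label one-hot-encodes each label in one step
one_hot = np.eye(10)[labels]
print(one_hot[0])  # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
```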

Now, let's build our first simple neural network. We will set a seed for reproducibility.

```python
train_labels_ohe = tf.one_hot(train_labels, 10)
valid_labels_ohe = tf.one_hot(valid_labels, 10)
test_labels_ohe = tf.one_hot(test_labels, 10)

tf.random.set_seed(42)

model_0 = Sequential([
    Flatten(input_shape=(28, 28)),
    Dense(16, activation="relu"),
    Dense(16, activation="relu"),
    Dense(n_classes, activation="softmax")
])

model_0.compile(loss="categorical_crossentropy", optimizer='Adam',
                metrics=["accuracy"])
model_0.summary()
```

Here we first defined the architecture of the neural network, then compiled it and printed its summary. Let's look more closely.

◼️ **Defined the architecture of the neural network**
The first layer (`flatten`) flattens each image from a (28, 28) 2D array into a (784,) 1D array. Then we have two fully connected hidden layers (`dense` & `dense_1`), both using the ReLU activation function. This is followed by the output layer (`dense_2`) with the `softmax` activation function, which has the same number of units as the number of classes.

◼️ **Compiled the model**
We used the `categorical_crossentropy` loss function. This loss function, combined with the softmax activation in the output layer, lets us get probabilities for each class, since we are building a multi-class classification model. We used the `Adam` optimiser.

◼️ **Printed the model summary**
Once compiled, we can see the model's layers as well as the number of parameters from the summary.
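As a sanity check on the summary, the parameter counts can be reproduced by hand: each `Dense` layer has `inputs × units` weights plus `units` biases (the numbers below assume the 16-16-10 architecture above):

```python
flatten_out = 28 * 28              # flattening yields 784 values, no parameters
dense = flatten_out * 16 + 16      # 12,560 parameters
dense_1 = 16 * 16 + 16             # 272 parameters
dense_2 = 16 * 10 + 10             # 170 parameters
print(dense + dense_1 + dense_2)   # 13002
```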

Now, it’s time to train the network:

```python
hist_0 = model_0.fit(train_data, train_labels_ohe, epochs=5,
                     validation_data=(valid_data, valid_labels_ohe))
```

We will train for only 5 epochs to keep training fast. This means the network will go through the data 5 times. From the output above, we can see that the accuracy improves with each epoch. Let's visualise the accuracy over epochs:
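As an aside, Keras's `fit` uses a default batch size of 32, so each of those epochs corresponds to roughly the following number of gradient updates over our 50K training images:

```python
import math

n_train, batch_size = 50_000, 32   # 32 is the Keras default batch size
steps_per_epoch = math.ceil(n_train / batch_size)
print(steps_per_epoch)  # 1563
```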

```python
def clean_history(hist):
    epochs = len(hist.history['accuracy'])
    df = pd.DataFrame(
        {'epochs': np.tile(np.arange(epochs), 2),
         'accuracy': hist.history['accuracy'] +
                     hist.history['val_accuracy'],
         'loss': hist.history['loss'] +
                 hist.history['val_loss'],
         'dataset': np.repeat(['train', 'valid'], epochs)}
    )
    return df

sns.lineplot(data=clean_history(hist_0), x='epochs', y='accuracy',
             hue='dataset');
```

We have created a function since this will be useful for evaluating subsequent models. We will continue to build functions for other evaluation methods. Let’s evaluate the model’s performance on unseen test data:

```python
test_preds_0 = model_0.predict(test_data)
test_classes_0 = test_preds_0.argmax(axis=1)

test_metrics = pd.DataFrame(columns=['Test accuracy'])
test_metrics.loc['model_0'] = np.mean(test_labels==test_classes_0)
test_metrics
```

Cool, we will add the performance of subsequent models to this DataFrame so that we can see them all at once. `test_preds_0` is a (10000, 10) 2D array containing the predicted probabilities by class for each record. We then assigned each record the class with the highest probability and saved the result into `test_classes_0`. Now, let's look at the confusion matrix:
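As a tiny illustration with made-up probabilities, `argmax(axis=1)` simply picks the column with the highest value in each row:

```python
import numpy as np

# Two made-up rows of class probabilities (10 classes each)
fake_preds = np.array([[0.01, 0.02, 0.80, 0.01, 0.01,
                        0.05, 0.02, 0.05, 0.02, 0.01],
                       [0.10, 0.05, 0.05, 0.05, 0.50,
                        0.05, 0.05, 0.05, 0.05, 0.05]])
print(fake_preds.argmax(axis=1))  # [2 4]
```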

```python
def show_confusion_matrix(labels, classes):
    cm = (pd.crosstab(pd.Series(labels, name='actual'),
                      pd.Series(classes, name='predicted'))
            .style.background_gradient('binary'))
    return cm

show_confusion_matrix(test_labels, test_classes_0)
```

It's great to see that the majority of the records fall along the diagonal stretching from top left to bottom right. Interestingly, the current model mistakes 8s for 2s quite often.
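A confusion matrix is just a crosstab of actual against predicted labels; a toy version with made-up labels shows how the diagonal collects the correct predictions:

```python
import pandas as pd

# Six made-up records: actual classes vs model predictions
actual = pd.Series([0, 0, 1, 1, 2, 2], name='actual')
predicted = pd.Series([0, 1, 1, 1, 2, 0], name='predicted')
cm = pd.crosstab(actual, predicted)
print(cm)
# The diagonal entries (cm.loc[0, 0], cm.loc[1, 1], cm.loc[2, 2])
# count the correct predictions; everything else is a mistake.
```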

Let’s inspect a few example images with their predictions:

```python
def inspect_sample_predictions(data, labels, preds, dataset='test',
                               seed=42, n_rows=2, n_cols=3):
    np.random.seed(seed)
    indices = np.random.choice(range(len(data)), n_rows*n_cols,
                               replace=False)
    plt.figure(figsize=(8, 5))
    for i, ind in enumerate(indices):
        ax = plt.subplot(n_rows, n_cols, i+1)
        plt.imshow(data[ind], cmap='binary')
        plt.axis('off')
        proba = preds[ind].max()
        pred = preds[ind].argmax()
        if pred == labels[ind]:
            colour = 'green'
        else:
            colour = 'red'
        plt.title(f"Prediction: {pred} ({proba:.1%})", fontsize=14,
                  color=colour)
    plt.suptitle(f'Sample {dataset} images with prediction',
                 fontsize=20)
    plt.tight_layout()

inspect_sample_predictions(test_data, test_labels, test_preds_0)
```

We will now look at the most incorrect predictions (i.e. incorrect predictions with the highest probability):

```python
def see_most_incorrect(data, labels, preds, dataset='test', seed=42,
                       n_rows=2, n_cols=3):
    df = pd.DataFrame()
    df['true_class'] = labels
    df['pred_class'] = preds.argmax(axis=1)
    df['proba'] = preds.max(axis=1)
    incorrect_df = df.query("true_class!=pred_class")\
                     .nlargest(n_rows*n_cols, 'proba')
    plt.figure(figsize=(8, 5))
    for i, (ind, row) in enumerate(incorrect_df.iterrows()):
        ax = plt.subplot(n_rows, n_cols, i+1)
        plt.imshow(data[ind], cmap='binary')
        plt.axis('off')
        true = int(row['true_class'])
        proba = row['proba']
        pred = int(row['pred_class'])
        plt.title(f"Actual: {true} \nPrediction: {pred} ({proba:.1%})",
                  fontsize=14, color='red')
    plt.suptitle(f'Most incorrect {dataset} images', fontsize=20)
    plt.tight_layout()

see_most_incorrect(test_data, test_labels, test_preds_0)
```

As we are rounding to one decimal place when printing the probabilities, 100.0% here most likely represents probabilities like 99.95% or 99.99%. This gives us a glimpse of the kind of images that the model is confidently getting wrong. Even for humans, the first and last images are a bit tricky to identify as a 6.

Let’s see if we can improve the model.

## 🔧 Model 1

Neural networks tend to work well with data that is scaled between 0 and 1, so we will rescale the data to this range using min-max scaling: `scaled = (x - min) / (max - min)`.

Since the pixel values range between 0 (min) and 255 (max), we just need to divide the values by 255. Besides rescaling, we will keep everything else the same as before. It's good practice to change one thing at a time and understand its impact:

```python
train_data_norm = train_data/255
valid_data_norm = valid_data/255
test_data_norm = test_data/255

tf.random.set_seed(42)

model_1 = Sequential([
    Flatten(input_shape=(28, 28)),
    Dense(16, activation="relu"),
    Dense(16, activation="relu"),
    Dense(n_classes, activation="softmax")
])

model_1.compile(loss="categorical_crossentropy", optimizer='Adam',
                metrics=["accuracy"])
model_1.summary()
```

Let’s train the compiled model on the rescaled data:

```python
hist_1 = model_1.fit(
    train_data_norm, train_labels_ohe, epochs=5,
    validation_data=(valid_data_norm, valid_labels_ohe)
)
```

With a simple preprocessing step, the performance is much better than before! Now, let’s visualise the performance over epochs:

`sns.lineplot(data=clean_history(hist_1), x='epochs', y='accuracy', hue='dataset');`

Time to evaluate the model on test data and add it to our `test_metrics`

DataFrame.

```python
test_preds_1 = model_1.predict(test_data_norm)
test_classes_1 = test_preds_1.argmax(axis=1)
test_metrics.loc['model_1'] = np.mean(test_labels==test_classes_1)
test_metrics
```

We improved the model’s predictive power quite substantially with one simple change. Let’s look at performance more closely with a confusion matrix:

`show_confusion_matrix(test_labels, test_classes_1)`

Now, 8s are no longer frequently confused with 2s. The most common mistake now is confusing 4s with 9s. This is not too surprising, as the two digits do look similar in some handwriting styles.

`inspect_sample_predictions(test_data_norm, test_labels, test_preds_1)`

As we used the same seed to pull random images, we are looking at the same subset of images as before. We can see that some of the previously misclassified images are now predicted correctly. It's good to see that the correct predictions have high probabilities whereas the incorrect one has a lower probability.

`see_most_incorrect(test_data_norm, test_labels, test_preds_1)`

We can see that there’s still room for improvement.

Let’s see if we can improve the model.

## 🔧 Model 2

We will use `model_1`

as the base and increase the number of units from 16 to 64 in the hidden layers:

```python
tf.random.set_seed(42)

model_2 = Sequential([
    Flatten(input_shape=(28, 28)),
    Dense(64, activation="relu"),
    Dense(64, activation="relu"),
    Dense(n_classes, activation="softmax")
])

model_2.compile(loss="categorical_crossentropy", optimizer='Adam',
                metrics=["accuracy"])
model_2.summary()
```

We have a lot more parameters now since we have increased the number of units.

```python
hist_2 = model_2.fit(
    train_data_norm, train_labels_ohe, epochs=5,
    validation_data=(valid_data_norm, valid_labels_ohe)
)
```

The model performance looks slightly better than before.

`sns.lineplot(data=clean_history(hist_2), x='epochs', y='accuracy', hue='dataset');`

In the last two epochs, the model is slightly overfitting. Let’s evaluate the model on the test data:

```python
test_preds_2 = model_2.predict(test_data_norm)
test_classes_2 = test_preds_2.argmax(axis=1)
test_metrics.loc['model_2'] = np.mean(test_labels==test_classes_2)
test_metrics
```

Awesome, it’s great to see we are still seeing improvements to the model.

`show_confusion_matrix(test_labels, test_classes_2)`

As the model is more accurate, the confusion matrix is more concentrated along the diagonal, with mostly light grey to white cells elsewhere.

`inspect_sample_predictions(test_data_norm, test_labels, test_preds_2)`

Now, the model gets all the sample images right!

`see_most_incorrect(test_data_norm, test_labels, test_preds_2)`

The middle image at the top looks tricky while the rest of the images are relatively easier for humans to recognise.

Let’s see if we can improve the model one last time.

## 🔧 Model 3

Convolutional neural networks (CNNs) work well with image data, so let's now apply a simple CNN to our data. Note that `Conv2D` layers expect a channel axis, hence the `input_shape=(28, 28, 1)` in the first layer.

```python
model_3 = Sequential([
    Conv2D(32, 5, padding='same', activation='relu',
           input_shape=(28, 28, 1)),
    Conv2D(32, 5, padding='same', activation='relu'),
    MaxPooling2D(),
    Conv2D(32, 5, padding='same', activation='relu'),
    Conv2D(32, 5, padding='same', activation='relu'),
    MaxPooling2D(),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(n_classes, activation="softmax")
])

model_3.compile(loss="categorical_crossentropy", optimizer='Adam',
                metrics=["accuracy"])
model_3.summary()
```

We have a lot more parameters now. Let’s train the model:

```python
# NB: depending on your TensorFlow version, you may need to add the
# channel axis explicitly (e.g. train_data_norm[..., np.newaxis]) if
# fit() raises a shape error on the 3D image arrays.
hist_3 = model_3.fit(
    train_data_norm, train_labels_ohe, epochs=5,
    validation_data=(valid_data_norm, valid_labels_ohe)
)
```

Awesome, there seems to be a slight improvement in performance!

`sns.lineplot(data=clean_history(hist_3), x='epochs', y='accuracy', hue='dataset');`

We can see that the model is very slightly overfitting.

```python
test_preds_3 = model_3.predict(test_data_norm)
test_classes_3 = test_preds_3.argmax(axis=1)
test_metrics.loc['model_3'] = np.mean(test_labels==test_classes_3)
test_metrics
```

Woah, we have reached 99% accuracy! ✨

`show_confusion_matrix(test_labels, test_classes_3)`

This is the best looking confusion matrix so far. We see many zeros off the diagonal and 1000+ along the diagonal for some digits.

`inspect_sample_predictions(test_data_norm, test_labels, test_preds_3)`

Like before, sample images are predicted correctly.

`see_most_incorrect(test_data_norm, test_labels, test_preds_3)`

Some of these images are a bit tricky, especially the 6s and the 7. One of these 6s seems to have also been flagged among the most incorrect predictions by `model_2`.

We will end our model experimentation here in the interest of time. In practice, it's almost certain that we would need a whole lot more iterations before settling on a model. In this post, each iteration improved the model's predictive power. However, this is not always the case in practice, as some experiment ideas don't work out. This is normal and simply the nature of an experimental approach.

While we explored only a couple of ideas across the four models, there are myriad ways we could branch out the experimentation to improve the model. Here are some things one could try:

◼️ Increase layers

◼️ Change activation functions

◼️ Train for longer (i.e. more epochs)
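As a rough illustration of the first two ideas, a deeper variant with a different activation might look like the following. This is a hypothetical sketch, not a model trained in this post; `elu` is just one alternative activation, and the layer sizes are arbitrary:

```python
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Flatten, Dense

deeper_model = Sequential([
    Flatten(input_shape=(28, 28)),
    Dense(128, activation='elu'),   # more units and...
    Dense(64, activation='elu'),    # ...more layers than model_2
    Dense(64, activation='elu'),
    Dense(10, activation='softmax')
])
deeper_model.compile(loss='categorical_crossentropy', optimizer='Adam',
                     metrics=['accuracy'])
# "Train for longer" would simply mean e.g. epochs=20 instead of 5 in fit().
```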

If you are working on well-known datasets, one way to get inspiration for your neural network design is to look at leading model architectures. For instance, we can see the leading models on the MNIST dataset on Papers with Code. At the time of writing this post, Homogeneous ensemble with Simple CNN is leading with 99.91% accuracy. If you are curious, you can learn more about the model and its architecture from the paper. In addition, there is usually accompanying code to dig into further. Since most of the leading models tend to have very strong performance, your options are not limited to just the top model.

# 🔐 Saving the model

Once we are happy with a model, there’s a convenient and intuitive way to save our model:

`model_3.save('model_3')`

By saving the model, we can load it next time and use it straight away to make predictions without having to build it from scratch. The loaded model's performance will be exactly the same as that of the model we just trained.

```python
loaded_model_3 = tf.keras.models.load_model('model_3')
test_acc = np.mean(loaded_model_3.predict(test_data_norm).argmax(axis=1)
                   == test_labels)
print(f"Test accuracy: {test_acc:.1%}")
```

That was it for this post! I hope it has given you a brief introduction to building a basic image classification model using TensorFlow and to iteratively improving the results. Having done basic image classification, we will build up our experience by looking at more realistic images in part 2 of the series.
