Create Image Classification Models With Tensorflow in 10 Minutes


Load Data

The dataset contains 60,000​ grayscale images in the training set and ​10,000​ images in the test set. Each image represents a fashion item that belongs to one of the 10 categories. An example is shown in Figure 1:

fashion_mnist = tf.keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

class_names = {0: 'T-shirt/top',
               1: 'Trouser',
               2: 'Pullover',
               3: 'Dress',
               4: 'Coat',
               5: 'Sandal',
               6: 'Shirt',
               7: 'Sneaker',
               8: 'Bag',
               9: 'Ankle boot'}

plt.figure(figsize=(10, 10))
for i in range(25):
    plt.subplot(5, 5, i + 1)
    plt.imshow(train_images[i], cmap='gray')
    plt.title(class_names[train_labels[i]])
    plt.axis('off')
plt.show()
Figure 1: A sample of images from the dataset

Our goal is to build a model that correctly predicts the label/class of each image. Hence, we have a multi-class classification problem.

Train/validation/test split

We already have training and test datasets. We keep 5% of the training dataset as a validation set, which is used for hyperparameter optimization.

train_x, val_x, train_y, val_y = train_test_split(
    train_images, train_labels, stratify=train_labels,
    random_state=48, test_size=0.05)
(test_x, test_y) = (test_images, test_labels)

Pixel Rescaling

Since images are grayscale, all values are in the range of 0–255. We divide by 255 so that the pixel values lie between 0 and 1. This is a form of normalization, and will speed up our training process later.

# normalize to range 0-1
train_x = train_x / 255.0
val_x = val_x / 255.0
test_x = test_x / 255.0

One-Hot Encoding of Target values

Each label belongs to one of the 10 categories we saw above. Thus the target (y) takes values between 0 and 9. For example, according to the class_names dictionary, 0 is the class for 'T-shirt/top'.

Let’s see the target values of the first 5 clothes from the training set:

-> array([[2],   (Pullover)
       [8],      (Bag)
       [6],      (Shirt)
       [1],      (Trouser)
       [3]], dtype=uint8)   (Dress)

Then, we one-hot encode them — each target value is assigned to a vector. This process is done for all y datasets (train, validation, test). We use the to_categorical() function:

from tensorflow.keras.utils import to_categorical

train_y = to_categorical(train_y)
val_y = to_categorical(val_y)
test_y = to_categorical(test_y)

Hence, the target values of the first 5 clothes become:

array([[0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],        
[0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
[0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
[0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 1., 0., 0., 0., 0., 0., 0.]], dtype=float32)


Each dataset is stored as a NumPy array. Let’s check their dimensions:

print(train_x.shape)  #(57000, 28, 28)
print(train_y.shape) #(57000, 10)
print(val_x.shape) #(3000, 28, 28)
print(val_y.shape) #(3000, 10)
print(test_x.shape) #(10000, 28, 28)
print(test_y.shape) #(10000, 10)

Training the classification models

Now everything is ready to build our models. We will use two types of neural networks: the classic Multilayer Perceptron (MLP) and the Convolutional Neural Network (CNN).

Multilayer Perceptron (MLP)

The standard neural network architecture is shown in Figure 1. An MLP with at least one hidden layer and a non-linear activation is a universal approximator of continuous functions. This result is in the same spirit as the Stone–Weierstrass theorem:

Every continuous function defined on a closed interval [a, b] can be uniformly approximated as closely as desired by a polynomial function.

And of course, each layer of a neural network projects a learned representation of the input onto a different space.
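To make this concrete, here is a minimal NumPy sketch (with made-up weights, not our actual Keras model) of what one dense layer computes: an affine transform of its input followed by a non-linear activation.

```python
import numpy as np

def relu(z):
    # element-wise max(0, z)
    return np.maximum(0, z)

rng = np.random.default_rng(0)
x = rng.random(4)                 # a 4-dimensional input
W = rng.standard_normal((3, 4))   # 3 neurons, each with 4 weights
b = np.zeros(3)                   # biases

h = relu(W @ x + b)               # the layer's output: a 3-dimensional representation
print(h.shape)                    # (3,)
```

Stacking several such layers, each with its own weights, is exactly what the Keras model below does for us.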

Figure 1: A multilayer perceptron with one hidden layer [2]

Next, let’s define our model using the Keras API from Tensorflow.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense

model_mlp = Sequential()
model_mlp.add(Flatten(input_shape=(28, 28)))
model_mlp.add(Dense(350, activation='relu'))
model_mlp.add(Dense(10, activation='softmax'))
model_mlp.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])

What you should know here:

  • Network Structure: Each image is 28x28 pixels. The first layer flattens the input into a vector of size 28*28=784. Then, we add a hidden layer with 350 neurons. The final layer has 10 neurons, one per class in our dataset.
  • Activation function: The hidden layer uses the standard RELU activation. The final layer uses the softmax activation because we have a multi-class problem.
  • Loss function: The objective that our model will try to minimize. Since we have a multi-class problem, we use the categorical_crossentropy loss.
  • Metrics: During training, we monitor the accuracy: That is, the percentage of instances that we correctly classify.
  • Epochs: The number of times the model works through the entire dataset during training.
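To make the softmax activation from the list above concrete, here is a minimal NumPy sketch (not the Keras implementation): it turns the final layer's raw scores into probabilities, one per class, that sum to 1.

```python
import numpy as np

def softmax(z):
    # subtract the max for numerical stability before exponentiating
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])   # made-up raw scores for 3 classes
probs = softmax(scores)
print(probs.round(3))                # -> [0.659 0.242 0.099]
```

The class with the highest score also gets the highest probability, which is why the predicted label is simply the argmax of the output vector.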

This is the structure of our network:

Layer (type)         Output Shape    Param #
============================================
flatten (Flatten)    (None, 784)     0
dense (Dense)        (None, 350)     274750
dense_1 (Dense)      (None, 10)      3510
============================================
Total params: 278,260
Trainable params: 278,260
Non-trainable params: 0

Even though this is a beginner-friendly tutorial, there is one important feature that you should know:

Neural networks are prone to overfitting: they can learn the training data so well that they fail to generalize to new (test) data.

If we let the network train indefinitely, overfitting will eventually happen. And since we can’t know in advance how long a neural network needs before it starts overfitting, we use a mechanism called EarlyStopping.

EarlyStopping monitors the validation loss during training. If the validation loss stops decreasing for a specified number of epochs (called patience), training halts immediately. Let’s use it in our implementation:

from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss', restore_best_weights=True,
                           patience=5, verbose=1)
callback = [early_stop]

Finally, we train our model:

history_mlp =, train_y, epochs=100, batch_size=32,
                            validation_data=(val_x, val_y), callbacks=callback)
Figure 2: Training history of MLP model

We deliberately used an excessive number of epochs (100) for this simple dataset to demonstrate that EarlyStopping was activated at epoch 10 and restored the best weights (from epoch 5).

Now, the most important metrics here are loss and accuracy: Let’s visualize them in a plot. We define the plot_history function:

# define the function:
def plot_history(hs, epochs, metric):
    plt.rcParams['font.size'] = 16
    plt.figure(figsize=(10, 8))
    for label in hs:
        plt.plot(hs[label].history[metric],
                 label='{0:s} train {1:s}'.format(label, metric), linewidth=2)
        plt.plot(hs[label].history['val_{0:s}'.format(metric)],
                 label='{0:s} validation {1:s}'.format(label, metric), linewidth=2)
    plt.ylim((0, 1))
    plt.ylabel('Loss' if metric == 'loss' else 'Accuracy')
    plt.legend()

plot_history(hs={'MLP': history_mlp}, epochs=15, metric='loss')
plot_history(hs={'MLP': history_mlp}, epochs=15, metric='accuracy')
Figure 3: Training and validation losses of MLP
Figure 4: Training and validation accuracies of MLP

Both figures show that the metrics improved: loss decreased while accuracy increased.

Had the model been trained for more epochs, the training loss would continue to decrease, while the validation loss would remain constant (or, even worse, increase). The model would overfit.

Finally, let’s check the accuracy of training, validation, and test sets:

mlp_train_loss, mlp_train_acc = model_mlp.evaluate(train_x,  train_y, verbose=0)
print('\nTrain accuracy:', np.round(mlp_train_acc,3))
mlp_val_loss, mlp_val_acc = model_mlp.evaluate(val_x, val_y, verbose=0)
print('\nValidation accuracy:', np.round(mlp_val_acc,3))
mlp_test_loss, mlp_test_acc = model_mlp.evaluate(test_x, test_y, verbose=0)
print('\nTest accuracy:', np.round(mlp_test_acc,3))

#Train accuracy: 0.916
#Validation accuracy: 0.89
#Test accuracy: 0.882

The test accuracy is approximately 88%. Also, there is a gap of about 3% between the train and validation/test accuracies.
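For completeness: model.predict returns one softmax vector per image, and the predicted class is the index with the highest probability. Here is a small sketch of that decoding step, using a made-up probability vector rather than real model output:

```python
import numpy as np

class_names = {0: 'T-shirt/top', 1: 'Trouser', 2: 'Pullover', 3: 'Dress',
               4: 'Coat', 5: 'Sandal', 6: 'Shirt', 7: 'Sneaker',
               8: 'Bag', 9: 'Ankle boot'}

# a hypothetical softmax output for one image (the 10 values sum to 1)
probs = np.array([0.02, 0.01, 0.70, 0.05, 0.10, 0.01, 0.08, 0.01, 0.01, 0.01])

pred = int(np.argmax(probs))      # index of the most probable class
print(class_names[pred])          # Pullover
```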

Convolutional Neural networks (CNNs)

Another type of neural network is the Convolutional Neural Network (or CNN). CNNs are better suited for image classification. They use filters (also called kernels), whose outputs (the feature maps) help the model capture and learn various characteristics of an image. A generic architecture of a CNN is shown in Figure 5.

These filters are not static: They are trainable, which means the model learns them during fitting, in a way that optimizes the training objective. This is in contrast to traditional computer vision, which uses static filters for feature extraction.

Also, the depth of a CNN is of paramount importance. This is because an image can be perceived as a hierarchical structure, so several layers of processing make intuitive sense for this domain. The first layers of the CNN focus on extracting low-level features (e.g. edges, corners). As the depth increases, the feature maps learn more complex characteristics, such as shapes and faces.

Moreover, in each step, information undergoes “subsampling”, before being passed to the next filtering layer. The final component is a fully connected layer, which looks like an MLP, but without a hidden layer.
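The subsampling step can be sketched in plain NumPy; the helper below is a simplified stand-in for Keras's MaxPooling2D with a 2x2 window, keeping only the largest value in each block and halving both spatial dimensions:

```python
import numpy as np

def max_pool_2x2(x):
    """Downsample a 2-D array by taking the max of each 2x2 block."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1, 3, 2, 0],
              [4, 2, 1, 5],
              [0, 1, 8, 2],
              [3, 2, 1, 7]])
print(max_pool_2x2(x))
# [[4 5]
#  [3 8]]
```

This is why, in the model summary further below, pooling turns 26x26 feature maps into 13x13 ones without adding any trainable parameters.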

Figure 5: Top-level architecture of a CNN [3]

Let’s dive into the implementation.

First, it is critical to understand what type of input each network accepts. For MLPs, each image was flattened into a single vector of 28x28=784 values. Here, each image is represented as a 3D cube with dimensions 28x28x1, in the format (width, height, color channels). If our images were not grayscale, the dimensions would be 28x28x3.

We will still use the same activation and loss functions as before. Also, we perform only one set of filtering/feature maps and subsampling. In the Keras API, these are called Conv2D and MaxPooling2D layers respectively:

# Conv2D expects a channel dimension: (N, 28, 28) -> (N, 28, 28, 1)
train_x = train_x.reshape(-1, 28, 28, 1)
val_x = val_x.reshape(-1, 28, 28, 1)
test_x = test_x.reshape(-1, 28, 28, 1)

model_cnn = Sequential()
model_cnn.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model_cnn.add(MaxPooling2D((2, 2)))
model_cnn.add(Flatten())
model_cnn.add(Dense(100, activation='relu'))
model_cnn.add(Dense(10, activation='softmax'))
model_cnn.compile(optimizer="adam", loss='categorical_crossentropy', metrics=['accuracy'])
history_cnn =, train_y, epochs=100, batch_size=32, validation_data=(val_x, val_y), callbacks=callback)

And the output:

Layer (type)                 Output Shape         Param #
=========================================================
conv2d_3 (Conv2D)            (None, 26, 26, 32)   320
max_pooling2d_3 (MaxPooling) (None, 13, 13, 32)   0
flatten_3 (Flatten)          (None, 5408)         0
dense_6 (Dense)              (None, 100)          540900
dense_7 (Dense)              (None, 10)           1010
=========================================================
Total params: 542,230
Trainable params: 542,230
Non-trainable params: 0
Epoch 1/100
1782/1782 [==============================] - 19s 5ms/step - loss: 0.4063 - accuracy: 0.8581 - val_loss: 0.3240 - val_accuracy: 0.8913
Epoch 2/100
1782/1782 [==============================] - 9s 5ms/step - loss: 0.2781 - accuracy: 0.9001 - val_loss: 0.3096 - val_accuracy: 0.8883
Epoch 3/100
1782/1782 [==============================] - 9s 5ms/step - loss: 0.2343 - accuracy: 0.9138 - val_loss: 0.2621 - val_accuracy: 0.9057
Epoch 4/100
1782/1782 [==============================] - 9s 5ms/step - loss: 0.2025 - accuracy: 0.9259 - val_loss: 0.2497 - val_accuracy: 0.9080
Epoch 5/100
1782/1782 [==============================] - 9s 5ms/step - loss: 0.1763 - accuracy: 0.9349 - val_loss: 0.2252 - val_accuracy: 0.9200
Epoch 6/100
1782/1782 [==============================] - 9s 5ms/step - loss: 0.1533 - accuracy: 0.9437 - val_loss: 0.2303 - val_accuracy: 0.9250
Epoch 7/100
1782/1782 [==============================] - 9s 5ms/step - loss: 0.1308 - accuracy: 0.9516 - val_loss: 0.2447 - val_accuracy: 0.9140
Epoch 8/100
1782/1782 [==============================] - 9s 5ms/step - loss: 0.1152 - accuracy: 0.9573 - val_loss: 0.2504 - val_accuracy: 0.9213
Epoch 9/100
1782/1782 [==============================] - 9s 5ms/step - loss: 0.0968 - accuracy: 0.9644 - val_loss: 0.2930 - val_accuracy: 0.9133
Epoch 10/100
1779/1782 [============================>.] - ETA: 0s - loss: 0.0849 - accuracy: 0.9686Restoring model weights from the end of the best epoch: 5.
1782/1782 [==============================] - 9s 5ms/step - loss: 0.0849 - accuracy: 0.9686 - val_loss: 0.2866 - val_accuracy: 0.9187
Epoch 10: early stopping

Again, we initialised our model with 100 epochs. However, 10 epochs were enough to train, thanks to EarlyStopping kicking in.

Let’s plot the training and validation curves, using the same plot_history function as before:

plot_history(hs={'CNN': history_cnn},epochs=10,metric='loss')
plot_history(hs={'CNN': history_cnn},epochs=10,metric='accuracy')
Figure 6: Training and validation losses of CNN
Figure 7: Training and validation accuracies of CNN

The validation curves follow the same pattern as the MLP model.

Finally, we calculate the train, validation, and test accuracies:

cnn_train_loss, cnn_train_acc = model_cnn.evaluate(train_x,  train_y, verbose=2)
print('\nTrain accuracy:', cnn_train_acc)
cnn_val_loss, cnn_val_acc = model_cnn.evaluate(val_x, val_y, verbose=2)
print('\nValidation accuracy:', cnn_val_acc)
cnn_test_loss, cnn_test_acc = model_cnn.evaluate(test_x, test_y, verbose=2)
print('\nTest accuracy:', cnn_test_acc)
#Output:
#Train accuracy: 0.938
#Validation accuracy: 0.91
#Test accuracy: 0.908

The CNN model outperformed the MLP model. This was expected, because CNNs are better suited for image classification.

Closing Remarks

  • The optimization algorithm that neural networks use for training is stochastic. This means, among other things, that each time you train a model, you will get slightly different results.
  • The dataset was straightforward. For starters, the images were grayscale, meaning they had a single channel. Colored images have 3 channels (RGB).
  • Even though we used a validation set, we didn’t perform any hyperparameter tuning. In the next part of this tutorial, we will show how to further optimize our models.
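On the first point: if you need repeatable runs, fix the random seeds before training (in TensorFlow, tf.random.set_seed plus np.random.seed). Here is a NumPy-only sketch of the idea, with a made-up stand-in for weight initialization:

```python
import numpy as np

def simulated_init(seed):
    # weight initialization is one source of run-to-run randomness
    rng = np.random.default_rng(seed)
    return rng.standard_normal(3)

a = simulated_init(42)
b = simulated_init(42)   # same seed -> identical "initial weights"
c = simulated_init(7)    # different seed -> different weights
print(np.array_equal(a, b), np.array_equal(a, c))  # True False
```

Note that even with fixed seeds, GPU training can remain slightly non-deterministic, so small run-to-run differences are normal.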

