CNN on GPU: MNIST Handwritten Digits


In this blog, I would like to demonstrate how to train a neural network on a GPU. If you have a laptop or PC with an NVIDIA graphics card, you can train the network presented here locally. Alternatively, Kaggle provides 30 hours of GPU accelerator time, which can be used for training. GPUs are processing units built for graphics, honed for the huge matrix multiplications that computer graphics in games require. Fortunately, a neural network is also a huge, repetitive matrix computation, which GPUs carry out efficiently.
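The only change needed to run the same computation on a GPU is the device the tensors live on. A minimal sketch (it falls back to the CPU when no CUDA device is present):

```python
import time
import torch

# The same matrix multiplication runs on CPU or GPU;
# only the device the tensors are allocated on changes.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

a = torch.randn(1024, 1024, device=device)
b = torch.randn(1024, 1024, device=device)

start = time.time()
c = a @ b  # on a GPU this runs as one massively parallel kernel
print(c.shape, c.device, f"{time.time() - start:.4f}s")
```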


  1. Appropriate comments are provided along with the code.
  2. Read the description before each code block to understand how the code works.

References: PyTorch for Deep Learning — Full Course / Tutorial

Original Notebook: Kaggle Notebook


For demonstration purposes, we will use the popular MNIST handwritten digits dataset.


Let’s start by importing the required libraries:

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# The files are on paths /kaggle/input/digit-recognizer/*.csv
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# torch libraries
import torch
import torch.nn as nn
import torch.nn.functional as F

from import DataLoader
from import Dataset
from import SubsetRandomSampler

Device to use

The function get_device returns the CUDA device if a GPU is available and falls back to the CPU otherwise. This makes the code run generically on both GPU and CPU environments.

def get_device():
    if torch.cuda.is_available():
        return torch.device('cuda')
    return torch.device('cpu')

Reading dataset

print(train_dataset.shape, test_dataset.shape)
torch.Size([42000, 785]) torch.Size([28000, 784])

Neural Network

We are going to use a convolutional neural network here; check the references for further information. The class has only a constructor, a forward pass, and an accuracy method. The accuracy method is defined as a static function.

class ConvolutionalNNet(nn.Module):
    def __init__(self):
        super().__init__()

        self.verbose = False = 0.01 # Learning rate

        self.pool = nn.MaxPool1d(2) # halves the length: bs * 16 * (arr_length/2)

        self.conv1 = nn.Conv1d(1, 16, kernel_size=3, stride=1, padding=1) # bs * 16 * arr_length (784)
        self.conv2 = nn.Conv1d(16, 16, kernel_size=3, stride=1, padding=1) # bs * 16 * (arr_length/2)
        self.conv3 = nn.Conv1d(16, 16, kernel_size=3, stride=1, padding=1) # bs * 16 * (arr_length/4)
        self.conv4 = nn.Conv1d(16, 16, kernel_size=3, stride=1, padding=1) # bs * 16 * (arr_length/8)

        self.fc1 = nn.Linear(784, 120) # 16 channels * (784/16) positions = 784
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        x = self.pool(F.relu(self.conv4(x)))
        x = torch.flatten(x, 1) # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)

        return x

    @staticmethod
    def accuracy(y_pred, y_act):
        _, preds = torch.max(y_pred, dim=1)
        return torch.sum(preds == y_act).item() / len(preds)
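The shape bookkeeping in the comments can be checked in isolation: with kernel_size=3 and padding=1 each convolution preserves the length, and each MaxPool1d(2) halves it, so four conv+pool stages take 784 down to 49 positions, and the flattened size is 16 * 49 = 784, exactly the input size of fc1. A quick standalone check:

```python
import torch
import torch.nn as nn

# One conv+pool stage of the kind used above:
# kernel_size=3 with padding=1 preserves the length, MaxPool1d(2) halves it.
conv = nn.Conv1d(1, 16, kernel_size=3, stride=1, padding=1)
pool = nn.MaxPool1d(2)

x = torch.randn(8, 1, 784)  # a batch of 8 flattened digit images
x = pool(conv(x))           # length: 784 -> 392
print(x.shape)              # torch.Size([8, 16, 392])

# Four conv+pool stages: 784 // 2**4 = 49 positions * 16 channels = 784,
# which matches the nn.Linear(784, 120) input size.
print(16 * (784 // 2**4))   # 784
```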

Setting optimizer

An optimizer is the function that updates the weights of the neural network. We set the optimizer for the network using the function below; by default it is stochastic gradient descent.

def func1(self, optimizer_class=None):
    optimizer_class = torch.optim.SGD if optimizer_class is None else optimizer_class
    self.optmzr = optimizer_class(self.parameters(),

# Add the function to the class
ConvolutionalNNet.set_optimizer = func1

Calculating batch loss

We run the forward pass to get the model's predictions. The loss is measured with the cross-entropy function. If an optimizer is passed, the backward pass is performed; otherwise only the metric is calculated. Finally, the loss and metric value are returned. Here accuracy is used as the metric, although an F1-score would be preferable.

def func2(self, loss_fn, xb, yb, opt=None, metric=None):
    preds = self(xb)
    # calculate loss
    loss = loss_fn(preds, yb)

    if opt is not None: # no optimization on the validation set
        loss.backward() # compute gradients
        opt.step() # perform optimization
        opt.zero_grad() # reset gradients

    metric_result = None
    if metric is not None:
        metric_result = metric(preds, yb)

    return loss.item(), len(xb), metric_result

# Add the function to the class
ConvolutionalNNet.loss_batch = func2

Evaluate and fit function

def func1(self, loss_fn, metric, test_data_dl):
    """Function to evaluate"""
    with torch.no_grad(): # This flag tells autograd not to calculate gradients
        # Pass all the test data through the model
        for yb, xb in test_data_dl:
            avg_loss, num_ds, avg_metric = self.loss_batch(
                loss_fn, xb, yb, metric=metric)
        return avg_loss, avg_metric

def func2(self, train_cuda_dl, epochs=1, evaluate_dl=None):
    """Function to fit the model with the training data"""
    time_to_95_set = False
    train_loss, train_acc, test_loss, test_acc = [], [], [], []
    loss_fn = F.cross_entropy

    iter_cnt = 0
    for epoch in range(epochs):
        for yb, xb in train_cuda_dl:
            tr_loss, n_count, tr_metric = self.loss_batch(loss_fn, xb, yb, self.optmzr, ConvolutionalNNet.accuracy)

            if iter_cnt % 10 == 0:
                # append the training outputs
                train_loss.append(tr_loss)
                train_acc.append(tr_metric)

                # append the test outputs
                if evaluate_dl is not None:
                    tst_loss, tst_metric = self.evaluate(loss_fn, ConvolutionalNNet.accuracy, evaluate_dl)
                    test_loss.append(tst_loss)
                    test_acc.append(tst_metric)

                    if (not time_to_95_set) and (tst_metric >= 0.95):
                        self.time_to_95 = iter_cnt
                        time_to_95_set = True

                    if self.verbose:
                        print(f"{iter_cnt} Training loss: {tr_loss} , accuracy: {tr_metric} Testing loss: {tst_loss} , accuracy: {tst_metric}")

            # increase the counter
            iter_cnt += 1
    return train_loss, train_acc, test_loss, test_acc

# Add the functions to the class
ConvolutionalNNet.evaluate = func1 = func2

Predict function

def func(self, test_dl):
    preds_final = None
    for xb in test_dl:
        y_pred = self(xb)
        _, preds = torch.max(y_pred, dim=1)
        if preds_final is None:
            preds_final = preds
            preds_final =, preds))
    return preds_final

# Add the predict function to the class
ConvolutionalNNet.predict = func

Initialize the neural network model. Re-run the block below to reinitialize the weights and biases of the network.

# Creating a network instance
net = ConvolutionalNNet()
net.set_optimizer() # defaults to SGD with
# If the device is CUDA, then move the model onto the GPU
if get_device() == torch.device('cuda'):

We could use the data directly, but instead we are going to use the DataLoader module provided by torch, which has useful tools such as random sampling and batching. To use the data loader, the data needs to be in a specific format, so we wrap the data in a Dataset subclass.

The MNISTHandWrittenDigitsDataset is a wrapper class that works for both training and testing data. To flag data as testing data, y_lb should be -1.

class MNISTHandWrittenDigitsDataset(Dataset):
    """Class as a custom dataset"""
    def __init__(self, ds, x_lb=1, y_lb=0):
        """Constructor for the class
        :params ds: data to be loaded in the dataset
        :params x_lb: starting index for x or features
        :params y_lb: starting index for y or response. -1 for testing data
        self.is_train = y_lb != -1 # Check whether it is training or test data

        self.X = ds[:, x_lb:]
        d_len, arr_len = self.X.shape

        # transform to the required datatype: floats with a channel
        # dimension of 1, as expected by nn.Conv1d
        self.X = self.X.reshape(d_len, 1, arr_len).float()

        # Load and transform the labels for training data
        if self.is_train:
            self.y = ds[:, y_lb].long() # labels should be integers

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        """return y and x if training data, else return x only
        :params idx: index to return
        if self.is_train:
            return self.y[idx], self.X[idx]
        return self.X[idx]

Loading Data to GPU

Before computing, the data needs to be moved to the GPU. However, it is not good practice to load the entire dataset into GPU memory. We therefore create a custom data loader that moves only the current batch to the GPU and performs the calculation there. The batch size is configurable.

class DeviceDataLoader:
    """Class as a data loader"""
    def __init__(self, dl, device):
        self.dl = dl
        self.device = device

    def __iter__(self):
        for b in self.dl:
            yield self.to_device(b, self.device)

    def __len__(self):
        return len(self.dl)

    def to_device(self, data, device):
        if isinstance(data, (tuple, list)):
            return [self.to_device(x, device) for x in data]
        return, non_blocking=True)

Training and Validation data split

The function randomly generates index permutations for the dataset split. The default training-to-validation split is 9:1, which can be changed by passing val_pct.

def split_indices(n, val_pct=0.1, seed=99):
    # Determine the size of the validation set
    n_val = int(val_pct * n)
    # Set the random seed
    np.random.seed(seed)
    # Create a random permutation of 0 to n-1
    idxs = np.random.permutation(n)
    # Pick the first n_val indices for the validation set
    return idxs[n_val:], idxs[:n_val]

Splitting training dataset

The training dataset is split into training and validation sets. The classification model never sees the validation set, so it can be used for evaluation purposes.

train_idx, valid_idx = split_indices(len(train_dataset), 0.2)
print(len(train_idx), len(valid_idx))
33600 8400

# Training sampler and data loader
train_sampler = SubsetRandomSampler(train_idx)
train_dl = DataLoader(
    batch_size = 100, # Change the batch size here
    sampler = train_sampler)
train_cuda_dl = DeviceDataLoader(train_dl, get_device())

# Validation sampler and data loader
# We want to load the entire validation data at once, not in chunks
valid_sampler = SubsetRandomSampler(valid_idx)
valid_dl = DataLoader(
    batch_size = len(valid_idx), # load all the data at once
    sampler = valid_sampler)
valid_cuda_dl = DeviceDataLoader(valid_dl, get_device())

# Testing data
test_dl = DataLoader(
    MNISTHandWrittenDigitsDataset(test_dataset, x_lb=0, y_lb=-1),
    batch_size = 500) # batch size for prediction
test_cuda_dl = DeviceDataLoader(test_dl, get_device())
Training the model and plotting the loss and accuracy curves:

a, b, c, d =, 2, valid_cuda_dl)
df = pd.DataFrame({'train_loss': a, 'train_acc': b, 'test_loss': c, 'test_acc': d})
df[['train_loss','test_loss']].plot.line(figsize=(12,8), xlabel='Iterations', ylabel='Calculated loss', title='Training ~ Validation Loss')
df[['train_acc','test_acc']].plot.line(figsize=(12,8), xlabel='Iterations', ylabel='Accuracy', title='Training ~ Validation Accuracy')

Predicting on the test set:

net.predict(test_cuda_dl)
tensor([2, 0, 9, ..., 3, 9, 2], device='cuda:0')

Metrics monitoring

We are going to monitor the following metrics for different hyperparameter choices.



  • Iteration count to reach 95% accuracy
  • Accuracy
  • Execution time
df = pd.DataFrame({
    'batch_size': [100, 200, 500, 750, 1000, 100, 200, 500, 750, 1000],
    'exec_time': [133.0568618774414, 131.0257797241211, 131.81717801094055, 130.84673261642456, 131.0255696773529, 17.37652015686035, 17.138057947158813, 17.143101453781128, 17.339112520217896, 17.434961795806885],
    'accuracy': [0.9544047619047619, 0.9563095238095238, 0.9519047619047619, 0.9458333333333333, 0.958452380952381, 0.9546428571428571, 0.9530952380952381, 0.9521428571428572, 0.9539285714285715, 0.9534523809523809],
    'iter_count': [540, 490, 570, 450, 470, 580, 560, 500, 640, 510]


