# Description of the Project

The project aims at segmenting and recognising handwritten digits and mathematical operators in an image, and then building a pipeline that evaluates the written expression. The current implementation recognises only the four basic mathematical operators, namely add (+), subtract (-), multiply (x), and divide (/). The CNN model contains around 160k trainable parameters, making it easy to deploy on devices with limited compute.

# The Dataset

The dataset is taken from Kaggle from this link, except for the images of the division sign, which are taken from this Kaggle link.

The images of the dataset can be visualised from the following collage —

The data distribution can be seen in the following bar plot —

# Preprocessing Step

The preprocessing step includes the following sub-steps —

- Convert three-channel images to grayscale.
- Apply a threshold to all the images to convert them to binary.
- Resize the thresholded images to a uniform size of (32x32x1).
- Encode the non-categorical labels such as ‘add’ and ‘sub’ into categorical labels.
- Split the dataset into train and test sets in an 80–20 ratio.

The implementation of the steps mentioned above is as follows —

In **line 6**, OpenCV's inbuilt thresholding function is used. **Line 12** contains the implementation of encoding the non-categorical labels using the LabelEncoder class of sklearn. Finally, in **line 15**, the dataset is split into train and test sets.

The preprocessing step also includes converting the labels to one-hot vectors and normalising the images. The implementation is as follows —
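These two operations can be sketched as follows (assuming Keras's `to_categorical` and pixel values in the 0–255 range; the function name is illustrative):

```python
import numpy as np
from tensorflow.keras.utils import to_categorical

def finalise(X, y, num_classes=14):
    """Normalise pixel values to [0, 1] and one-hot encode integer labels."""
    X = X.astype("float32") / 255.0        # normalise images
    y = to_categorical(y, num_classes)     # integer labels -> one-hot vectors
    return X, y
```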

# Building the CNN Model

The CNN model has the following characteristics —

- Three Convolutional layers with 32, 32, and 64 filters, respectively.
- A MaxPool2D layer follows each Convolutional layer.
- Three Fully Connected layers follow the convolutional layers for classification.

The Keras implementation is as follows —

The L2 regulariser is used to avoid overfitting.
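A Keras sketch of such a model is given below. The layer names and parameter counts match the printed summary; the 3x3 kernel size, ReLU activations, L2 factor, and dropout rate are assumptions not stated in the article:

```python
from tensorflow.keras import Sequential, regularizers
from tensorflow.keras.layers import (Conv2D, Activation, MaxPooling2D,
                                     Flatten, Dropout, Dense)

def build_model(num_classes=14):
    reg = regularizers.l2(1e-3)  # assumed L2 factor
    return Sequential([
        Conv2D(32, 3, padding="same", input_shape=(32, 32, 1), name="conv1"),
        Activation("relu", name="act1"),
        MaxPooling2D(),
        Conv2D(32, 3, padding="same", name="conv2"),
        Activation("relu", name="act2"),
        MaxPooling2D(),
        Conv2D(64, 3, padding="same", name="conv3"),
        Activation("relu", name="act3"),
        MaxPooling2D(),
        Flatten(),
        Dropout(0.5),  # assumed rate
        Dense(120, activation="relu", kernel_regularizer=reg, name="fc1"),
        Dense(84, activation="relu", kernel_regularizer=reg, name="fc2"),
        Dense(num_classes, activation="softmax", name="fc3"),
    ])
```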

```
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv1 (Conv2D)               (None, 32, 32, 32)        320
_________________________________________________________________
act1 (Activation)            (None, 32, 32, 32)        0
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 16, 16, 32)        0
_________________________________________________________________
conv2 (Conv2D)               (None, 16, 16, 32)        9248
_________________________________________________________________
act2 (Activation)            (None, 16, 16, 32)        0
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 8, 8, 32)          0
_________________________________________________________________
conv3 (Conv2D)               (None, 8, 8, 64)          18496
_________________________________________________________________
act3 (Activation)            (None, 8, 8, 64)          0
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 4, 4, 64)          0
_________________________________________________________________
flatten (Flatten)            (None, 1024)              0
_________________________________________________________________
dropout (Dropout)            (None, 1024)              0
_________________________________________________________________
fc1 (Dense)                  (None, 120)               123000
_________________________________________________________________
fc2 (Dense)                  (None, 84)                10164
_________________________________________________________________
fc3 (Dense)                  (None, 14)                1190
=================================================================
Total params: 162,418
Trainable params: 162,418
Non-trainable params: 0
_________________________________________________________________
```

# Training the Model

Step decay is used to decrease the learning rate after every ten epochs, with an initial learning rate of 0.001. The ImageDataGenerator class of Keras is used for data augmentation so that the model sees a slightly different image each time. The batch size is set to 128, and the model is trained for 100 epochs.
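A sketch of this training setup is shown below. The decay factor of 0.5 and the augmentation ranges are assumptions; the article specifies only the initial learning rate, the ten-epoch interval, the batch size, and the epoch count:

```python
import numpy as np
from tensorflow.keras.callbacks import LearningRateScheduler
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def step_decay(epoch, initial_lr=0.001, drop=0.5, epochs_per_drop=10):
    """Reduce the learning rate every ten epochs (drop factor assumed)."""
    return initial_lr * drop ** np.floor(epoch / epochs_per_drop)

datagen = ImageDataGenerator(rotation_range=10,      # assumed ranges
                             width_shift_range=0.1,
                             height_shift_range=0.1,
                             zoom_range=0.1)

# model.fit(datagen.flow(X_train, y_train, batch_size=128),
#           epochs=100,
#           validation_data=(X_test, y_test),
#           callbacks=[LearningRateScheduler(step_decay)])
```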

# Performance of the Model

The performance metrics used are as follows —

- Loss and Accuracy vs Epochs plot
- Classification report
- Confusion Matrix
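The report and matrix can be produced with scikit-learn once the model's predictions are available (a sketch; the function name is illustrative):

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

def evaluate(y_true_onehot, y_pred_probs, class_names):
    """Convert one-hot targets / softmax outputs to class indices and report."""
    y_true = np.argmax(y_true_onehot, axis=1)
    y_pred = np.argmax(y_pred_probs, axis=1)
    print(classification_report(y_true, y_pred, target_names=class_names))
    return confusion_matrix(y_true, y_pred)
```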

Loss and Accuracy vs Epochs plot —
