Handwritten Optical Character Recognition Calculator using CNN and Deep Learning




Description of the Project

The project aims to segment and recognise handwritten digits and mathematical operators in an image, and then to build a pipeline that evaluates the written expression. The current implementation recognises only four basic mathematical operators, namely add (+), subtract (-), multiply (x) and divide (/). The CNN model contains around 160k trainable parameters, making it easily deployable on devices with limited computational power.

The Dataset

The dataset is taken from Kaggle from this link, except for the images of the division sign, which are taken from this Kaggle link.

Sample images from the dataset can be seen in the following collage —

The data distribution can be seen in the following bar plot —

Preprocessing Step

The preprocessing step includes the following sub-steps —

  • Convert three-channel images to Grayscale images.
  • Apply a threshold to all the images to convert the images to binary.
  • Resize the thresholded images to a uniform size of (32x32x1).
  • Encode the non-categorical labels like ‘add’, ‘sub’ to categorical labels.
  • Split the dataset into train and test set in 80–20 ratio.

The implementation of the steps mentioned above is as follows —

OpenCV's inbuilt thresholding function is used to binarise the images, the non-categorical labels are encoded using the LabelEncoder class of sklearn, and finally the dataset is split into train and test sets.

The preprocessing step also includes converting the labels to one-hot vectors and normalising the images. The implementation is as follows —
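These two steps can be sketched as follows, assuming 14 classes (ten digits plus four operators) and the hypothetical helper name `to_model_inputs`:

```python
import numpy as np
from tensorflow.keras.utils import to_categorical

NUM_CLASSES = 14  # ten digits + four operators

def to_model_inputs(X, y):
    """Normalise pixel values to [0, 1] and one-hot encode the integer labels."""
    X = X.astype("float32") / 255.0
    y = to_categorical(y, num_classes=NUM_CLASSES)
    return X, y
```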

Building the CNN Model

The CNN model has the following characteristics —

  • Three Convolutional layers with 32, 32, and 64 filters, respectively.
  • A MaxPool2D layer follows each Convolutional layer.
  • Three Fully Connected layers follow the convolutional layers for classification.

The Keras implementation is as follows —

The L2 regulariser is used to avoid overfitting.
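A Sequential definition consistent with the summary below could look like this; the ReLU activations, the 0.5 dropout rate, the 2x2 pool sizes, and the L2 weight of 1e-4 are assumptions, while the layer names, filter counts, and output shapes follow the printed summary:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Input, Conv2D, Activation, MaxPooling2D,
                                     Flatten, Dropout, Dense)
from tensorflow.keras.regularizers import l2

model = Sequential([
    Input(shape=(32, 32, 1)),
    Conv2D(32, (3, 3), padding="same", kernel_regularizer=l2(1e-4), name="conv1"),
    Activation("relu", name="act1"),
    MaxPooling2D((2, 2)),
    Conv2D(32, (3, 3), padding="same", kernel_regularizer=l2(1e-4), name="conv2"),
    Activation("relu", name="act2"),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), padding="same", kernel_regularizer=l2(1e-4), name="conv3"),
    Activation("relu", name="act3"),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dropout(0.5),                                  # dropout rate assumed
    Dense(120, activation="relu", kernel_regularizer=l2(1e-4), name="fc1"),
    Dense(84, activation="relu", kernel_regularizer=l2(1e-4), name="fc2"),
    Dense(14, activation="softmax", name="fc3"),   # 10 digits + 4 operators
])
```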

Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv1 (Conv2D) (None, 32, 32, 32) 320
_________________________________________________________________
act1 (Activation) (None, 32, 32, 32) 0
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 16, 16, 32) 0
_________________________________________________________________
conv2 (Conv2D) (None, 16, 16, 32) 9248
_________________________________________________________________
act2 (Activation) (None, 16, 16, 32) 0
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 8, 8, 32) 0
_________________________________________________________________
conv3 (Conv2D) (None, 8, 8, 64) 18496
_________________________________________________________________
act3 (Activation) (None, 8, 8, 64) 0
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 4, 4, 64) 0
_________________________________________________________________
flatten (Flatten) (None, 1024) 0
_________________________________________________________________
dropout (Dropout) (None, 1024) 0
_________________________________________________________________
fc1 (Dense) (None, 120) 123000
_________________________________________________________________
fc2 (Dense) (None, 84) 10164
_________________________________________________________________
fc3 (Dense) (None, 14) 1190
=================================================================
Total params: 162,418
Trainable params: 162,418
Non-trainable params: 0
_________________________________________________________________

Training the Model

Step decay is used to reduce the learning rate after every ten epochs, starting from an initial learning rate of 0.001. The ImageDataGenerator class of Keras is used for data augmentation so that the model sees a slightly different version of each image in every epoch. The batch size is set to 128, and the model is trained for 100 epochs.
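A sketch of this training setup; the drop factor of 0.5 and the augmentation ranges are assumptions, while the 0.001 initial rate, the ten-epoch drop interval, the batch size of 128, and the 100 epochs come from the text:

```python
from tensorflow.keras.callbacks import LearningRateScheduler
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def step_decay(epoch):
    """Reduce the learning rate every 10 epochs (drop factor of 0.5 assumed)."""
    return 0.001 * (0.5 ** (epoch // 10))

# Augmentation ranges are illustrative, not the article's exact values.
datagen = ImageDataGenerator(rotation_range=10, zoom_range=0.1,
                             width_shift_range=0.1, height_shift_range=0.1)

# model.compile(optimizer="adam", loss="categorical_crossentropy",
#               metrics=["accuracy"])
# model.fit(datagen.flow(X_train, y_train, batch_size=128), epochs=100,
#           validation_data=(X_test, y_test),
#           callbacks=[LearningRateScheduler(step_decay)])
```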

Performance of the Model

The performance metrics used are as follows —

  • Loss and Accuracy vs Epochs plot
  • Classification report
  • Confusion Matrix
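The latter two metrics come from sklearn. The sketch below uses synthetic labels as stand-ins for the true test labels and the model's predictions; in the real pipeline, `y_pred` would be `model.predict(X_test).argmax(axis=1)`:

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

NUM_CLASSES = 14
rng = np.random.default_rng(0)
y_true = rng.integers(0, NUM_CLASSES, size=200)   # stand-in for test labels
y_pred = y_true.copy()
noise = rng.random(200) < 0.1                     # corrupt ~10% of "predictions"
y_pred[noise] = rng.integers(0, NUM_CLASSES, size=int(noise.sum()))

print(classification_report(y_true, y_pred, zero_division=0))
cm = confusion_matrix(y_true, y_pred, labels=list(range(NUM_CLASSES)))
```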

Loss and Accuracy vs Epochs plot —


