A Closer Look into the Perceptron Model in Deep Learning

5. Implementing our model in Python

We’ve now built up our theoretical foundation and are ready to bring these concepts to life. Let us move to Python and implement our Perceptron model with the Step Increments learning algorithm.

For those of you who are just starting out on your journey and aren’t already aware, it is common to build and work on many Data Science and ML projects using Google Colaboratory. It’s a product provided by Google Research that allows users to write and execute code in their browser, without any prior setup. Colab connects to Google’s compute engine backend and lets you utilise its resources, so you can freely work on complex projects without having to worry about system requirements. A web-based interactive environment, known as a Jupyter Notebook, is used for building code and working with data. Everything gets stored on your Drive, so it’s always ready to be shared with your peers. You can get started with Colab here.

In Python, we shall implement the Perceptron model we have seen above to solve a classification problem on a breast cancer database, imported from the UC Irvine Machine Learning Repository. The data was originally collected at the University of Wisconsin Hospitals, Madison.

The dataset is multivariate, i.e. there are several input parameters, which will be used for binary classification into two categories: benign (cancer not detected) or malignant (cancer detected).

Let us train our model to predict accurately on this data.

We’ll first import the required libraries for this task.

import sklearn.datasets
import numpy as np
import seaborn as sns
import pandas as pd
from sklearn.model_selection import train_test_split

Next, we’ll load this data from sklearn.datasets. Two variables, X and Y (NumPy arrays) will be used to store the input values and their corresponding ground truths (true classification values).

breast_cancer = sklearn.datasets.load_breast_cancer()
X = breast_cancer.data
Y = breast_cancer.target

We can use pandas.DataFrame to visualize our data in a tabular fashion. We can append the target values to our table under a column named class, and then view the first 5 rows of our table.

data = pd.DataFrame(breast_cancer.data, columns = breast_cancer.feature_names)
data['class'] = breast_cancer.target
data.head()
Table by author
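
As a quick optional check, we can also peek at the class names sklearn provides and at how many samples fall into each class. This is a small sketch using the variables defined above:

# In load_breast_cancer, target 0 corresponds to 'malignant' and 1 to 'benign'.
print(breast_cancer.target_names)
print(data['class'].value_counts())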

We don’t want to train on our entire dataset; rather, we’ll leave out some samples which our model will see only after it has completed its training. This is known as the evaluation data and will serve as a test for our model, showing how it actually predicts when given new samples.

We can use train_test_split from sklearn to help us with splitting our data.

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.05, stratify = Y, random_state = 1)

We pass on our X (inputs) and labels to the function. In addition, we specify a few parameters.

  • test_size determines the size of the test data as a fraction of our whole dataset (here 5%).
  • stratify = Y ensures that the distribution of Y remains the same across the training and test datasets. We don’t want to train on all Y’s equal to 1 and then pass in a sample for which the right output would be 0 (see the quick check after this list).
  • random_state = 1 ensures the reproducibility of the results. The same division with identical samples will be returned each time the function is called.
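
Here is the quick check mentioned above, a small sketch (using the variables already defined) confirming that stratification keeps the positive-class fraction nearly identical across the full data, the training split and the test split:

# With stratify = Y, these three fractions should be almost equal.
print('Positive class fraction in Y:      ', round(Y.mean(), 3))
print('Positive class fraction in Y_train:', round(Y_train.mean(), 3))
print('Positive class fraction in Y_test: ', round(Y_test.mean(), 3))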

Let us print out the sizes of all the data sets we have created.

print(X_train.shape, Y_train.shape, X_test.shape, Y_test.shape)

Output:
(540, 30) (540,) (29, 30) (29,)

Our training data has 540 samples, each with 30 input parameters. Correspondingly, there are 540 true labels, one for each sample in X_train. The test set, in contrast, contains only 29 samples.

Alright, so now that we’ve processed and prepared our data for training, let us build a class for our Perceptron model which, taking the training data as input, implements the entire pipeline we have discussed above.

A side note on writing classes here. In deep learning, it is good practice to integrate the entire structure of a network into a single class, which we can simply instantiate and call on our training data. Although it requires more effort from the programmer, bundling the entire functionality of the network into a single object makes it easy to instantiate, run a training loop and predict on any given data. Debugging is also simpler than it would be with isolated functions lying around. This habit of writing neat, structured code pays off when we begin to work with popular libraries such as PyTorch, where we can call an entire pre-trained network with a single line of code, as well as build our own by inheriting from a parent class.
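
As a purely illustrative taste of that pattern, here is a hypothetical sketch of a tiny network built by inheriting from PyTorch’s nn.Module. The class name, layer size and the pre-trained example are illustrative only and are not part of our Perceptron pipeline:

import torch
import torch.nn as nn

# A hypothetical minimal network built by subclassing nn.Module,
# mirroring the "everything in one class" pattern we follow below.
class TinyClassifier(nn.Module):
    def __init__(self, n_features):
        super().__init__()
        self.linear = nn.Linear(n_features, 1)

    def forward(self, x):
        # Squash the linear output into (0, 1) for binary classification.
        return torch.sigmoid(self.linear(x))

# Pre-trained networks can similarly be created in one line, e.g.
#   from torchvision import models
#   resnet = models.resnet18(pretrained=True)  # exact argument may vary by torchvision version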

Here’s our Perceptron class with the learning algorithm implementation.
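
What follows is a minimal sketch of such a class, assuming the classic perceptron update rule and the checkpointing and plotting behaviour described in the next few paragraphs; names like wt_matrix and chkpt_w are illustrative choices rather than fixed conventions.

from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
import numpy as np

class Perceptron:

    def __init__(self):
        # Weights and bias start out undefined; train() will set them.
        self.w = None
        self.b = None

    def model(self, x):
        # Fire (predict 1) when the weighted sum crosses the threshold b.
        return 1 if np.dot(self.w, x) >= self.b else 0

    def predict(self, X):
        # Apply the model to every sample and collect the outcomes.
        return np.array([self.model(x) for x in X])

    def train(self, X, Y, epochs = 1, lr = 1):
        # One weight per input attribute, so the dot product is well defined.
        self.w = np.ones(X.shape[1])
        self.b = 0
        accuracy = []    # accuracy after each epoch
        wt_matrix = []   # weights after each epoch
        max_accuracy = 0
        chkpt_w, chkpt_b, best_epoch = self.w.copy(), self.b, 0

        for epoch in range(epochs):
            for x, y in zip(X, Y):
                y_pred = self.model(x)
                if y == 1 and y_pred == 0:
                    # Missed a positive sample: push w towards x, lower the threshold.
                    self.w = self.w + lr * x
                    self.b = self.b - lr
                elif y == 0 and y_pred == 1:
                    # False alarm on a negative sample: push w away from x, raise the threshold.
                    self.w = self.w - lr * x
                    self.b = self.b + lr

            wt_matrix.append(self.w.copy())
            acc = accuracy_score(self.predict(X), Y)
            accuracy.append(acc)

            # Checkpoint the best weights seen so far.
            if acc > max_accuracy:
                max_accuracy = acc
                best_epoch = epoch
                chkpt_w, chkpt_b = self.w.copy(), self.b

        # Keep the best weights, not necessarily the final ones.
        self.w, self.b = chkpt_w, chkpt_b
        print('max accuracy occurs at epoch value equal to', best_epoch,
              'and is equal to', max_accuracy)

        # Plot the training accuracy over epochs for further insight.
        plt.plot(accuracy)
        plt.xlabel('Epochs')
        plt.ylabel('Training accuracy')
        plt.show()
        return np.array(wt_matrix)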

Our __init__ function initialises two parameters, self.w and self.b, to None. These will be trained to predict on our dataset. The model function returns the prediction, using self.w and self.b, for a single sample with n parameters. The predict function calls the model function on each sample in the data and stores the outcomes in a list, which it returns.

The train function is where we implement our algorithm. First we need to ensure the size of self.w is the same as the number of parameters, or attributes, used for prediction (the weight and input vectors must be of the same size). We initialise two more lists: one to store all the accuracy values we get, and the other to store the different weights we obtain after updating. For the number of epochs we have specified, we pass through each sample (x) and its corresponding label and make our update. After we’ve passed over the data once, we calculate the accuracy using accuracy_score from sklearn and append it to our accuracy list. The weights at this point are also stored in the weight matrix.

A running record of the maximum accuracy is maintained, and whenever it is exceeded, the weights and accuracy for that epoch are checkpointed. After all epochs are completed, the best weights found are assigned to the model. The accuracy list can also be plotted for further insight.

Since we passed NumPy arrays into train_test_split, our splits are already NumPy arrays, so we can instantiate the class and call the train function on our data directly. After training has completed, we will predict on the test data and calculate the accuracy.

percep_model = Perceptron()
percep_model.train(X_train, Y_train, epochs = 200, lr = 0.5)
y_prediction = percep_model.predict(X_test)
print('Accuracy on test data is', accuracy_score(y_prediction, Y_test))

Executing this code, we get the following output and plot.

Graph by author
Output:
max accuracy occurs at epoch value equal to 185 and is equal to 0.9203703703703704
Accuracy on test data is 0.896551724137931

With this we come to the end of our implementation of the Perceptron model. We have seen how to import an external dataset and visualize it using the basics of Pandas. After creating the training and testing datasets, we wrote a class to implement the entire training pipeline. In future articles, we’ll introduce the Sigmoid Neuron, after discussing a few limitations of the Perceptron model. We’ll also begin combining several neurons to form an entire Neural Network, and dive deeper into more complex and efficient learning algorithms. Until then, Happy Training!

Citations (Breast Cancer Wisconsin Data set)

  1. O. L. Mangasarian and W. H. Wolberg: “Cancer diagnosis via linear programming”, SIAM News, Volume 23, Number 5, September 1990, pp. 1 & 18.
  2. William H. Wolberg and O. L. Mangasarian: “Multisurface method of pattern separation for medical diagnosis applied to breast cytology”, Proceedings of the National Academy of Sciences, U.S.A., Volume 87, December 1990, pp. 9193–9196.
  3. O. L. Mangasarian, R. Setiono, and W. H. Wolberg: “Pattern recognition via linear programming: Theory and application to medical diagnosis”, in: “Large-scale numerical optimization”, Thomas F. Coleman and Yuying Li, editors, SIAM Publications, Philadelphia 1990, pp. 22–30.
  4. K. P. Bennett & O. L. Mangasarian: “Robust linear programming discrimination of two linearly inseparable sets”, Optimization Methods and Software 1, 1992, pp. 23–34 (Gordon & Breach Science Publishers).
