# Introduction to Deep Learning: The Perceptron Part 2

Rosenblatt’s Perceptron

Frank Rosenblatt, a psychologist and logician, used the concept of Hebbian learning to come up with his own model of the perceptron. It was similar to the McCulloch–Pitts model but added a learning algorithm known as the perceptron training algorithm (PTA).

Its main differences from the McCulloch–Pitts model are:

1. Inputs are no longer limited to boolean values.
2. Weights and the threshold are variable and can be learned.
3. There are no inhibitory synapses.
4. The threshold function is not the only possible activation.
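
To make the contrast concrete, here is a minimal sketch of both units. The function names `mcculloch_pitts` and `rosenblatt_unit` are hypothetical, chosen for illustration. The McCulloch–Pitts unit takes boolean inputs, a fixed threshold and absolute inhibition; the Rosenblatt-style unit takes real-valued inputs, real weights and a bias, and (as an assumed choice of activation) outputs a sign in {−1, +1}.

```python
def mcculloch_pitts(inputs, threshold, inhibitory=()):
    """McCulloch-Pitts unit: boolean inputs, a fixed threshold, absolute inhibition.

    If any input listed in `inhibitory` is active, the unit cannot fire.
    """
    if any(inputs[i] for i in inhibitory):
        return 0
    return 1 if sum(inputs) >= threshold else 0


def rosenblatt_unit(x, w, b):
    """Rosenblatt-style unit: real-valued inputs, real weights and a bias
    (parameters that can be learned); sign activation gives an output in {-1, +1}."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if z >= 0 else -1


# A McCulloch-Pitts AND gate: two boolean inputs, threshold 2.
print(mcculloch_pitts([1, 1], threshold=2))  # 1
print(mcculloch_pitts([1, 0], threshold=2))  # 0
# Marking input 1 as inhibitory vetoes the unit even though the threshold is met.
print(mcculloch_pitts([1, 1], threshold=1, inhibitory=(1,)))  # 0
# A Rosenblatt unit accepts arbitrary real inputs.
print(rosenblatt_unit([0.5, -1.2], w=[2.0, 0.5], b=0.1))  # 1
```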

For binary classification (i.e. y⁽ⁱ⁾ ∈ {−1, +1}), if the given data is linearly separable, the perceptron finds a hyperplane W that separates the two classes with zero classification error on the training data.

Explanation

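Given a set of data S = {(x⁽¹⁾, y⁽¹⁾), (x⁽²⁾, y⁽²⁾), …, (x⁽ⁿ⁾, y⁽ⁿ⁾)}, where the points (x⁽ⁱ⁾, y⁽ⁱ⁾) are drawn i.i.d. from an unknown probability distribution D and are linearly separable, each label is y⁽ⁱ⁾ ∈ {−1, +1} and each input x⁽ⁱ⁾ is a vector of d real-valued features.
Let us randomly initialize the weight vector W = [w₁, w₂, …, w_d] (one weight per input feature), which defines the hyperplane, and the bias term b. A neuron activates if the weighted input exceeds a threshold; the bias b plays the role of that threshold with its sign flipped.
Our task is to find values of (W, b) that classify unseen inputs correctly, where “classified correctly” means that x⁽ⁱ⁾ lies on the correct side of the hyperplane. To simplify notation, absorb b into W so that W = [w₁, w₂, …, w_d, b] and every input is extended with a constant 1, x = [x₁, x₂, …, x_d, 1]; then Wᵀx = w₁x₁ + … + w_dx_d + b. Let us take our hypothesis h: X → Y to be:

$$
h(x) = \operatorname{sign}(W^\top x) =
\begin{cases}
+1 & \text{if } W^\top x \ge 0 \\
-1 & \text{otherwise}
\end{cases}
$$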
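
The bias-absorption trick and the hypothesis can be checked numerically. This is a minimal sketch; the helper names `augment` and `h` are illustrative, and treating sign(0) as +1 is an assumed convention.

```python
def augment(x):
    """Absorb the bias: append a constant 1 so that W^T x includes b."""
    return list(x) + [1.0]


def h(W, x):
    """Hypothesis h(x) = sign(W^T x) on an augmented input (sign(0) taken as +1)."""
    z = sum(wi * xi for wi, xi in zip(W, x))
    return 1 if z >= 0 else -1


w, b = [2.0, -1.0], 0.5   # original weights and bias
W = w + [b]               # bias absorbed as the last weight
x = [1.0, 3.0]            # a raw input

explicit = sum(wi * xi for wi, xi in zip(w, x)) + b        # w^T x + b
absorbed = sum(Wi * xi for Wi, xi in zip(W, augment(x)))   # W^T [x, 1]
print(explicit, absorbed)  # -0.5 -0.5
print(h(W, augment(x)))    # -1
```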

This means that if y⁽ⁱ⁾(Wᵀx⁽ⁱ⁾) is greater than zero, we have classified the point (x⁽ⁱ⁾, y⁽ⁱ⁾) correctly, because the prediction and the label have the same sign. But what if a point is not classified correctly? That is where learning comes in: whenever the current weights misclassify a point, we update them using that point’s error.
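
The correctness test and a single weight update can be sketched as follows. The update W ← W + y⁽ⁱ⁾x⁽ⁱ⁾ is assumed here in its standard form (some texts scale it by a learning rate), and the helper names are illustrative. Because y² = 1, one update raises y(Wᵀx) for that point by exactly ‖x‖².

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def is_correct(W, x, y):
    """A point is correctly classified when y * (W^T x) > 0."""
    return y * dot(W, x) > 0


def update(W, x, y):
    """Perceptron update on a mistake (standard form assumed): W <- W + y * x."""
    return [wi + y * xi for wi, xi in zip(W, x)]


W = [0.0, 0.0, 0.0]          # bias absorbed as the last component
x, y = [2.0, 1.0, 1.0], -1   # augmented input (last entry is the constant 1) and its label

before = y * dot(W, x)       # 0, so the point is not strictly on the correct side
if not is_correct(W, x, y):
    W = update(W, x, y)
after = y * dot(W, x)

print(W)               # [-2.0, -1.0, -1.0]
print(after - before)  # 6.0, which equals ||x||^2 = 4 + 1 + 1
```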

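We repeat this until no point is misclassified. The algorithm is:

1. Initialize W (randomly, or to zero).
2. For each training point (x⁽ⁱ⁾, y⁽ⁱ⁾): if y⁽ⁱ⁾(Wᵀx⁽ⁱ⁾) ≤ 0, update W ← W + y⁽ⁱ⁾x⁽ⁱ⁾.
3. Repeat step 2 until a full pass over the data makes no mistakes.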
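
The training procedure described above can be sketched in Python. This is a minimal sketch, assuming zero initialization (the text allows a random start), the standard update W ← W + yx, and a hypothetical `max_epochs` cap so the loop cannot run forever; the function name `train_perceptron` is illustrative.

```python
def train_perceptron(X, Y, max_epochs=1000):
    """Perceptron training algorithm (PTA) with the bias absorbed into W.

    X: input vectors (without the constant 1); Y: labels in {-1, +1}.
    Returns (W, mistakes), where W[-1] is the learned bias.
    """
    Xa = [list(x) + [1.0] for x in X]   # absorb the bias
    W = [0.0] * len(Xa[0])              # zero init (random init works too)
    mistakes = 0
    for _ in range(max_epochs):
        errors = 0
        for x, y in zip(Xa, Y):
            if y * sum(wi * xi for wi, xi in zip(W, x)) <= 0:  # misclassified or on the boundary
                W = [wi + y * xi for wi, xi in zip(W, x)]       # W <- W + y x
                errors += 1
        mistakes += errors
        if errors == 0:                 # a clean pass: W separates the data
            return W, mistakes
    raise RuntimeError("no separating hyperplane found within max_epochs")


# The AND gate with labels in {-1, +1} is linearly separable.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
Y = [-1, -1, -1, 1]
W, mistakes = train_perceptron(X, Y)
print(W, mistakes)
```

Treating y(Wᵀx) = 0 as a mistake matters when starting from W = 0: with a strict `< 0` test, no point would ever trigger the first update.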

Perceptron Convergence Theorem

The algorithm keeps going until it finds a hyperplane with zero training error. Is it guaranteed to get there, and what if no such hyperplane exists?

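The Perceptron Convergence Theorem states that, if the data is linearly separable, the PTA makes at most (R/λ)² mistakes before finding a separating hyperplane, where R is the largest norm ‖x⁽ⁱ⁾‖ of any input and λ is the margin: the distance from a separating hyperplane to the closest data point. When the inputs are scaled so that ‖x⁽ⁱ⁾‖ ≤ 1, the bound becomes 1/λ².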
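
The bound can be checked empirically on a small dataset. This sketch uses the standard (Novikoff) form of the theorem, mistakes ≤ (R/λ)² with λ measured for a unit-length separator. For simplicity it omits the bias (the separator passes through the origin), and the dataset and the separator `w_star = (1, 0)` are made up for illustration.

```python
import math


def perceptron_mistakes(X, Y, max_epochs=10_000):
    """Run the PTA (no bias term) and return the total number of updates made."""
    w = [0.0] * len(X[0])
    mistakes = 0
    for _ in range(max_epochs):
        clean = True
        for x, y in zip(X, Y):
            if y * sum(a * b for a, b in zip(w, x)) <= 0:
                w = [a + y * b for a, b in zip(w, x)]
                mistakes += 1
                clean = False
        if clean:
            return mistakes
    raise RuntimeError("did not converge")


def convergence_bound(X, Y, w_star):
    """(R / margin)^2, with w_star normalized to unit length."""
    norm = math.sqrt(sum(a * a for a in w_star))
    u = [a / norm for a in w_star]
    R = max(math.sqrt(sum(a * a for a in x)) for x in X)
    margin = min(y * sum(a * b for a, b in zip(u, x)) for x, y in zip(X, Y))
    assert margin > 0, "w_star must separate the data"
    return (R / margin) ** 2


X = [(1.0, 2.0), (0.5, -1.0), (2.0, 0.0), (-1.0, 1.0), (-0.5, -2.0), (-2.0, 0.5)]
Y = [1, 1, 1, -1, -1, -1]
mistakes = perceptron_mistakes(X, Y)
bound = convergence_bound(X, Y, w_star=(1.0, 0.0))
print(mistakes, round(bound, 6))  # 3 20.0
```

Here R = √5 and the margin is 0.5, so the theorem allows up to 20 mistakes; the run needs only 3. The bound is a worst-case guarantee, not a prediction.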

But for data that is not linearly separable (i.e. no separating hyperplane exists), a single neuron is not enough; the XOR gate is the classic example. In that case, we need to combine multiple perceptrons in layers to get the desired result. Such a network is known as a Multilayer Perceptron (MLP), a type of Artificial Neural Network (ANN). An MLP can compute any Boolean function. We will learn more about it in Part 3.
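
The XOR example can be checked directly. This sketch, with hypothetical helper names, runs the PTA on XOR (labels in {−1, +1}) for a fixed number of epochs and never gets a clean pass, then computes XOR with a two-layer network of threshold units whose weights are hand-picked (OR and NAND in the hidden layer, AND at the output) rather than learned.

```python
def step(z):
    """Threshold activation: fire (1) when the weighted input is non-negative."""
    return 1 if z >= 0 else 0


def perceptron_converges(X, Y, epochs=100):
    """Run the PTA (bias absorbed) and report whether any epoch had zero errors."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        errors = 0
        for x, y in zip(X, Y):
            xa = list(x) + [1.0]
            if y * sum(a * b for a, b in zip(w, xa)) <= 0:
                w = [a + y * b for a, b in zip(w, xa)]
                errors += 1
        if errors == 0:
            return True
    return False


def xor_mlp(x1, x2):
    """Two-layer network of threshold units with hand-picked weights."""
    h_or = step(x1 + x2 - 0.5)        # hidden unit 1 computes OR
    h_nand = step(-x1 - x2 + 1.5)     # hidden unit 2 computes NAND
    return step(h_or + h_nand - 1.5)  # output unit computes AND(OR, NAND) = XOR


XOR_X = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(perceptron_converges(XOR_X, [-1, 1, 1, -1]))  # False: XOR is not linearly separable
print([xor_mlp(a, b) for a, b in XOR_X])             # [0, 1, 1, 0]
```

A clean epoch would mean every point satisfies y(Wᵀx) > 0, i.e. a separating hyperplane, so on XOR the single perceptron can never report convergence no matter how many epochs it runs.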


Trending AI/ML Article Identified & Digested via Granola by Ramsey Elbasheer; a Machine-Driven RSS Bot