Introduction to Deep Learning: The Perceptron Part 2




Rosenblatt’s Perceptron

Frank Rosenblatt, a psychologist and logician, used the concept of Hebbian learning to come up with his own model of the perceptron. It was similar to the McCulloch–Pitts model but provided a learning algorithm known as the perceptron training algorithm (PTA).

Fig 1) Rosenblatt’s Perceptron. Source: http://deeplearning.cs.cmu.edu/F20/document/slides/lec1.intro.pdf

Its main differences from the McCulloch–Pitts model are:

  1. Inputs are no longer limited to boolean values.
  2. Weights and the threshold are variable and can be learned.
  3. No inhibitory synapse.
  4. The threshold is not the only possible “activation” output.

For binary classification (i.e. yᵢ ∈ {−1, +1}), if the given data is linearly separable, the perceptron finds a hyperplane W that divides the two classes with zero classification error on the training data.

Fig 2) Hyperplane between linearly separable data. Source https://www.cs.cornell.edu/courses/cs4780/2018fa/lectures/lecturenote03.html
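
To make this concrete, here is a minimal sketch (my own toy 2-D data and a hand-picked hyperplane, not taken from the article) of what “dividing the two classes” means: every point’s label matches the sign of Wᵀx + b.

```python
import numpy as np

# Hypothetical 2-D points and labels in {-1, +1}, chosen to be linearly separable.
X = np.array([[2.0, 3.0], [1.0, 1.5], [-1.0, -2.0], [-2.5, -0.5]])
y = np.array([+1, +1, -1, -1])

# A hand-picked hyperplane W·x + b = 0 that separates the two classes.
W = np.array([1.0, 1.0])
b = 0.0

# A point is classified by which side of the hyperplane it falls on.
predictions = np.sign(X @ W + b)
print(predictions)                 # [ 1.  1. -1. -1.]
print(np.all(predictions == y))    # True: zero classification error
```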

Explanation

Given a set of data S = {(x⁽¹⁾, y⁽¹⁾), (x⁽²⁾, y⁽²⁾), …, (x⁽ⁿ⁾, y⁽ⁿ⁾)}, where each point (x⁽ⁱ⁾, y⁽ⁱ⁾) is drawn i.i.d. from an unknown probability distribution D, the labels are y⁽ⁱ⁾ ∈ {−1, +1}, each input x⁽ⁱ⁾ is a vector of real values, and the two classes are linearly separable.
Let us randomly initialize the hyperplane, i.e. a weight vector W with one weight per input component, and the bias term b. A neuron activates if the weighted sum of its inputs exceeds the threshold.
Our task is to find values of (W, b) that classify unseen inputs correctly. Here “classified correctly” means that x⁽ⁱ⁾ lies on the correct side of the hyperplane. To simplify notation, absorb b into W by appending it as one more weight, and append a constant 1 to every input, so that Wᵀx⁽ⁱ⁾ already includes the bias. Let us take our hypothesis h: X → Y such that h(x⁽ⁱ⁾) = sign(Wᵀx⁽ⁱ⁾).

This means that if y⁽ⁱ⁾(Wᵀx⁽ⁱ⁾) is greater than zero, we have classified the point (x⁽ⁱ⁾, y⁽ⁱ⁾) correctly. But what if we do not classify it correctly? That is where the concept of learning comes in: whenever a point is misclassified, we update the weights based on the error, W ← W + y⁽ⁱ⁾x⁽ⁱ⁾.

We repeat this until no point is misclassified. The algorithm is given below.

Sources: https://www.cs.cornell.edu/courses/cs4780/2018fa/lectures/lecturenote03.html and https://github.com/CaptainPramil/Geeks-Of-the-round-table/blob/master/Gates_using_perceptron.ipynb
Fig 3) Basic Gates using The Perceptron
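
For readers who prefer running code, below is a minimal NumPy sketch of the training loop described above (the bias is absorbed into W by appending a constant 1 to every input; the AND-gate data at the bottom is my own toy example, in the spirit of the gates notebook linked above).

```python
import numpy as np

def perceptron_train(X, y, max_epochs=100):
    """Perceptron training algorithm (PTA) for linearly separable data.

    X : (n, d) array of real-valued inputs
    y : (n,)  array of labels in {-1, +1}
    Returns the learned weight vector W (the last entry is the bias b).
    """
    # Absorb the bias: append a constant 1 to every input vector.
    X = np.hstack([X, np.ones((X.shape[0], 1))])
    W = np.zeros(X.shape[1])              # zero or random initialization both work

    for _ in range(max_epochs):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            if y_i * np.dot(W, x_i) <= 0:  # point is misclassified (or on the boundary)
                W += y_i * x_i             # update rule: W <- W + y_i * x_i
                mistakes += 1
        if mistakes == 0:                  # converged: every point classified correctly
            break
    return W

# Example: learning the AND gate with inputs and labels in {-1, +1}.
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]])
y = np.array([-1, -1, -1, 1])
W = perceptron_train(X, y)
print(W, np.sign(np.hstack([X, np.ones((4, 1))]) @ W))   # predictions match y
```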

Perceptron Convergence Theorem

We can see that the algorithm keeps running until it finds a hyperplane that gives zero error. But how many updates can that take, and what if no such hyperplane exists?

The Perceptron Convergence Theorem states that, provided the data are linearly separable and every input is scaled to have norm at most 1, the PTA makes at most 1/λ² mistakes before finding a separating hyperplane, where λ is the margin: the distance from the separating hyperplane to the closest data point.

For the proof, refer to: https://www.cs.cornell.edu/courses/cs4780/2018fa/lectures/lecturenote03.html
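
As a quick, hand-rolled illustration of the quantities in the theorem (the data and hyperplane below are assumptions of mine): scale the inputs so that ‖x⁽ⁱ⁾‖ ≤ 1, take any unit-norm separating hyperplane, measure its margin λ, and 1/λ² bounds the number of updates the PTA can make.

```python
import numpy as np

# Toy separable data (hypothetical), rescaled so every point has norm <= 1.
X = np.array([[0.9, 0.3], [0.5, 0.8], [-0.7, -0.4], [-0.3, -0.9]])
y = np.array([+1, +1, -1, -1])
X = X / np.max(np.linalg.norm(X, axis=1))   # enforce ||x|| <= 1

# Any separating hyperplane through the origin, normalized to unit length.
W = np.array([1.0, 1.0])
W = W / np.linalg.norm(W)

margin = np.min(y * (X @ W))    # λ: distance from the hyperplane to the closest point
print("margin λ =", margin)
print("mistake bound 1/λ² =", 1.0 / margin**2)
```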

But for data that is not linearly separable (i.e. no separating hyperplane exists), a single neuron is not enough; the XOR gate is the classic example. In that case, we need to combine multiple perceptrons to get the desired result, as sketched below. Such a network is known as a Multilayer Perceptron (MLP) or an Artificial Neural Network (ANN). An MLP can compute any arbitrary Boolean function. We will learn more about it in Part 3.

Fig 4) Multilayer Perceptron
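
As a sketch of the idea (a toy construction of mine, not the article’s code), XOR can be computed by feeding the outputs of two perceptrons, OR and NAND, into a third one, AND; that is, a two-layer network built from the gate perceptrons of Fig 3.

```python
import numpy as np

def perceptron(w, b):
    """Return a single thresholded unit f(x) = sign(w·x + b) with outputs in {-1, +1}."""
    return lambda x: 1 if np.dot(w, x) + b > 0 else -1

# Hand-set weights (one valid choice) for gates on inputs in {-1, +1}.
OR   = perceptron(np.array([1.0, 1.0]),   1.0)  # fires unless both inputs are -1
NAND = perceptron(np.array([-1.0, -1.0]), 1.0)  # fires unless both inputs are +1
AND  = perceptron(np.array([1.0, 1.0]),  -1.0)  # fires only if both inputs are +1

def XOR(x):
    # Layer 1: two perceptrons; layer 2: one perceptron combining their outputs.
    hidden = np.array([OR(x), NAND(x)])
    return AND(hidden)

for x in [(-1, -1), (-1, 1), (1, -1), (1, 1)]:
    print(x, XOR(np.array(x)))   # -1, +1, +1, -1: XOR, which no single perceptron can compute
```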
