Supervised Learning — Naive Bayes Algorithm

This article explains Naive Bayes, one of the fundamental machine learning algorithms.

The structure of Naive Bayes is similar to that of linear models, but it trains and predicts faster. The price of this speed is that its generalization performance is usually worse than that of linear models. The efficiency of Naive Bayes comes from the idea behind it: the algorithm collects simple statistics by looking at each feature individually. Let's first go through the theoretical background and then see what kind of statistics the algorithm collects.

Bayes Theorem

Bayes' theorem relates the conditional and marginal probabilities of a random event X and another random event Y (given that Y has nonzero probability) occurring in a stochastic process. The mathematical statement is:

P(Y|X) = P(X|Y) · P(Y) / P(X)
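The theorem itself is a one-line computation; here is a minimal Python sketch (the numeric values in the example are invented for illustration):

```python
def bayes_posterior(p_x_given_y, p_y, p_x):
    """Bayes' theorem: P(Y|X) = P(X|Y) * P(Y) / P(X)."""
    return p_x_given_y * p_y / p_x

# Invented example values: P(X|Y) = 0.8, P(Y) = 0.5, P(X) = 0.6
print(bayes_posterior(0.8, 0.5, 0.6))  # -> 0.666..., i.e. P(Y|X) = 2/3
```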

Figure 1. dataset

Now, let's apply this theory to an example. The dataset consists of 11 points: 6 represent Software Developers and 5 represent Civil Engineers. Each point describes a profession in terms of salary and job experience (years). Our goal is to predict the profession, Software Developer or Civil Engineer, for a new salary/experience pair. As seen in Figure 1, red circles represent Software Developers (SD), blue circles represent Civil Engineers (CE), and the task is to predict which class the purple point belongs to. Let's set up the equation for this problem from the statistical model.

SD = Software Developer

CE = Civil Engineer

X = the purple point

The probability that the purple point X is a Software Developer follows from Bayes' theorem:

P(SD|X) = P(X|SD) · P(SD) / P(X)

P(SD) = (number of SD points) / (all points) = 6/11

P(X) is the fraction of observations that are "similar" to X, i.e. that fall inside a similarity range (circle) around the purple point; how many points the circle covers is a choice the user can tune. Choosing the circle so that it contains 4 data points, as seen in Figure 2, gives P(X) = 4/11.

P(X|SD): ignore the CE data, so 6 SD points remain, and count how many of them fall inside the similarity range (circle); there are 3 red points in it, so P(X|SD) = 3/6.

Figure 2. dataset with similarity range
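The three quantities above combine via Bayes' theorem; a quick sketch of the arithmetic, with the counts taken from the worked example:

```python
# Counts from the worked example (Figures 1 and 2):
# 11 points total, 6 Software Developers (SD),
# 4 points inside the similarity circle, 3 of them SD.
total = 11
n_sd = 6
in_circle = 4
sd_in_circle = 3

p_sd = n_sd / total                 # P(SD)   = 6/11
p_x = in_circle / total             # P(X)    = 4/11
p_x_given_sd = sd_in_circle / n_sd  # P(X|SD) = 3/6

p_sd_given_x = p_x_given_sd * p_sd / p_x  # Bayes' theorem
print(p_sd_given_x)  # -> 0.75
```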

Putting it together, P(SD|X) = (3/6 · 6/11) / (4/11) = 3/4, so according to this 11-point dataset the probability that the purple point is a Software Developer (SD) is 75%. The same calculation for Civil Engineer (CE), with 1 blue point out of 5 inside the circle, gives P(CE|X) = (1/5 · 5/11) / (4/11) = 25%.
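In practice these statistics are rarely counted by hand. scikit-learn's GaussianNB fits per-feature Gaussian statistics (mean and variance) for each class, which is the "simple per-feature statistics" idea described earlier. The coordinates below are invented stand-ins for the article's 11-point salary/experience dataset, not the actual values behind Figures 1 and 2:

```python
from sklearn.naive_bayes import GaussianNB

# Invented (salary in $1000s, years of experience) points:
# 6 Software Developers (class 1), 5 Civil Engineers (class 0).
X = [[95, 4], [105, 6], [88, 3], [120, 8], [99, 5], [110, 7],  # SD
     [60, 4], [65, 6], [58, 3], [70, 8], [62, 5]]              # CE
y = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

clf = GaussianNB()
clf.fit(X, y)

# Classify a new "purple point"; its salary sits in the SD cluster.
print(clf.predict([[100, 5]]))        # -> [1], i.e. Software Developer
print(clf.predict_proba([[100, 5]]))  # class probabilities for [CE, SD]
```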


Trending AI/ML Article Identified & Digested via Granola by Ramsey Elbasheer; a Machine-Driven RSS Bot
