Both classification and regression deal with the problem of learning a mapping function from input to output. In classification, however, the output is a discrete class label (a categorical output), whereas in regression the output is continuous.
We must not forget that both these problems fall under the category of Supervised Learning.
Supervised learning is where we have input variables (X) and an output variable (Y), and we use a machine learning algorithm to learn the mapping function from the input to the output.
We know that ML algorithms learn the mapping function from the input set to the output set. In regression problems, the mapping function that algorithms want to learn is continuous.
Regression is a type of problem that requires machine learning algorithms that learn to predict continuous variables.
To measure the performance of the learned mapping function, we measure how close its predictions are to the true labeled validation/test data. In the figure below, blue is the regression model's predicted values, and red is the true labeled function. The closeness of the blue line to the red line gives us a measure of how good our model is.
While building the model, we define a cost function, which measures the deviation of the predicted values from the true values. Optimizers make sure that this error decreases over successive epochs.
Some of the most common error functions (or cost functions) used for regression problems are:
- Mean Squared Error (MSE)
- Root Mean Squared Deviation/Error (RMSD/RMSE)
- Mean Absolute Error (MAE)
Note: Xi is the predicted value, X̂i is the true value, and N is the total number of samples over which predictions are made.
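The three error functions listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not a library implementation: the function names and the sample values are assumptions made up for the example. In the notation of the note, MSE is the mean of (Xi - X̂i)^2 over the N samples, RMSE is the square root of MSE, and MAE is the mean of |Xi - X̂i|.

```python
import math

def mean_squared_error(y_true, y_pred):
    """MSE: average of the squared differences between true and predicted values."""
    n = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n

def root_mean_squared_error(y_true, y_pred):
    """RMSE: square root of the MSE, expressed in the same units as the target."""
    return math.sqrt(mean_squared_error(y_true, y_pred))

def mean_absolute_error(y_true, y_pred):
    """MAE: average of the absolute differences between true and predicted values."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n

# Hypothetical values, chosen only for illustration.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]

mse = mean_squared_error(y_true, y_pred)
rmse = root_mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
print(mse, rmse, mae)
```

Because MSE squares each difference, a single large error raises it far more than it raises MAE, which is one reason the two can rank models differently.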
Examples of regression problems could include:
- Predicting the price of houses based on data such as the quality of schools in the area, the number of bedrooms in the house, and the house’s location.
- Predicting the sales revenue of a company based on data such as the previous sales of the company.
- Predicting the temperature of any day based on data such as wind speed, humidity, and atmospheric pressure.
Algorithms for Regression:
- Linear Regression
- Support Vector Regression
- Regression Tree
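To make the first algorithm in this list concrete, here is a minimal sketch of linear regression with one input feature, trained by batch gradient descent on the mean squared error. The function name, learning rate, epoch count, and toy data are all assumptions chosen for illustration; real projects would normally rely on an established library instead.

```python
def fit_linear_regression(xs, ys, learning_rate=0.05, epochs=1000):
    """Fit y = w * x + b by batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    history = []  # MSE recorded at the start of every epoch
    for _ in range(epochs):
        # Prediction error for every sample under the current parameters.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        history.append(sum(e * e for e in errors) / n)
        # Gradients of the MSE with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient to reduce the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, history

# Hypothetical toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b, history = fit_linear_regression(xs, ys)
print(w, b, history[0], history[-1])
```

The `history` list shows the cost falling across epochs, which is the behaviour expected from an optimizer as described earlier.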
In classification problems, the mapping function that algorithms want to learn is discrete. The objective is to find the decision boundary (or boundaries) that divides the dataset into different categories.
Classification is a type of problem that requires the use of machine learning algorithms that learn how to assign a class label to examples from the problem domain.
For example, suppose there are three class labels: [Apple, Banana, Cherry]. Machines cannot work with these text labels directly, so we need to convert them into a machine-readable, numerical form.
For the above example, we can define
Apple = [1,0,0], Banana = [0,1,0], Cherry = [0,0,1]
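This kind of label-to-vector mapping is commonly called one-hot encoding. A minimal sketch, assuming a hypothetical helper named `one_hot_encode` and a fixed class order:

```python
def one_hot_encode(label, classes):
    """Return a vector with a 1 at the position of `label` and 0 elsewhere."""
    return [1 if c == label else 0 for c in classes]

# The class order fixes which position belongs to which label.
classes = ["Apple", "Banana", "Cherry"]
encoded = {label: one_hot_encode(label, classes) for label in classes}
print(encoded)
```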
Once the machine learns from this labeled training dataset, it outputs a probability for each class on the test dataset, like this:
[P(Apple), P(Banana), P(Cherry)]
These predicted probabilities form one probability distribution, and the actual (true) labels form another. If the predicted distribution closely follows the actual distribution, we say that the model is learning accurately.
The predicted probabilities are given by a softmax layer.
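A softmax function turns a list of raw model scores into probabilities that are positive and sum to 1. The sketch below is a plain-Python illustration; the input scores are hypothetical, and subtracting the maximum score is a common numerical-stability step rather than something the article prescribes.

```python
import math

def softmax(scores):
    """Convert raw scores into a probability distribution over classes."""
    # Shifting by the largest score avoids overflow in exp() without changing the result.
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs for [Apple, Banana, Cherry].
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)
print(probs)
```

The class with the highest score receives the highest probability, and the ordering of the scores is preserved.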
Some of the common cost functions for classification problems are based on cross-entropy.
For categorical cross-entropy, suppose there are M class labels, and the predicted distribution for the i-th data sample is:
P(Y) = [Yi1′, Yi2′, ..., YiM′]
And the actual distribution for that sample is:
A(Y) = [Yi1, Yi2, ..., YiM]
Then the cross-entropy for that sample is:
CEi = -(Yi1*log(Yi1′) + Yi2*log(Yi2′) + ... + YiM*log(YiM′))
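The per-sample cross-entropy formula above can be sketched as follows. The function name, the `eps` clamp that guards against log(0), and the sample distributions are assumptions added for illustration.

```python
import math

def categorical_cross_entropy(actual, predicted, eps=1e-12):
    """Cross-entropy for one sample: -sum(actual_k * log(predicted_k))."""
    # Clamp predictions away from 0 so that log() stays finite.
    return -sum(a * math.log(max(p, eps)) for a, p in zip(actual, predicted))

# Hypothetical sample whose true class is Banana, in the order [Apple, Banana, Cherry].
actual = [0, 1, 0]
predicted = [0.1, 0.7, 0.2]
ce = categorical_cross_entropy(actual, predicted)
print(ce)
```

With a one-hot actual distribution, only the term for the true class survives, so the loss reduces to the negative log of the probability assigned to that class.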
Binary cross-entropy is a special case of categorical cross-entropy, where there is only one output that can take two values, either 0 or 1. For example, we might want to predict whether or not a cat is present in an image.
Here, the cross-entropy function varies with the true value of Y:
CEi = -Yi1*log(Yi1′), if Yi1 = 1
CEi = -(1-Yi1)*log(1-Yi1′), if Yi1 = 0
As with categorical cross-entropy, binary cross-entropy is averaged over all the samples in the dataset.
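A minimal sketch of binary cross-entropy averaged over a set of samples is shown below. It combines the two cases into the single expression -(y*log(p) + (1-y)*log(1-p)), which equals the first case when y = 1 and the second when y = 0. The function name, the `eps` clamp, and the sample values are assumptions for illustration.

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Average binary cross-entropy over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        # Keep p strictly inside (0, 1) so that both logarithms stay finite.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# Hypothetical labels (1 = cat present, 0 = no cat) and predicted probabilities.
y_true = [1, 0, 1, 0]
y_pred = [0.9, 0.2, 0.6, 0.4]
bce = binary_cross_entropy(y_true, y_pred)
print(bce)
```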