Baby steps 2: Supervised learning Pipeline

Supervised learning is a type of ML that you will hear about frequently, and many of the commonly used algorithms fall under its umbrella. To refresh your memory, let’s review what supervised learning is. As the name implies, there is supervision. How so? Well, the algorithm learns from its mistakes: it compares each prediction to the right output that was given to it, then adjusts itself to produce a better prediction.

Supervised learning can be divided into two kinds of problems:

Regression: where we predict real number values.

Classification: where we predict a specific label.
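
To make the difference concrete, here is a tiny sketch of the two kinds of targets for the same inputs. The data, the variable names, and the pass threshold are all made up for illustration.

```python
# Hypothetical toy data: hours studied by five students (the inputs).
hours_studied = [1.0, 2.0, 3.0, 4.0, 5.0]

# Regression target: a real number for each input (exam score out of 100).
exam_scores = [52.0, 58.5, 67.0, 74.5, 81.0]

# Classification target: a label from a fixed set for each input.
# The threshold of 60 is an assumption made only for this example.
pass_threshold = 60.0
exam_labels = ["pass" if score >= pass_threshold else "fail" for score in exam_scores]

for hours, score, label in zip(hours_studied, exam_scores, exam_labels):
    print(f"{hours} hours -> score {score} (regression), {label} (classification)")
```

A regression model would try to predict the score itself, while a classification model would only try to predict "pass" or "fail".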

Some of the existing supervised learning algorithms include:

– Linear regression

– Logistic regression

– Neural network (NN)

– Support vector machine (SVM)

To establish a clear foundation about supervised learning, I will go over what those algorithms usually have in common and how they differ. This article covers what they have in common. We usually refer to the algorithm you are implementing as your model, so I will call it that for the rest of this article.

These models mainly go through the following phases. There can be more phases, and sometimes it is better to have more, but I will only talk about the main ones so as not to confuse you with too much information.

Let’s build a pipeline:

1- Prepare the data: preparing the data is a vital stage that can significantly change the model’s performance. We take our labeled data, which contains both the inputs and the outputs, and perform some preprocessing; which preprocessing you do depends on your application. Then we need to split the data into sets. As a best practice, we use a training set for the model to learn from, then a validation set for configuring the model’s settings, often called hyperparameters (don’t worry, child, I will explain which parameters need configuration), and lastly a testing set to evaluate the model’s performance. In reality, preparing the data involves many more steps, but let’s leave that for another article.

Phase 1
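
The split described in step 1 can be sketched in a few lines of plain Python. This is a minimal illustration, not a recommended recipe: the function name `split_dataset`, the 60/20/20 fractions, and the toy data are all assumptions made for this example.

```python
import random


def split_dataset(examples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle labeled examples and split them into (train, validation, test)."""
    shuffled = list(examples)              # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # whatever is left goes to the test set
    return train, val, test


# Hypothetical labeled data: (input, output) pairs that follow y = 2x + 1.
data = [(float(x), 2.0 * x + 1.0) for x in range(10)]

train_set, val_set, test_set = split_dataset(data)
print(len(train_set), len(val_set), len(test_set))
```

Shuffling before splitting matters when the data is stored in some order, otherwise one of the sets could end up holding only one kind of example.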

2- Train your model: the data you have prepared will act as a guide for your model. The model starts by predicting something more or less random based on just the input, and then compares its prediction with the actual target output. Let’s take an example. You are practicing for an upcoming test, so to see where you stand and how well you understand the material, you start solving questions right away. You solve them based only on what you think you know and the information given in the question, then compare your answers with the right answers. When your answers differ from the right ones, you go learn what you did wrong to avoid getting it wrong on the next try. Your model does something similar: it uses the input to try to predict the output, measures its error, and then uses that error to adjust itself. In neural networks, this is done by backpropagating the error. If you are not familiar with backpropagation, I have prepared a small paragraph at the end of this article.

Phase 2
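
The predict, compare, and adjust loop from step 2 can be sketched with the simplest model on the list, linear regression, trained by gradient descent. This is only a sketch under assumptions: the toy data, the learning rate, and the number of passes are made up. With a single linear model the gradient can be written out directly; backpropagation is the way the same kind of gradient is computed through the layers of a neural network.

```python
import random

# Hypothetical training data that follows y = 2x + 1.
train_inputs = [0.0, 1.0, 2.0, 3.0, 4.0]
train_targets = [1.0, 3.0, 5.0, 7.0, 9.0]

rng = random.Random(0)
weight = rng.uniform(-1.0, 1.0)  # the model starts from a random guess
bias = rng.uniform(-1.0, 1.0)
learning_rate = 0.05             # assumed value, picked for this toy data

for epoch in range(2000):
    grad_weight = 0.0
    grad_bias = 0.0
    for x, target in zip(train_inputs, train_targets):
        prediction = weight * x + bias   # 1. predict from the input
        error = prediction - target      # 2. compare with the right answer
        grad_weight += 2 * error * x     # 3. gradient of the squared error
        grad_bias += 2 * error
    n = len(train_inputs)
    # 4. adjust the model a little in the direction that reduces the error
    weight -= learning_rate * grad_weight / n
    bias -= learning_rate * grad_bias / n

print(weight, bias)
```

After enough passes over the data, `weight` and `bias` should end up close to 2 and 1, the values used to generate the toy data.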

3- Evaluate your model’s performance: before using the model, you need to test it on a different set of data that it hasn’t seen yet. If you are wondering why, let’s go back to the exam example. After you have practiced the book’s exercises, you’d want to practice some exam papers that have new values or maybe different vocabulary. Why is that? Because you want to test your performance, and you don’t want to freak out when a question comes in a different form. When you only practice the same set of questions, you might not understand different questions that ask for the same thing in a different form; you are restricting yourself to a small set of practice questions. In machine learning we call this scenario overfitting: the model does well on the data it trained on but poorly on new data. This topic too will be discussed in another article. After evaluating your model on the new data, you will see whether this is the performance you want or whether you’d like to improve it.
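
Here is a small sketch of why the unseen test data in step 3 matters. It compares a deliberately overfitted "memorizer", which only remembers the training answers, with a simple linear model that is assumed to be already trained. All names and data are hypothetical, and mean squared error is just one possible way to measure performance.

```python
def mean_squared_error(model, examples):
    """Average squared difference between predictions and right answers."""
    return sum((model(x) - y) ** 2 for x, y in examples) / len(examples)


# Hypothetical data that follows y = 2x + 1.
train_set = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
test_set = [(4.0, 9.0), (5.0, 11.0)]  # inputs the models have never seen

# The memorizer stores every training answer, like a student who memorized
# the book's exercises. For unseen inputs it falls back to the average answer.
memory = dict(train_set)
fallback = sum(y for _, y in train_set) / len(train_set)


def memorizer(x):
    return memory.get(x, fallback)


def linear_model(x):
    # Assumed to be already trained; it learned the rule behind the data.
    return 2.0 * x + 1.0


for name, model in [("memorizer", memorizer), ("linear model", linear_model)]:
    train_error = mean_squared_error(model, train_set)
    test_error = mean_squared_error(model, test_set)
    print(f"{name}: train error {train_error}, test error {test_error}")
```

Both models look perfect on the training set, so only the test set reveals that the memorizer did not actually learn anything it can reuse.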

Whole Pipeline

