Python Implementation (Real-World Dataset)
Let’s implement kNN on a real-world dataset as well. We will be using the iris dataset, which has four features (sepal length, sepal width, petal length and petal width). We will keep only two of them, since we can plot only 2D data for now (this will help in visualizing).
We start as we did above, except that this time the data is loaded from the dataset and the features and labels are separated. We then fit our kNN model to the data.
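A minimal sketch of this step is given here. It assumes scikit-learn's bundled copy of iris, the first two columns (sepal length and sepal width) as the two plotted features, and k = 15, the value referred to later in the article. The column choice in particular is an assumption and may differ from the original code.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# Load the iris dataset: 150 samples, 4 features, 3 classes.
iris = load_iris()

# Assumption: keep the first two columns (sepal length, sepal width)
# so that the data can be drawn in 2D.
X = iris.data[:, :2]
y = iris.target

# Assumption: k = 15, with all other settings left at their defaults.
clf = KNeighborsClassifier(n_neighbors=15)
clf.fit(X, y)
```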
Now we use the following piece of code to build a decision boundary:
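One common way to do this, sketched under the assumptions of a two-feature iris setup (first two columns) and k = 15, is to predict a class for every point of a fine grid and colour the grid by the prediction. The grid step of 0.02 and the margin of 1 unit are illustrative choices, not values taken from the article.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data[:, :2], iris.target  # assumption: first two features
clf = KNeighborsClassifier(n_neighbors=15).fit(X, y)

# Build a grid covering the data, with a margin of 1 unit on each side.
step = 0.02  # illustrative grid resolution
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, step),
                     np.arange(y_min, y_max, step))

# Predict a class for every grid point, then reshape back to the grid.
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

# Coloured regions show the predicted class; dots show the actual samples.
fig, ax = plt.subplots()
ax.contourf(xx, yy, Z, alpha=0.3)
ax.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k", s=20)
ax.set_xlabel(iris.feature_names[0])
ax.set_ylabel(iris.feature_names[1])
ax.set_title("kNN decision boundary (k = 15)")
```

Calling plt.show() afterwards displays the figure. A smaller step gives a smoother boundary at the cost of more predictions.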
This will result in the following plot:
You can clearly see the boundary created by our classifier and the actual data values. The classifier isn’t able to achieve 100% accuracy, which is expected. Let’s calculate the accuracy score:
from sklearn.metrics import accuracy_score
print(accuracy_score(y, clf.predict(X)))
This gives us an accuracy of around 80%. Let’s see if we can do better.
Finding Optimal Value of k
There are many possible values of k. How can we be sure that the value 15 resulted in the best score above? We can’t. Luckily, this is easy to check: we can try multiple values of k and plot the results to see which one gives us the best accuracy.
Following is the code to do that:
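A minimal sketch of such a search is given here. It assumes the two-feature iris setup (first two columns), odd values of k from 1 to 29, and accuracy measured on the same data the model was fitted on. The names k_values, accuracies and best_k are illustrative.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data[:, :2], iris.target  # assumption: first two features

k_values = list(range(1, 30, 2))  # odd values 1, 3, ..., 29
accuracies = []
for k in k_values:
    clf = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    # Assumption: score on the same data that was used for fitting.
    accuracies.append(accuracy_score(y, clf.predict(X)))

# First k that reaches the highest accuracy.
best_k = k_values[accuracies.index(max(accuracies))]

# Plot accuracy against k.
fig, ax = plt.subplots()
ax.plot(k_values, accuracies, marker="o")
ax.set_xlabel("k")
ax.set_ylabel("accuracy")
ax.set_title("Accuracy for each value of k")
```

Because this sketch scores each model on its own training data, small values of k tend to look best: with k = 1 each point is usually its own nearest neighbour. Scoring on a held-out split, for example with train_test_split or cross_val_score from sklearn.model_selection, would give a fairer comparison.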
We test our classifier for each odd value between 1 and 30 and we get the following graph of accuracies for each value of k.
The optimal value of k, in this case, is 1, while 15 (which we tried initially) is one of the values that gives the lowest score.