K-nearest neighbour, a sort-of simple explanation with Python




The K-nearest neighbour (KNN) algorithm is a simple but powerful piece of mathematics with use cases all over programming! You have probably already encountered KNN in the wild too: recommendation systems often rely on it to suggest items that might interest you.
Today we are going to have a look at how you can make your very own KNN rating system with Python!

What does it do?

A great question to start with: how does it work, and what do we use it for?
Have a look at the following graph.

A graph demonstrating KNN

As you can see, we have two groups of points: the red dots and the green dots. Let’s say we add a new dot to our data. Of course we want to classify it as either green or red. How do we achieve this in an automated way? With KNN, of course!
We literally look at which existing dots are closest to the new one (its K nearest neighbours) and give it the category most of them belong to!
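
To make the idea concrete, here is a minimal sketch of that classification step in plain Python. The coordinates, the `knn_classify` helper and the choice of K = 3 are made up for illustration; they are not taken from the graph.

```python
from collections import Counter
from math import dist  # Euclidean distance, available since Python 3.8


def knn_classify(points, new_point, k=3):
    """Return the most common label among the k points closest to new_point.

    points is a list of ((x, y), label) pairs. Hypothetical helper for illustration.
    """
    nearest = sorted(points, key=lambda p: dist(p[0], new_point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up dots: three red ones bottom left, three green ones top right.
points = [
    ((1.0, 1.0), "red"), ((1.5, 2.0), "red"), ((2.0, 1.0), "red"),
    ((6.0, 6.0), "green"), ((7.0, 5.5), "green"), ((6.5, 7.0), "green"),
]

print(knn_classify(points, (6.0, 5.0)))  # the three closest dots are all green
```

Picking an odd K for two categories avoids tied votes.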

Want to learn more about the math and theories behind it? Check it out on Wikipedia!

This sounds pretty abstract but let’s turn our graph into some practical data!
Imagine the dots in this graph as products in my store, where every dot is a rating you have given. The red dots are products you rated negatively and the green ones are products you rated positively. As the owner of the store, I want to show you products you are likely to buy, and for this we are going to use our new algorithm.

Building the example

Installing and importing

Let’s start building our fictional store recommendation system in Python!

We are going to use a library and it may surprise you, it is Surprise. Terrible joke, I know…
We will also be using pandas to create our dataset. Install both with:

pip install surprise pandas

After you have done that, create a new main.py file and let’s get coding!
In our example we will be using fictional data but don’t worry, you can change it to the data you need.

Make sure you import your dependencies. We will be using these specific parts of Surprise:

  • Surprise Dataset, used to manage our dataset
  • Reader, a class used to read the ratings
  • KNNWithMeans, the algorithm we will be using

import pandas as pd
from surprise import Dataset, Reader, KNNWithMeans

Creating our ratings

As I mentioned earlier, we will be using mock-up data.
Using Pandas and Surprise it is really easy to shape our data. All we need to do is create a dictionary with 3 keys: our items, our users and their ratings. Think of your dictionary as a table!

| Item   | User     | Rating |
|--------|----------|--------|
| Book 1 | Dries    | 2.5    |
| Book 2 | Dries    | 4      |
| Book 1 | Alistair | 3      |
| Book 2 | Alistair | 4      |
| Book 1 | You      | 2      |

If your table of ratings looks like this then our accompanying dictionary will be this:

An example ratings dictionary
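
A dictionary matching the table above could look like the sketch below. The key names `item`, `user` and `rating` are an assumption; any names work as long as you use the same ones when building the dataframe.

```python
# One list per column of the table; position i in each list forms one row.
ratings_dict = {
    "item": ["Book 1", "Book 2", "Book 1", "Book 2", "Book 1"],
    "user": ["Dries", "Dries", "Alistair", "Alistair", "You"],
    "rating": [2.5, 4, 3, 4, 2],
}

# Every column needs the same number of entries, otherwise pandas will refuse it.
assert len({len(column) for column in ratings_dict.values()}) == 1
```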

Keep in mind that reliable recommendations need plenty of data. The more ratings you give the algorithm to work with, the better the results tend to be!

Creating and training

Now that we have our data in place, we need to create a dataset and train our algorithm on it. Luckily this is really easy using Pandas and Surprise!

The steps we need to do:

  1. Create a dataframe
  2. Create a reader
  3. Load the dataset
  4. Build the training set
  5. Initialise our algorithm
  6. Train our algorithm

Sounds complicated, right? It only takes a handful of lines, though. Thanks to advanced libraries we don’t even need to do any of the math 😉

Let’s get recommended!

Once you have done all the setup, you are ready to make recommendations. Thanks to Surprise, you can create a new recommendation with a single line!

prediction = algo.predict('user', 'item')

Calling the predict function is all you need to do. It returns a prediction object whose est attribute holds the estimated rating. In our example, that estimate is a float ranging from 1 to 5, because we specified that scale when creating the reader.

Do you want to revisit what we created today? The repository can be found on my GitHub!

Good questions

Can I use a KNN with a binary rating?
First of all, a binary rating system uses only 0 and 1. These can be ratings given by users or implicit signals such as page visits. Something along the lines of “Did you like this article?” with a thumbs up and a thumbs down is a binary rating.

Yes, you can totally use a KNN for binary ratings!

Can I use a KNN with negative ratings?
This is possible as well. All you need to do is adjust your reader to match your scale!
