# K-nearest neighbour, a sort-of simple explanation with Python

Original Source Here

# K-nearest neighbour, a sort-of simple explanation with Python

The K-nearest neighbour algorithm is an amazing piece of mathematics with many use-cases all over programming! You have probably already encountered a KNN algorithm in the wild too. Rating systems heavily rely on these algorithms to suggest you items that might be of interest.
Today we are going to have a look at how you can make your very own KNN rating system with Python!

# What does it do?

A great question to start with, how does it work and what do we use it for?
Have a look at the following graph.

As you can see we have two groups of points, the red dots and green dots. Let’s say that we add a new spot to our data, of course we want to classify our new dot as either green or red. How we achieve this in an automated way? With a KNN of course!
We are literally going to look which items are closest and in which category our new item is!

This sounds pretty abstract but let’s turn our graph into some practical data!
Imagine this graph as products in my store, every dot is a rating you have given. The red dots are products that you have given a negative rating, the green ones are positive ratings As the owner of the store I want to show you products which you are likely to buy and for this we are going to use our new algorithm.

# Building the example

## Installing and importing

Let’s start building our fictional store recommendation system in Python!

We are going to use a library and it may surprise you, it is Surprise. Terrible joke, I know…
We will also be using pandas to create our dataset. Install both with:

`pip install surprise pandas`

After you have done that, create a new `main.py` file and let’s get coding!
In our example we will be using fictional data but don’t worry, you can change it to the data you need.

Make sure you import your dependencies, we will be using these specific parts:

• Surprise Dataset, used to manage our dataset
• KNNWithMeans, the algorithm we will be using
`import pandas as pdfrom surprise import Dataset, Reader, KNNWithMeans`

## Creating our ratings

As I mentioned earlier, we will be using mock-up data.
Using Pandas and Surprise it is really easy to shape our data, all we need to do is create a dictionary with 3 keys. Our items, our users and their ratings. Think of your dictionary as a table!

`| Item   | User     | Rating ||--------|----------|--------|| Book 1 | Dries    | 2.5    || Book 2 | Dries    | 4      || Book 1 | Alistair | 3      || Book 2 | Alistair | 4      || Book 1 | You      | 2      |`

If your table of ratings looks like this then our accompanying dictionary will be this:

Keep in mind that if you want reliable recommendations, big data is key. The more data you give to work with, the better the result will be!

## Creating and training

Now that we have our data in place, we need to create a dataset and train it. Luckily this is really easy using Pandas and Surprise!

The steps we need to do:

1. Create a dataframe
4. Build the set
5. Initiate our algorithm
6. Train our algorithm

Sounds complicated right? Have a look at this screenshot though. Thanks to advanced libraries we don’t even need to do any of the math 😉

## Let’s get recommended!

Once you have done all the setup, you are ready to make recommendations. Thanks to Surprise, you can create a new recommendation with a single line!

`prediction = algo.predict('user', 'item')`

Calling the predict function is everything you need to do. In our example, the prediction will be a float ranging from 1 to 5. This is because we specified our scale when creating the reader.

Do you want to revisit what we created today? The repository can be found on my Github!

# Good questions

Can I use a KNN with a binary rating?
First of all, a binary rating system uses 0 and 1. These can be ratings given by users or can be page visits. Something along the lines of “Did you like this article?” with a thumbsup and thumbsdown is a binary rating.

Yes, you can totally use a KNN for binary ratings!

Can I use a KNN with a negative ratings?