Step 3. k-Nearest Neighbor Classifier
In the libraries step, we already imported the KNN classifier module, so all we have to do is apply it to our data set. This step is a nice exercise in using a ready-made sklearn module in a project. Since we are doing supervised learning, the data set has to be labeled: when training the model, we also supply the correct outcomes.
k-nearest neighbor algorithm
“The k-nearest neighbor algorithm (k-NN) is a non-parametric method proposed by Thomas Cover used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression.” — Wikipedia
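Before reaching for the sklearn module, it can help to see the idea behind the quote above in code. Below is a minimal from-scratch sketch (my own illustration, not from the original article): classify a point by a majority vote among its k nearest training points under Euclidean distance.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify one sample x by majority vote among its k nearest
    training points (Euclidean distance)."""
    distances = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(distances)[:k]           # indices of the k closest points
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]              # most common label wins

# Tiny made-up data: two well-separated clusters
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.5, 0.5])))  # 0
print(knn_predict(X_train, y_train, np.array([5.5, 5.5])))  # 1
```

sklearn's `KNeighborsClassifier` does essentially this, plus efficient neighbor search and distance-weighting options.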
Before moving to the implementation of this algorithm, I would like to share an article by Amit Chauhan where you can see how machine learning algorithms can be visualized and modeled: “Perform XGBoost, KNN Modeling With Dimension Reduction Technique.” He also works with a handwritten-digits data set, like the one we are using in this project.
Features and target variables
The digits data that we have imported from sklearn data sets has two attributes: data and target. We will start by assigning these parts to our new variables. Let’s call our features (our data) X and the labels (our target) y:
X = digits.data
y = digits.target
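As a quick sanity check, we can inspect the shapes of these two arrays (a small addition of mine, assuming the digits data was loaded with sklearn's `load_digits`):

```python
from sklearn.datasets import load_digits

digits = load_digits()
X = digits.data    # feature matrix
y = digits.target  # labels

print(X.shape)  # (1797, 64): 1797 images, each flattened to 8x8 = 64 pixels
print(y.shape)  # (1797,): one digit label (0-9) per image
```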
Split the data
Next, we will use the train_test_split method to split our data. Instead of training on the whole data set, it is better to break it into training and testing subsets so we can review our model’s accuracy. This will make more sense in the next step, where we will see how to improve the predictions using some methods.
# test_size is the ratio of the data reserved for testing; the rest is used for training
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
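Run end to end, the split above produces the sizes below; `stratify=y` keeps the ten digit classes in the same proportions in both subsets, and `random_state=42` makes the split reproducible.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
X, y = digits.data, digits.target

# 20% held out for testing; stratify keeps class proportions balanced
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

print(len(X_train), len(X_test))  # 1437 360
```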
Define the classifier
knn = KNeighborsClassifier(n_neighbors = 7)
Fit the model
Let me show you how this score is calculated. First, we make a prediction using the KNN model on the X_test features, and then we compare it with the actual labels, y_test. Here is how the accuracy is actually calculated in the background:
y_pred = knn.predict(X_test)
number_of_equal_elements = np.sum(y_pred == y_test)
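The snippet above stops just before the final division. Completing the calculation (a sketch using the same variable names) and checking it against sklearn's built-in `score` method:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42,
    stratify=digits.target)

knn = KNeighborsClassifier(n_neighbors=7).fit(X_train, y_train)

y_pred = knn.predict(X_test)
number_of_equal_elements = np.sum(y_pred == y_test)  # correct predictions
accuracy = number_of_equal_elements / len(y_test)    # fraction correct

# Should match the built-in score (up to floating-point rounding)
print(np.isclose(accuracy, knn.score(X_test, y_test)))  # True
```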