Digit Recognition Using Scikit-learn Deep Learning Model

Step 3. k-Nearest Neighbor Classifier

In the libraries step, we already imported the KNN classifier module, so all we have to do is apply it to our data set. This step is a nice exercise in using a ready-made sklearn module in a project. Since we are doing supervised learning, the data set has to be labeled. This means that when training the model, we also show it the correct outcome for each example.

k-nearest neighbor algorithm

“The k-nearest neighbor algorithm (k-NN) is a non-parametric method proposed by Thomas Cover used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression.” — Wikipedia
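To make the quoted definition concrete, here is a minimal from-scratch sketch of k-NN classification. It is not how sklearn implements the classifier internally; the function name knn_predict, the toy data, the Euclidean distance, and the unweighted majority vote are all assumptions chosen for illustration.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    """Classify one query point by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training example
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k closest training examples
    nearest = np.argsort(distances)[:k]
    # Unweighted majority vote over the neighbors' labels
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two small clusters in a 2-D feature space
X_toy = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

near_origin = knn_predict(X_toy, y_toy, np.array([0.3, 0.3]), k=3)
near_five = knn_predict(X_toy, y_toy, np.array([4.8, 5.0]), k=3)
print(near_origin, near_five)
```

For regression, the same neighbor lookup would be followed by averaging the neighbors' target values instead of voting.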

Before moving to the implementation of this algorithm, I would like to share an article by Amit Chauhan where you can find out how machine learning algorithms can be visualized and modeled: “Perform XGBoost, KNN Modeling With Dimension Reduction Technique.” He is also using the MNIST data set, which is the same data set we are using in this project.

Features and target variables

The digits data that we have imported from sklearn data sets has two attributes: data and target. We will start by assigning these parts to our new variables. Let’s call our features (our data) X and the labels (our target) y:

X = digits.data 
y = digits.target
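A quick way to see what these two variables hold is to print their shapes. The sketch below assumes that digits was created with sklearn's load_digits function in the earlier libraries step; if the data was loaded differently, the shapes will differ.

```python
from sklearn.datasets import load_digits

# Assumption: this mirrors how `digits` was loaded earlier in the project
digits = load_digits()

X = digits.data    # one row per image, one column per pixel
y = digits.target  # the digit each image represents

print(X.shape)                  # (number of samples, number of features)
print(y.shape)                  # one label per sample
print(sorted(set(y.tolist())))  # the distinct class labels
```

Each row of X is a flattened image, so the number of columns equals the number of pixels per image.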

Split the data

Next, we will use the train_test_split function to split our data. Instead of training on the whole data set, it’s better to break it into training and testing data so that we can measure the model’s accuracy on examples it has not seen. This will make more sense in the next step, where we will see how to improve the predictions using some methods.

# test_size is the ratio that will be the test data, and the rest will be train data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
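To check what the split produced, the following sketch prints the sizes of both parts and the share of each digit in them. It assumes the data comes from load_digits; the variable names train_share and test_share are introduced here for illustration only.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# Assumption: the digits data was loaded with load_digits
digits = load_digits()
X, y = digits.data, digits.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

print(X_train.shape, X_test.shape)

# With stratify=y, each digit should make up a similar share of both parts
train_share = np.bincount(y_train) / len(y_train)
test_share = np.bincount(y_test) / len(y_test)
print(np.round(train_share, 3))
print(np.round(test_share, 3))
```

Setting random_state makes the split reproducible, so repeated runs give the same training and testing rows.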

Define the classifier

knn = KNeighborsClassifier(n_neighbors = 7)

Fit the model

knn.fit(X_train, y_train)
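To see what the fitted model does with a single image, the sketch below looks up the 7 nearest training images for the first test image and compares their labels with the prediction. It repeats the earlier steps so it can run on its own, and it assumes the data comes from load_digits.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: the digits data was loaded with load_digits
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42,
    stratify=digits.target)

knn = KNeighborsClassifier(n_neighbors=7)
knn.fit(X_train, y_train)

# kneighbors returns the distances to, and the training-set indices of,
# the nearest neighbors of each query row
distances, indices = knn.kneighbors(X_test[:1])
neighbor_labels = y_train[indices[0]]

print("neighbor labels:", neighbor_labels)
print("predicted:", knn.predict(X_test[:1])[0], "actual:", y_test[0])
```

With the default uniform weights, the predicted digit is the label that appears most often among those neighbors.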

Accuracy score

print(knn.score(X_test, y_test))

Let me show you how this score is calculated. First, we make predictions with the KNN model on the X_test features and then compare them with the actual labels in y_test. The accuracy is the share of predictions that match the labels. Here is how it is calculated in the background:

y_pred = knn.predict(X_test)
number_of_equal_elements = np.sum(y_pred == y_test)
print(number_of_equal_elements / y_test.shape[0])
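As a cross-check, the sketch below computes the share of matching predictions by hand and compares it with knn.score and with sklearn's accuracy_score function. It repeats the earlier steps so it can run on its own and assumes the data comes from load_digits; the name manual_accuracy is introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: the digits data was loaded with load_digits
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42,
    stratify=digits.target)

knn = KNeighborsClassifier(n_neighbors=7)
knn.fit(X_train, y_train)

y_pred = knn.predict(X_test)

# Count the matching predictions and divide by the number of test samples
number_of_equal_elements = np.sum(y_pred == y_test)
manual_accuracy = number_of_equal_elements / y_test.shape[0]

print(manual_accuracy)
print(knn.score(X_test, y_test))
print(accuracy_score(y_test, y_pred))
```

All three printed values should agree, since for a classifier the score method reports the mean accuracy on the given test data.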

