Mastering Machine Learning Fundamentals using Python

1. Classification

Classification is a supervised learning technique that uses labeled data to train a model to make predictions on unseen or future instances. In supervised learning, each example in the dataset consists of input features along with its corresponding class label. The classification algorithm analyzes the relationship between the input features and the known classes to build a model that can generalize and make accurate predictions on new, unseen data.

Nearest Neighbors Classification (source: scikit-learn)

Our goal is to build a classification model that can predict the species of an Iris flower (discrete value) based on its measurements.

First, we will load the Iris dataset, which is conveniently available in scikit-learn. The dataset is split into two arrays: one containing the feature values (X) and the other containing the corresponding target labels (y).

The dataset consists of measurements of four features (sepal length, sepal width, petal length, and petal width) for three different species of Iris flowers (setosa, versicolor, and virginica).
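
As a quick sanity check, the dataset's structure can be inspected directly from the object returned by load_iris (the printed values below reflect the standard Iris dataset shipped with scikit-learn):

from sklearn import datasets

# Load the Iris dataset and inspect its structure
iris = datasets.load_iris()
print(iris.feature_names)   # ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
print(iris.target_names)    # ['setosa' 'versicolor' 'virginica']
print(iris.data.shape)      # (150, 4) -> 150 samples, 4 features
print(iris.target.shape)    # (150,)   -> one integer label (0, 1 or 2) per sample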

Next, we will split the dataset into training and testing sets using the train_test_split function from scikit-learn. This step ensures that we have separate data for training and evaluating our classification model.
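
A minimal sketch of this step is shown below; the test_size, random_state, and stratify values are illustrative choices rather than requirements. Passing stratify=y keeps the class proportions similar in both splits, and random_state makes the split reproducible:

from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Hold out 20% of the samples for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)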

To perform the classification, we will use the k-nearest neighbors (KNN) classifier. KNN is a simple yet effective algorithm that classifies new data points based on the majority vote of their k nearest neighbors in the training set. We will initialize the KNN classifier, fit it to the training data, and make predictions on the testing data.
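
The main hyperparameter of KNN is the number of neighbors k; scikit-learn's KNeighborsClassifier defaults to n_neighbors=5. The sketch below sets it explicitly and uses the optional weights="distance" argument, which lets closer neighbors count more in the vote (both settings here are illustrative, not required):

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Small k follows the training data closely; larger k gives smoother decision boundaries
knn = KNeighborsClassifier(n_neighbors=5, weights="distance")
knn.fit(X_train, y_train)
print(knn.predict(X_test[:5]))  # predicted class indices (0, 1 or 2) for the first five test samples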

To evaluate the quality of the model’s predictions, we use the balanced accuracy score, a metric that accounts for imbalance in the number of samples across classes. It provides a fair assessment of the classifier’s performance in scenarios where the classes are not equally represented.

By calculating the balanced accuracy score, we can assess how well our classification model performs on the testing set. The higher the score, the better the model’s ability to correctly classify the Iris flowers.
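
Concretely, balanced accuracy is the mean of the recall obtained on each class, so it coincides with macro-averaged recall. The toy labels below are made up purely to illustrate this:

from sklearn.metrics import balanced_accuracy_score, recall_score

# Imbalanced toy example: class 0 recall is 4/4 = 1.0, class 1 recall is 1/2 = 0.5
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 1, 0]
print(balanced_accuracy_score(y_true, y_pred))        # 0.75
print(recall_score(y_true, y_pred, average="macro"))  # 0.75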

Once we have chosen the model, we can also make predictions on new, unseen Iris flowers by providing their feature values to the classifier’s predict method. The model will assign a predicted species label to each new data point based on its measurements.
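
Since predict returns integer class indices, iris.target_names can map them back to species names. A minimal sketch (fitting on the full dataset purely for illustration):

from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
clf = KNeighborsClassifier()
clf.fit(iris.data, iris.target)

# Two example flowers: sepal length, sepal width, petal length, petal width (cm)
new_data = [[5.1, 3.5, 1.4, 0.2], [6.7, 3.0, 5.2, 2.3]]
pred = clf.predict(new_data)
print(iris.target_names[pred])  # e.g. ['setosa' 'virginica']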

Code:

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import balanced_accuracy_score

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Train a k-nearest neighbors classifier
clf = KNeighborsClassifier()
clf.fit(X_train, y_train)

# Make predictions on the testing set
y_pred = clf.predict(X_test)

# Evaluate the model using balanced accuracy score
accuracy = balanced_accuracy_score(y_test, y_pred)
print("Balanced Accuracy: ", accuracy)

# Make predictions on new data
new_data = [[5.1, 3.5, 1.4, 0.2], [6.7, 3.0, 5.2, 2.3]]  # sepal length, sepal width, petal length, petal width (cm)
predicted_labels = clf.predict(new_data)
print("Predicted labels: ", predicted_labels)
