# 1. PyOD

PyOD (Python Outlier Detection) is a Python toolkit for detecting outliers in data. The package offers 30 outlier detection algorithms, ranging from classic methods to recent ones, which suggests it is well maintained. Examples of the available outlier detection models include:

- Angle-Based Outlier Detection
- Cluster-Based Local Outlier Factor
- Principal Component Analysis Outlier Detection
- Variational Auto Encoder

and many more. If you are interested in seeing all the available methods, see the PyOD documentation.

PyOD makes outlier detection simple and intuitive: it takes only a few lines of code to predict which data points are outliers. As in ordinary model training, PyOD fits a classifier-style model to the data and then predicts outliers from the fitted model. Let’s try the package with code examples. First, we need to install the package.

`pip install pyod`

After installing the package, let’s load a sample dataset. I will use the tips data from the seaborn package.

`import seaborn as sns`

`import pandas as pd`

`df = sns.load_dataset('tips')`

`df.head()`

Let’s say we want to find multivariate outliers across total_bill and tip. A scatter plot of these two features gives us a sense of how the data is spread.

`sns.scatterplot(data = df, x = 'total_bill', y = 'tip')`

In the plot above, some points sit in the top-right corner, away from the rest, which suggests they are outliers. But where is the boundary between inliers and outliers? We can use PyOD to decide this for us.

For our example, I will use only two methods: Angle-Based Outlier Detection (ABOD) and Cluster-Based Local Outlier Factor (CBLOF).

`from pyod.models.abod import ABOD`

`from pyod.models.cblof import CBLOF`

Let’s start with the ABOD model. We need to set the contamination parameter, which is the fraction of the data we expect to be outliers. Setting contamination to 0.05 means we want to flag 5% of the data as outliers. Let’s try it with code.

`abod_clf = ABOD(contamination=0.05)`

`abod_clf.fit(df[['total_bill', 'tip']])`

We fit the model on the data in which we want to detect outliers. As with a classifier, we can then access the scores and labels, and predict with the fitted model.

`# Return the inlier/outlier labels (0 = inlier, 1 = outlier)`

`abod_clf.labels_`
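To make the contamination setting more concrete, here is a rough, library-free sketch of how a fraction such as 0.05 can turn raw outlier scores into 0/1 labels. The function name `label_by_contamination` and the random scores are assumptions made up for illustration; PyOD computes its own threshold internally, and the exact details may differ.

```python
import random


def label_by_contamination(scores, contamination=0.05):
    """Label the highest-scoring fraction of points as outliers (1), the rest as inliers (0)."""
    n_outliers = int(round(contamination * len(scores)))
    if n_outliers == 0:
        return [0] * len(scores)
    # The cutoff is the n_outliers-th largest score.
    cutoff = sorted(scores, reverse=True)[n_outliers - 1]
    return [1 if s >= cutoff else 0 for s in scores]


random.seed(0)
# Hypothetical outlier scores: higher means more anomalous.
scores = [random.gauss(0, 1) for _ in range(200)]
labels = label_by_contamination(scores, contamination=0.05)
print(sum(labels), "of", len(labels), "points flagged as outliers")
```

The point of the sketch is that contamination does not change how anomalous a point looks; it only decides how far down the ranked scores the outlier label reaches.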

You could also access the decision scores or the outlier probability, but let’s move on to the other model and compare the results.

`# random_state fixes the seed for reproducibility; 42 is an arbitrary choice`

`cblof_clf = CBLOF(contamination=0.05, check_estimator=False, random_state=42)`

`cblof_clf.fit(df[['total_bill', 'tip']])`

`df['ABOD_Clf'] = abod_clf.labels_`

`df['CBLOF_Clf'] = cblof_clf.labels_`

We store the results in the data frame to compare both detection algorithms.

`sns.scatterplot(data = df, x = 'total_bill', y = 'tip', hue = 'ABOD_Clf')`

From the ABOD result, we can see that the points farthest from the center of the data are considered outliers. Let’s look at the CBLOF model.

`sns.scatterplot(data = df, x = 'total_bill', y = 'tip', hue = 'CBLOF_Clf')`

Unlike ABOD, the CBLOF algorithm flags outliers mostly on one side of the data (the right side). You could try other algorithms to detect outliers in the data if you want.
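Besides comparing the two plots by eye, it can help to count how often the detectors agree. The sketch below uses short, made-up label lists as stand-ins for the `ABOD_Clf` and `CBLOF_Clf` columns; the helper name `compare_labels` is hypothetical and not part of PyOD.

```python
from collections import Counter


def compare_labels(labels_a, labels_b):
    """Count how often two detectors agree or disagree on each point."""
    return Counter(zip(labels_a, labels_b))


# Hypothetical labels standing in for df['ABOD_Clf'] and df['CBLOF_Clf'].
abod_labels = [0, 0, 1, 1, 0, 0, 1, 0]
cblof_labels = [0, 0, 1, 0, 0, 1, 1, 0]

counts = compare_labels(abod_labels, cblof_labels)
print("flagged by both:      ", counts[(1, 1)])
print("flagged by ABOD only: ", counts[(1, 0)])
print("flagged by CBLOF only:", counts[(0, 1)])
print("flagged by neither:   ", counts[(0, 0)])
```

With the real data frame, the same counts could be obtained by passing `df['ABOD_Clf']` and `df['CBLOF_Clf']` to the helper. Points flagged by both detectors are usually the safest candidates to inspect first.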
