Predictive Modeling Using Sklearn


Using imodels for Creating Concise, Transparent, and Accurate Predictive Models

Photo by h heyerlein on Unsplash

Scikit-learn (sklearn) is a Python library that contains multiple machine learning models. These models can be used to solve problems like classification and regression. Algorithms like Naive Bayes, SVM, and Decision Trees can be easily accessed and applied to a wide range of problems.
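To illustrate the shared interface mentioned above, here is a minimal sketch that fits the three algorithms named (Naive Bayes, SVM, Decision Tree) on a synthetic dataset; the dataset and split sizes are arbitrary choices for the demo:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data for the demo
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for model in (GaussianNB(), SVC(), DecisionTreeClassifier(random_state=0)):
    model.fit(X_train, y_train)            # every sklearn model shares fit/predict
    acc = model.score(X_test, y_test)      # mean accuracy on the held-out split
    print(f"{type(model).__name__}: {acc:.2f}")
```

Because all three models share the same `fit`/`score` interface, swapping one algorithm for another is a one-line change.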

With the increasing complexity of these models, it becomes difficult to choose the best model for our problem that can also be interpreted easily. Choosing a model depends on multiple factors, such as higher predictive accuracy, lower variance, and lower bias. Interpreting a machine learning model is important in order to understand how the model actually generates its predictions.

What if I told you that you can fit and predict with multiple machine learning models in just a few lines of code? Wouldn't that be awesome? We could reduce the human effort, compare all the models and their corresponding factors, and select the best one.

imodels is an open-source Python library that serves as an interface for fitting and using multiple Scikit-learn-compatible models that are simpler than typical black-box models and improve interpretability and computational efficiency. It contains models like boosted rule sets, SLIPPER rule sets, etc.

In this article, we will use imodels with AutoGluon to create multiple machine learning models and compare them.

Let’s get started…

Installing required libraries

We will start by installing imodels and AutoGluon using pip. The commands given below will do that.

!pip install imodels
!pip install autogluon

Importing required libraries

In this step, we will import the required libraries and functions to create and compare the Machine Learning models.

%load_ext autoreload
%autoreload 2
import numpy as np
import pandas as pd
from autogluon.tabular import TabularDataset, TabularPredictor

Loading the Dataset

We will use a dataset hosted by AutoGluon, which can be downloaded from the link given below.

https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv

The code given below will load the training and the test dataset.

# train data
train_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv')
subsample_size = 100 # subsample subset of data for faster demo, try setting this to much larger values
train_data = train_data.sample(n=subsample_size, random_state=0)
# test data
test_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv')
label = 'class'
y_test = test_data[label]
test_data_nolab = test_data.drop(columns=[label])
train_data.head()
Dataset (Source: By Author)
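Since `TabularDataset` behaves like a pandas DataFrame, the subsampling and label-splitting steps above can be sketched with plain pandas. The toy frame below is a hypothetical stand-in for the census data:

```python
import pandas as pd

# Toy stand-in for the income dataset (columns chosen to mirror the real one)
df = pd.DataFrame({
    "age": [25, 38, 44, 51, 29, 63],
    "relationship": ["Own-child", "Husband", "Husband", "Wife", "Unmarried", "Husband"],
    "class": ["<=50K", ">50K", ">50K", "<=50K", "<=50K", ">50K"],
})

subsample = df.sample(n=4, random_state=0)   # subsample a subset for a faster demo
label = "class"
y = df[label]                                # target column
X = df.drop(columns=[label])                 # features without the label
```

The same `sample`/`drop` calls work on the real `train_data` and `test_data` objects, since they inherit the DataFrame API.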

Creating the Model & Analyzing

In this step, we will create the models using AutoGluon's TabularPredictor and fit all of them in just a single line of code.

predictor = TabularPredictor(label=label).fit(train_data, time_limit=8, verbosity=2)
Models (Source: By Author)

Now that we have created the models, let us analyze their factors.

predictor.leaderboard(silent=True)
Leaderboard (Source: By Author)

In this leaderboard, we can analyze the factors of the different models and choose the best one for our dataset. These factors include the validation score, fit time, prediction time, etc.
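Since `leaderboard()` returns a pandas DataFrame, these factors can be sorted and filtered like any other frame. The sketch below uses a mock leaderboard; the column names (`model`, `score_val`, `fit_time`) follow AutoGluon's convention but the rows are invented for illustration:

```python
import pandas as pd

# Mock leaderboard standing in for predictor.leaderboard() output
lb = pd.DataFrame({
    "model": ["KNeighborsUnif", "LightGBM", "WeightedEnsemble_L2"],
    "score_val": [0.73, 0.84, 0.86],
    "fit_time": [0.01, 0.35, 0.52],
})

# Pick the row with the highest validation score
best = lb.sort_values("score_val", ascending=False).iloc[0]
print(best["model"])
```

The same sort works on the real leaderboard, which is useful when you want to trade score against fit or prediction time rather than always taking the top entry.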

Now let us find the best model among them.

predictor.get_model_best()
Best Model (Source: By Author)

Now that we have the leaderboard of all models and also the best model for our dataset, let us see the feature importance in our dataset.

predictor.feature_importance(train_data)
Feature Importance (Source: By Author)

Here we can see that the most important feature is age, followed by relationship, and so on. Notice that the output also includes the standard deviation of each importance estimate, along with several other statistics.
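AutoGluon computes these importances by permutation: shuffle one feature and measure how much the score drops. The same idea can be sketched with scikit-learn's `permutation_importance` on synthetic data (not the census set):

```python
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=4, n_informative=2,
                           random_state=0)
model = DecisionTreeClassifier(random_state=0).fit(X, y)

# Shuffle each feature in turn and measure the drop in score; the std
# across n_repeats shuffles is the standard deviation reported alongside it
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
for i, (mean, std) in enumerate(zip(result.importances_mean,
                                    result.importances_std)):
    print(f"feature {i}: {mean:.3f} +/- {std:.3f}")
```

A large drop means the model relied heavily on that feature, which is why age topping the list above tells us the predictor leans on it most.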

The final step is to save the model; we can do this in just a single line of code.

predictor.save()

This is how we can use AutoGluon together with imodels-style interpretable models to create different models and compare them.

Go ahead and try this with different datasets and create different models. In case you face any difficulty, please let me know in the responses section.

This article is in collaboration with Piyush Ingale.

Before You Go

Thanks for reading! If you want to get in touch with me, feel free to reach me at hmix13@gmail.com or my LinkedIn profile. You can view my GitHub profile for different data science projects and package tutorials. Also, feel free to explore my profile and read the different articles I have written related to data science.
