
# 1. Hyperparameters

In applied machine learning, tuning a machine learning model’s hyperparameters represents a valuable opportunity to achieve the best possible performance.

## 1.1. Parameters vs Hyperparameters

Let’s now define what hyperparameters are, but before doing that let’s consider the difference between a *parameter* and a *hyperparameter*.

A parameter can be considered to be intrinsic or internal to the model and can be obtained after the model has learned from the data. Examples of parameters are regression coefficients in linear regression, support vectors in support vector machines and weights in neural networks.

A hyperparameter can be considered to be extrinsic or external to the model and can be set arbitrarily by the practitioner. Examples of hyperparameters include the k in k-nearest neighbors, the number of trees and maximum number of features in random forest, the learning rate and momentum in neural networks, and the C and gamma parameters in support vector machines.

## 1.2. Hyperparameter tuning

As there are no universally best hyperparameters for any given problem, hyperparameters are typically set to default values. However, the optimal set of hyperparameters can be obtained from a manual, empirical (trial-and-error) hyperparameter search or in an automated fashion via the use of an optimization algorithm that maximizes a fitness function.

Two common hyperparameter tuning methods are *grid search* and *random search*. As the name implies, a **grid search** entails the creation of a grid of possible hyperparameter values, whereby models are iteratively built for all of these hyperparameter combinations in a brute-force manner. In a **random search**, not all hyperparameter combinations are used; instead, each iteration makes use of a random hyperparameter combination.

Additionally, a stochastic optimization approach may also be applied for hyperparameter tuning, which will automatically navigate the hyperparameter space in an algorithmic manner, guided by a *loss function* (i.e. a performance metric) that monitors the model performance.

In this tutorial, we will be using the grid search approach.

# 2. Dataset

Today, we’re not going to use the Iris dataset nor the Penguins dataset; instead, we’re going to generate our very own synthetic dataset. However, if you would like to follow along and substitute in your own dataset, that would be great!

## 2.1. Generating the Synthetic Dataset
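The code for this step did not survive extraction; below is a minimal sketch using scikit-learn’s `make_classification` (the sample and feature counts match the shapes reported in section 2.2, while the remaining arguments and the random seed are assumptions):

```python
from sklearn.datasets import make_classification

# Generate a synthetic binary classification dataset of 200 samples with
# 10 features; n_informative, n_redundant and the seed are assumptions.
X, Y = make_classification(
    n_samples=200,
    n_features=10,
    n_informative=5,
    n_redundant=2,
    random_state=42,
)
```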

## 2.2. Examine Dataset Dimension

Let’s now examine the dimension of the dataset, which should give the following output:

`((200, 10), (200,))`

where `(200, 10)` is the dimension of the X variable; here we can see that there are 200 rows and 10 columns. As for `(200,)`, this is the dimension of the Y variable, which indicates that there are 200 rows and 1 column (no numerical value shown).
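The code for this check is not shown in the extracted text; a minimal sketch, assuming `X` and `Y` were created with the generator settings sketched in section 2.1:

```python
from sklearn.datasets import make_classification

# Assumed generator settings from section 2.1 (not from the article)
X, Y = make_classification(n_samples=200, n_features=10, random_state=42)

# Inspect the dimensions of the features and labels
print(X.shape, Y.shape)
```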

# 3. Data Splitting
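The splitting code is not shown in the extracted text; a minimal sketch with scikit-learn’s `train_test_split` performing the 80/20 split described below (the dataset generation settings and the seed are assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Assumed dataset from section 2 (settings are illustrative)
X, Y = make_classification(n_samples=200, n_features=10, random_state=42)

# Hold out 20% of the data for testing; keep 80% for training
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=42
)
```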

## 3.1. Examining the Dimension of Training Set

Let’s now examine the dimension of the **training set** (the 80% subset), which should give the following output:

`((160, 10), (160,))`

where `(160, 10)` is the dimension of the X variable; here we can see that there are 160 rows and 10 columns. As for `(160,)`, this is the dimension of the Y variable, which indicates that there are 160 rows and 1 column (no numerical value shown).

## 3.2. Examining the Dimension of the Testing Set

Let’s now examine the dimension of the **testing set** (the 20% subset), which should give the following output:

`((40, 10), (40,))`

where `(40, 10)` is the dimension of the X variable; here we can see that there are 40 rows and 10 columns. As for `(40,)`, this is the dimension of the Y variable, which indicates that there are 40 rows and 1 column (no numerical value shown).

# 4. Building a Baseline Random Forest Model

Here, we will first start by building a baseline random forest model that will serve as a point of comparison against the model built with the optimal set of hyperparameters.

For the baseline model, we will set arbitrary values for the 2 hyperparameters (i.e. `n_estimators` and `max_features`) that we will also use in the next section for hyperparameter tuning.

## 4.1. Instantiating the Random Forest Model

We first start by importing the necessary libraries and assigning the random forest classifier to the **rf** variable.
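A sketch of this step (the specific hyperparameter values here are placeholders, not from the article):

```python
from sklearn.ensemble import RandomForestClassifier

# Baseline model with arbitrary values for the 2 hyperparameters
# we will later tune; the numbers used here are placeholders.
rf = RandomForestClassifier(max_features=5, n_estimators=100)
```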

## 4.2. Training the Random Forest Model

Now, we will be applying the random forest classifier to build the classification model using the `rf.fit()` function on the training data (i.e. `X_train` and `Y_train`).

After the model has been trained, we can apply the trained model (`rf`) for making predictions. Here, we apply the model to predict the test set (`X_test`) and assign the predicted Y values to the `Y_pred` variable.
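The training and prediction steps described above can be sketched as follows (the dataset generation, split settings and seeds are assumptions carried over from earlier sections):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Reconstructed setup from sections 2-3 (settings are assumptions)
X, Y = make_classification(n_samples=200, n_features=10, random_state=42)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

# Baseline model (placeholder hyperparameter values)
rf = RandomForestClassifier(max_features=5, n_estimators=100, random_state=42)

rf.fit(X_train, Y_train)     # train on the 80% subset
Y_pred = rf.predict(X_test)  # predict class labels for the 20% test subset
```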

## 4.3. Evaluating the Model Performance

Let’s now evaluate the model performance. Here, we’re calculating 3 performance metrics consisting of Accuracy, Matthews Correlation Coefficient (MCC) and the Area Under the Receiver Operating Characteristic Curve (ROC AUC).
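A sketch of the evaluation step, assuming `Y_test` and `Y_pred` from section 4.2 (here ROC AUC is computed from the class-1 probabilities, which is the usual practice for this metric):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, matthews_corrcoef, roc_auc_score
from sklearn.model_selection import train_test_split

# Reconstructed setup (settings are assumptions)
X, Y = make_classification(n_samples=200, n_features=10, random_state=42)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)
rf = RandomForestClassifier(max_features=5, n_estimators=100, random_state=42)
rf.fit(X_train, Y_train)
Y_pred = rf.predict(X_test)

accuracy = accuracy_score(Y_test, Y_pred)  # fraction of correct predictions
mcc = matthews_corrcoef(Y_test, Y_pred)    # balanced measure in [-1, 1]
roc_auc = roc_auc_score(Y_test, rf.predict_proba(X_test)[:, 1])  # AUC from class-1 probabilities
```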

# 5. Hyperparameter Tuning

Now we will be performing the tuning of the hyperparameters of the random forest model. The 2 hyperparameters that we will tune include `max_features` and `n_estimators`.

## 5.1. The Code

It should be noted that some of the code shown below was adapted from scikit-learn.
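The code block itself did not survive extraction; the following is a reconstruction consistent with the surrounding description. The hyperparameter ranges, cross-validation settings and seeds are assumptions (chosen so the ranges contain the best values reported in section 5.4):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Reconstructed dataset and split from sections 2-3 (settings are assumptions)
X, Y = make_classification(n_samples=200, n_features=10, random_state=42)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

# Ranges of the 2 hyperparameters to search over (assumed ranges)
max_features_range = np.arange(1, 6, 1)      # 1, 2, ..., 5
n_estimators_range = np.arange(10, 110, 10)  # 10, 20, ..., 100
param_grid = dict(max_features=max_features_range, n_estimators=n_estimators_range)

rf = RandomForestClassifier(random_state=42)
grid = GridSearchCV(estimator=rf, param_grid=param_grid, cv=5, scoring="roc_auc")
grid.fit(X_train, Y_train)

print("The best parameters are %s with a score of %0.2f"
      % (grid.best_params_, grid.best_score_))
```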

## 5.2. Code Explanation

Firstly, we will import the necessary libraries.

The `GridSearchCV()` function from scikit-learn will be used to perform the hyperparameter tuning. Particularly, it should be noted that the `GridSearchCV()` function can perform the typical functions of a classifier such as `fit`, `score` and `predict`, as well as `predict_proba`, `decision_function`, `transform` and `inverse_transform`.

Secondly, we define the variables that are necessary inputs to the `GridSearchCV()` function, namely the range values of the 2 hyperparameters (`max_features_range` and `n_estimators_range`), which are then assigned as a dictionary to the `param_grid` variable.

Finally, the best parameters (`grid.best_params_`) along with their corresponding metric (`grid.best_score_`) are printed out.

## 5.3. Performance Metrics

The default performance metric for `GridSearchCV()` is accuracy, but in this example we’re going to use ROC AUC.

Just in case you’re wondering what other performance metrics can be used, run the following command to find out:

This prints the list of supported performance metrics.

Thus, in this example we are going to use ROC AUC and set the input argument `scoring = 'roc_auc'` inside the `GridSearchCV()` function.

## 5.4. Results from Hyperparameter Tuning

The final `print` statement in the code from section 5.1 outputs the performance metrics, indicating that the optimal set of hyperparameters has a `max_features` of 3 and an `n_estimators` of 60, with an ROC AUC score of 0.93.

# 6. Data Visualization of Tuned Hyperparameters

Let’s first start by taking a look at the underlying data that we will later use for data visualization. Results from hyperparameter tuning have been written out to `grid.cv_results_`, whose contents take the form of a dictionary.

## 6.1. Preparing the DataFrame

Now, we’re going to selectively extract some data from `grid.cv_results_` to create a dataframe containing the 2 hyperparameter combinations along with their corresponding performance metric, which in this case is the ROC AUC. Particularly, the following code block allows the combining of the 2 hyperparameters (`params`) with the performance metric (`mean_test_score`).
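The code block did not survive extraction; a sketch, assuming a fitted `grid` object as in section 5 (a smaller grid is used here only to keep the sketch quick):

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Reconstructed setup with a reduced grid (settings are assumptions)
X, Y = make_classification(n_samples=200, n_features=10, random_state=42)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)
param_grid = dict(max_features=[1, 2, 3], n_estimators=[10, 20])
grid = GridSearchCV(RandomForestClassifier(random_state=42), param_grid,
                    cv=5, scoring="roc_auc")
grid.fit(X_train, Y_train)

# Combine the hyperparameter combinations with their mean ROC AUC scores
grid_results = pd.concat(
    [
        pd.DataFrame(grid.cv_results_["params"]),
        pd.DataFrame(grid.cv_results_["mean_test_score"], columns=["ROC_AUC"]),
    ],
    axis=1,
)
print(grid_results)
```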

The output is a dataframe with 3 columns: `max_features`, `n_estimators` and `ROC_AUC`.

## 6.2. Reshaping the DataFrame

*6.2.1. Grouping the columns*

In order to visualize the above dataframe as a contour plot (i.e. either 2D or 3D version), we will first need to reshape the data structure.
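The reshaping code did not survive extraction; a sketch using a small hand-made stand-in for the `grid_results` dataframe from section 6.1 (the values are illustrative only):

```python
import pandas as pd

# Stand-in for the dataframe built in section 6.1 (illustrative values)
grid_results = pd.DataFrame({
    "max_features": [1, 1, 2, 2, 3, 3],
    "n_estimators": [10, 20, 10, 20, 10, 20],
    "ROC_AUC":      [0.85, 0.87, 0.88, 0.90, 0.91, 0.93],
})

# Group by the 2 hyperparameter columns; mean() collapses any duplicates
grid_contour = grid_results.groupby(["max_features", "n_estimators"]).mean()
print(grid_contour)
```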

In the above code, we’re using the `groupby()` function from the `pandas` library to group the dataframe by 2 columns (`max_features` and `n_estimators`), whereby the contents of the first column (`max_features`) are merged.

*6.2.2. Pivoting the data*

Data is reshaped by pivoting it into an m ⨯ n matrix, where rows and columns correspond to `max_features` and `n_estimators`, respectively.
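A sketch of the pivoting step, again using a stand-in `grid_results` table with illustrative values:

```python
import pandas as pd

# Stand-in for the grouped results from section 6.2.1 (illustrative values)
grid_results = pd.DataFrame({
    "max_features": [1, 1, 2, 2, 3, 3],
    "n_estimators": [10, 20, 10, 20, 10, 20],
    "ROC_AUC":      [0.85, 0.87, 0.88, 0.90, 0.91, 0.93],
})
grid_contour = grid_results.groupby(["max_features", "n_estimators"]).mean()

# Pivot into an m x n matrix: rows = max_features, columns = n_estimators
grid_reset = grid_contour.reset_index()
grid_pivot = grid_reset.pivot(index="max_features", columns="n_estimators",
                              values="ROC_AUC")
print(grid_pivot)
```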

The above code block produces the following reshaped dataframe.

Finally, we assign the reshaped data to the respective `x`, `y` and `z` variables that will then be used for making the contour plot.
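A sketch of this assignment, continuing from a stand-in pivoted table (illustrative values):

```python
import pandas as pd

# Stand-in pivoted results from section 6.2.2 (illustrative values)
grid_results = pd.DataFrame({
    "max_features": [1, 1, 2, 2, 3, 3],
    "n_estimators": [10, 20, 10, 20, 10, 20],
    "ROC_AUC":      [0.85, 0.87, 0.88, 0.90, 0.91, 0.93],
})
grid_pivot = (
    grid_results.groupby(["max_features", "n_estimators"]).mean()
    .reset_index()
    .pivot(index="max_features", columns="n_estimators", values="ROC_AUC")
)

x = grid_pivot.columns.values  # n_estimators values (matrix columns)
y = grid_pivot.index.values    # max_features values (matrix rows)
z = grid_pivot.values          # ROC_AUC matrix
```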

## 6.3. Making the 2D Contour Plot

Now comes the fun part: we will be visualizing the landscape of the 2 hyperparameters that we are tuning, and their influence on the ROC AUC score, by making a 2D contour plot using Plotly. The aforementioned `x`, `y` and `z` variables are used as the input data.

The above code block generates the following 2D contour plot.

## 6.4. Making the 3D Contour Plot

Here, we’re going to use Plotly to create an interactive 3D contour plot using the `x`, `y` and `z` variables as the input data.

The above code block generates the following 3D contour plot.

# Conclusion

Congratulations! You have just performed hyperparameter tuning and created data visualizations to go along with it. Hopefully, you’ll be able to boost your model performance compared to that achieved with the default values.

What’s next? In this tutorial, you have explored the tuning of 2 hyperparameters, but that’s not all. There are several other hyperparameters that you could tune for the random forest model. You can check out the `scikit-learn` API for a list of hyperparameters to try out.

Or perhaps you can try tuning hyperparameters for other machine learning algorithms by using the code described in this article as a starting template.

Let me know in the comments what fun projects you are working on!
