In applied machine learning, tuning a model’s hyperparameters represents a valuable opportunity to achieve the best possible performance.
1.1. Parameters vs Hyperparameters
Let’s now define what hyperparameters are, but before doing that let’s consider the difference between a parameter and a hyperparameter.
A parameter can be considered to be intrinsic or internal to the model and can be obtained after the model has learned from the data. Examples of parameters are regression coefficients in linear regression, support vectors in support vector machines and weights in neural networks.
A hyperparameter can be considered to be extrinsic or external to the model and can be set arbitrarily by the practitioner. Examples of hyperparameters include the k in k-nearest neighbors, the number of trees and maximum number of features in random forest, the learning rate and momentum in neural networks, and the C and gamma parameters in support vector machines.
1.2. Hyperparameter tuning
As there are no universally best hyperparameters for any given problem, hyperparameters are typically set to default values. However, the optimal set of hyperparameters can be obtained through manual, empirical (trial-and-error) search or in an automated fashion via an optimization algorithm that maximizes a fitness function.
Two common hyperparameter tuning methods include grid search and random search. As the name implies, a grid search entails the creation of a grid of possible hyperparameter values whereby models are iteratively built for all of these hyperparameter combinations in a brute force manner. In a random search, not all hyperparameter combinations are used, but instead each iteration makes use of a random hyperparameter combination.
Additionally, a stochastic optimization approach may also be applied for hyperparameter tuning, automatically navigating the hyperparameter space in an algorithmic manner as a function of the loss function (i.e. the performance metric) in order to maximize model performance.
In this tutorial, we will be using the grid search approach.
Today, we’re not going to use the Iris dataset nor the Penguins dataset but instead we’re going to generate our very own synthetic dataset. However, if you would like to follow along and substitute with your own dataset that would be great!
2.1. Generating the Synthetic Dataset
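A minimal sketch of this step, using scikit-learn’s make_classification (the argument values below are assumptions chosen to match the dimensions reported later, not necessarily the article’s exact settings):

```python
from sklearn.datasets import make_classification

# Generate a synthetic 2-class dataset with 200 samples and 10 features
# (n_informative and random_state are assumed values)
X, Y = make_classification(n_samples=200, n_features=10,
                           n_informative=5, random_state=42)
```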
2.2. Examine Dataset Dimension
Let’s now examine the dimension of the dataset
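A quick way to check the dimensions, assuming X and Y were generated as in the previous section (regenerated here so the snippet is self-contained):

```python
from sklearn.datasets import make_classification

# Recreate the synthetic dataset (assumed settings)
X, Y = make_classification(n_samples=200, n_features=10, random_state=42)

# Examine the dimensions of the X and Y variables
print((X.shape, Y.shape))  # ((200, 10), (200,))
```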
which should give the following output:
((200, 10), (200,))
(200, 10) is the dimension of the X variable and here we can see that there are 200 rows and 10 columns. As for
(200,), this is the dimension of the Y variable, and it indicates that there are 200 rows in a one-dimensional array (no second dimension is shown).
3. Data Splitting
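The split itself can be sketched with scikit-learn’s train_test_split; the random_state below is an assumption, included for reproducibility:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Recreate the synthetic dataset (assumed settings)
X, Y = make_classification(n_samples=200, n_features=10, random_state=42)

# 80/20 train/test split
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=42)
```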
3.1. Examining the Dimension of Training Set
Let’s now examine the dimension of the training set (the 80% subset).
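A self-contained sketch of this check (dataset and split settings assumed as above):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, Y = make_classification(n_samples=200, n_features=10, random_state=42)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=42)

# Examine the dimensions of the training subset
print((X_train.shape, Y_train.shape))  # ((160, 10), (160,))
```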
which should give the following output:
((160, 10), (160,))
(160, 10) is the dimension of the X variable and here we can see that there are 160 rows and 10 columns. As for
(160,), this is the dimension of the Y variable and this indicates that there are 160 rows in a one-dimensional array (no second dimension is shown).
3.2. Examining the Dimension of the Testing Set
Let’s now examine the dimension of the testing set (the 20% subset).
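And the corresponding check for the testing subset (same assumed settings):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, Y = make_classification(n_samples=200, n_features=10, random_state=42)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=42)

# Examine the dimensions of the testing subset
print((X_test.shape, Y_test.shape))  # ((40, 10), (40,))
```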
which should give the following output:
((40, 10), (40,))
(40, 10) is the dimension of the X variable and here we can see that there are 40 rows and 10 columns. As for
(40,), this is the dimension of the Y variable and this indicates that there are 40 rows and 1 column (no numerical value shown).
4. Building a Baseline Random Forest Model
Here, we will first start by building a baseline random forest model that will serve as a point of comparison with the model built using the optimal set of hyperparameters.
For the baseline model, we will set arbitrary values for the 2 hyperparameters (max_features and n_estimators) that we will also use in the next section for hyperparameter tuning.
4.1. Instantiating the Random Forest Model
We first start by importing the necessary libraries and assigning the random forest classifier to the rf variable.
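A minimal sketch of the instantiation; the two hyperparameter values below are assumed baseline choices:

```python
from sklearn.ensemble import RandomForestClassifier

# Baseline model with arbitrary (assumed) values for the two hyperparameters
rf = RandomForestClassifier(max_features=5, n_estimators=100)
```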
4.2. Training the Random Forest Model
Now, we will apply the random forest classifier to build the classification model using the rf.fit() function on the training data (i.e. X_train and Y_train).
After the model has been trained, the following output appears:
Afterwards, we can apply the trained model (rf) for making predictions. Here, we apply the model to predict the test set (X_test) and assign the predicted Y values to a prediction variable (e.g. Y_pred).
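Putting the training and prediction steps together (dataset, split and baseline hyperparameter settings are assumed as in the earlier sketches):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Recreate the dataset and the 80/20 split (assumed settings)
X, Y = make_classification(n_samples=200, n_features=10, random_state=42)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=42)

rf = RandomForestClassifier(max_features=5, n_estimators=100)
rf.fit(X_train, Y_train)        # train on the 80% subset
Y_pred = rf.predict(X_test)     # predict the 20% subset
```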
4.3. Evaluating the Model Performance
Let’s now evaluate the model performance. Here, we’re calculating 3 performance metrics consisting of Accuracy, Matthews Correlation Coefficient (MCC) and the Area Under the Receiver Operating Characteristic Curve (ROC AUC).
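A sketch of the evaluation step using scikit-learn’s metric functions (model and data settings assumed as before):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, matthews_corrcoef, roc_auc_score
from sklearn.model_selection import train_test_split

# Recreate the baseline model from the previous section (assumed settings)
X, Y = make_classification(n_samples=200, n_features=10, random_state=42)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=42)
rf = RandomForestClassifier(max_features=5, n_estimators=100)
rf.fit(X_train, Y_train)
Y_pred = rf.predict(X_test)

# Compute the 3 performance metrics on the test set
accuracy = accuracy_score(Y_test, Y_pred)
mcc = matthews_corrcoef(Y_test, Y_pred)
roc_auc = roc_auc_score(Y_test, Y_pred)
print(accuracy, mcc, roc_auc)
```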
5. Hyperparameter Tuning
Now we will perform the tuning of hyperparameters of the random forest model. The 2 hyperparameters that we will tune include max_features and n_estimators.
5.1. The Code
It should be noted that some of the code shown below was adapted from the scikit-learn documentation.
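Since the original listing is not reproduced here, the following is a minimal, self-contained sketch of the grid search; the hyperparameter ranges are assumptions chosen to keep the run quick:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Recreate the dataset and 80/20 split from the earlier sections (assumed)
X, Y = make_classification(n_samples=200, n_features=10, random_state=42)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=42)

# Assumed search ranges for the two hyperparameters
max_features_range = np.arange(1, 6, 1)
n_estimators_range = np.arange(10, 110, 20)
param_grid = dict(max_features=max_features_range,
                  n_estimators=n_estimators_range)

rf = RandomForestClassifier()
grid = GridSearchCV(estimator=rf, param_grid=param_grid,
                    cv=5, scoring='roc_auc')
grid.fit(X_train, Y_train)

print('The best parameters are %s with a score of %0.2f'
      % (grid.best_params_, grid.best_score_))
```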
5.2. Code Explanation
Firstly, we will import the necessary libraries.
GridSearchCV() function from scikit-learn will be used to perform the hyperparameter tuning. Particularly, it should be noted that the GridSearchCV() function can perform the typical functions of a classifier, such as fit, predict and score.
Secondly, we define the variables that are necessary inputs to the GridSearchCV() function, which include the ranges of values for the 2 hyperparameters (max_features_range and n_estimators_range); these are then assigned as a dictionary to the param_grid argument.
Finally, the best parameters (grid.best_params_) along with their corresponding metric (grid.best_score_) are printed out.
5.3. Performance Metrics
The default performance metric for GridSearchCV() on a classifier is accuracy, but in this example we’re going to use the ROC AUC.
Just in case, you’re wondering what other performance metrics can be used, run the following command to find out:
This prints the following supported performance metrics:
Thus, in this example we use the ROC AUC and set the input argument scoring = 'roc_auc' inside the GridSearchCV() function.
5.4. Results from Hyperparameter Tuning
Lines 14–15 from the code in section 5.1 print the performance metrics as shown below:
which indicates that the optimal or best set of hyperparameters has a
max_features of 3 and an
n_estimators of 60 with an ROC AUC score of 0.93.
6. Data Visualization of Tuned Hyperparameters
Let’s first start by taking a look at the underlying data that we will later use for data visualization. Results from hyperparameter tuning have been written to grid.cv_results_, whose contents are shown below as a dictionary data type.
6.1. Preparing the DataFrame
Now, we’re going to selectively extract some data from grid.cv_results_ to create a dataframe containing the 2 hyperparameter combinations along with their corresponding performance metric, which in this case is the ROC AUC. Particularly, the following code block combines the 2 hyperparameters (params) with the performance metric (mean_test_score).
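A sketch of this step; the cv_results dictionary below is a small illustrative stand-in for grid.cv_results_ (real runs contain many more keys and entries):

```python
import pandas as pd

# Illustrative stand-in for grid.cv_results_
cv_results = {
    'params': [{'max_features': 1, 'n_estimators': 10},
               {'max_features': 1, 'n_estimators': 20},
               {'max_features': 2, 'n_estimators': 10},
               {'max_features': 2, 'n_estimators': 20}],
    'mean_test_score': [0.88, 0.90, 0.91, 0.93],
}

# Combine the hyperparameter combinations with their mean ROC AUC scores
grid_results = pd.concat(
    [pd.DataFrame(cv_results['params']),
     pd.DataFrame(cv_results['mean_test_score'], columns=['ROC_AUC'])],
    axis=1)
print(grid_results)
```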
The output is the following dataframe with 3 columns consisting of max_features, n_estimators and ROC_AUC.
6.2. Reshaping the DataFrame
6.2.1. Grouping the columns
In order to visualize the above dataframe as a contour plot (i.e. either 2D or 3D version), we will first need to reshape the data structure.
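A sketch of the grouping step, reusing illustrative values for the grid_results dataframe so the snippet is self-contained:

```python
import pandas as pd

# grid_results dataframe from the previous step (illustrative values)
grid_results = pd.DataFrame({
    'max_features': [1, 1, 2, 2],
    'n_estimators': [10, 20, 10, 20],
    'ROC_AUC':      [0.88, 0.90, 0.91, 0.93],
})

# Group by the two hyperparameter columns, averaging any duplicate entries
grid_contour = grid_results.groupby(['max_features', 'n_estimators']).mean()
print(grid_contour)
```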
In the above code, we’re using the groupby() function from the pandas library to group the dataframe according to the 2 hyperparameter columns (max_features and n_estimators), whereby repeated values of the first column (max_features) are merged.
6.2.2. Pivoting the data
Data is reshaped by pivoting it into an m ⨯ n matrix where the rows and columns correspond to the max_features and n_estimators values, respectively.
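A sketch of the pivoting step (illustrative values; the keyword arguments to pivot() assume a reasonably recent pandas):

```python
import pandas as pd

# Grouped dataframe from the previous step (illustrative values)
grid_results = pd.DataFrame({
    'max_features': [1, 1, 2, 2],
    'n_estimators': [10, 20, 10, 20],
    'ROC_AUC':      [0.88, 0.90, 0.91, 0.93],
})
grid_contour = grid_results.groupby(['max_features', 'n_estimators']).mean()

# Pivot into an m x n matrix: rows = max_features, columns = n_estimators
grid_reset = grid_contour.reset_index()
grid_pivot = grid_reset.pivot(index='max_features', columns='n_estimators',
                              values='ROC_AUC')
print(grid_pivot)
```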
The above code block produces the following reshaped dataframe.
Finally, we assign the reshaped data to the respective x, y and z variables that will then be used for making the contour plot.
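This assignment can be sketched as follows, using illustrative values for the pivoted dataframe:

```python
import pandas as pd

# Pivoted dataframe from the previous step (illustrative values)
grid_pivot = pd.DataFrame(
    [[0.88, 0.90], [0.91, 0.93]],
    index=pd.Index([1, 2], name='max_features'),
    columns=pd.Index([10, 20], name='n_estimators'))

x = grid_pivot.columns.values   # n_estimators values
y = grid_pivot.index.values     # max_features values
z = grid_pivot.values           # ROC AUC matrix
```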
6.3. Making the 2D Contour Plot
Now comes the fun part: we will visualize the landscape of the 2 hyperparameters that we are tuning and their influence on the ROC AUC score by making a 2D contour plot with Plotly. The aforementioned x, y and z variables are used as the input data.
The above code block generates the following 2D contour plot.
6.4. Making the 3D Contour Plot
Here, we’re going to use Plotly to create an interactive 3D contour plot using the same x, y and z variables as the input data.
The above code block generates the following 3D contour plot.
Congratulations! You have just performed hyperparameter tuning and created data visualizations to go along with it. Hopefully, you’ll be able to boost your model performance compared to that achieved with the default values.
What’s next? In this tutorial, you have explored the tuning of 2 hyperparameters, but that’s not all. There are several other hyperparameters that you could tune for the random forest model. You can check out the API from
scikit-learn for a list of hyperparameters to try out.
Or perhaps you can try tuning hyperparameters for other machine learning algorithms by using the code described in this article as a starting template.
Let me know in the comments, what fun projects are you working on!