To showcase each of the methods and walk through the interpretation of the plots, I am using the ‘Abalone’ dataset.
This data set is licensed under the Open Data Commons Public Domain Dedication and License (PDDL).
The goal of this dataset is to determine the age of an abalone. Usually, this is done by cutting the shell through the cone, staining it, and counting the number of rings under a microscope. Unfortunately, measuring age this way is time-consuming, invasive, and laborious.
Instead, the age can be predicted effectively from eight features: sex, length, diameter, height, whole weight, shucked weight, viscera weight, and shell weight. The dataset has 4,177 records.
Both PDP and ICE plots can be generated with the same function from sklearn.
from sklearn.inspection import plot_partial_dependence

# PD Plots
plot_partial_dependence(model, X, [feature_name])
The dependence plots are based on a gradient boosting-tree regressor model.
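As a minimal, self-contained sketch of this setup, the snippet below fits a gradient boosting regressor on synthetic data (a stand-in for the abalone features, which are not bundled with sklearn) and computes the same average partial dependence values that the plot draws, using `partial_dependence`. Note that in sklearn 1.2 and later, `plot_partial_dependence` has been removed in favour of `PartialDependenceDisplay.from_estimator`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import partial_dependence

# Synthetic stand-in for the abalone data (8 numeric features).
X, y = make_regression(n_samples=500, n_features=8, random_state=0)

model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Average partial dependence of the prediction on feature 0:
# the model's mean prediction over a grid of values for that feature.
pd_result = partial_dependence(model, X, features=[0], kind="average")

# One row per model output (one for a regressor), one column per grid point.
print(pd_result["average"].shape)
```

The `"average"` array holds the curve shown in a PD plot; plotting it against the grid of feature values reproduces the figure.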
The plots show the behaviour of the model for both ‘Shucked weight’ and ‘Whole weight’. The predicted age of the abalone decreases as the Shucked weight increases. For ‘Whole weight’, an increase in the feature increases the predicted age, although for low values the prediction doesn’t change.
The distribution of each feature's values is also shown at the bottom of each plot: the black tick marks indicate the deciles (percentiles in increments of 10) of the feature.
# ICE Plots
plot_partial_dependence(model, X, [feature_name], kind='individual')
Next are the PD plot and the ICE plot for height. In the PD plot, the trend is no longer monotonic as the value increases. Once the height is beyond 0.3, it no longer contributes to the overall age prediction (indicated by a partial dependence of 0).
In the ICE plot, this pattern is consistent across all of the instances. For low heights, the predicted age is lower; as height increases, the prediction rises until the height passes a threshold.
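The ICE curves behind such a plot can be computed directly with `kind='individual'`, which returns one curve per instance rather than a single averaged curve. A minimal sketch on synthetic data (the feature index and data are stand-ins, not the article's actual abalone columns):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import partial_dependence

X, y = make_regression(n_samples=200, n_features=8, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Individual conditional expectation: one curve per instance,
# shape (n_outputs, n_samples, n_grid_points).
ice = partial_dependence(model, X, features=[2], kind="individual")
print(ice["individual"].shape)
```

Averaging the individual curves over the sample axis recovers the PD curve, which is why consistent ICE curves imply the PD plot is a faithful summary.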
The distribution of height causes this strange behaviour. For reference, the maximum value is 1.13, but at the 75th percentile, the value is only 0.165.
The plots attempt to evaluate the model across the full range of values for each feature. But when most of a feature's values are concentrated in a narrow range, the model does not learn the sparsely populated part of the range well.
Therefore, we cannot be confident in predictions for a new instance whose height falls in this sparse region.
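A quick diagnostic before trusting the tail of a PD plot is to compare a feature's upper percentiles with its maximum. The snippet below illustrates this on a synthetic right-skewed sample standing in for the height column (the distribution parameters are illustrative, not fitted to the real data):

```python
import numpy as np

rng = np.random.default_rng(0)
# Right-skewed stand-in for a feature like height: most mass at small
# values, with a long upper tail that the model rarely sees.
height = rng.lognormal(mean=-2.0, sigma=0.6, size=4177)

p75 = np.percentile(height, 75)
# A maximum far above the 75th percentile signals a sparse tail,
# where PD/ICE curves should not be trusted.
print(p75, height.max())
```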
from sklearn.inspection import plot_partial_dependence

# PD Plots, 2-dimensional (feature pairs require kind='average',
# the default; kind='both' is only valid for single features)
plot_partial_dependence(model, X, [(feature_1, feature_2)], kind='average')
The final partial dependence plot shows how two features interact to influence the prediction; here, ‘Whole weight’ and ‘Shucked weight’ are plotted together.
The contour colours show how the predicted value changes across the two features, and the behaviours observed for each feature individually are still present.
An increase in ‘Whole weight’ increases the predicted age once it passes the threshold identified earlier, while the prediction decreases as ‘Shucked weight’ increases. Beyond these individual effects, the two features show little additional interaction.
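The two-way surface drawn by the contour plot can likewise be computed numerically by passing a feature pair to `partial_dependence`. A self-contained sketch on synthetic data (feature indices 0 and 1 stand in for the two weight columns):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import partial_dependence

X, y = make_regression(n_samples=300, n_features=8, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Two-way partial dependence: the mean prediction over a 2-D grid of
# values for the feature pair, shape (n_outputs, grid_0, grid_1).
pd2 = partial_dependence(model, X, features=[(0, 1)], kind="average")
print(pd2["average"].shape)
```

This is the grid that the contour plot colours; if the surface is roughly the sum of the two one-way curves, the features do not interact much, which matches the observation above.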
Partial dependence plots and individual conditional expectation plots are a great way to start to understand your models. In addition, they are easy to implement and apply to every model.
Both plots allow the user to better understand the tendencies of their model and to expose inconsistencies and irregular model behaviour.
These plots are dependent on the underlying model used in the supervised learning task. However, they reveal model behaviours overall and at the individual instance level. Thus, they allow you to interpret your models effectively.