In this article, we will discuss how to create more efficient and better-formatted decision tree visualizations using the dtreeviz library.
dtreeviz is an open-source Python library used to visualize the decisions, or rules, of a decision tree model. Install the library from PyPI using
pip install dtreeviz
and import it with
from dtreeviz.trees import *
The dtreeviz library can visualize decision trees for both classification and regression tasks.
The code snippet below calls the dtreeviz function to plot the visualization for a decision tree classifier trained on the Iris dataset.
viz = dtreeviz(clf,
               title="Decision Tree - Iris Data")
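For readers who want something they can run end to end, here is a minimal sketch of a complete call. It is not the article's original code: it assumes a dtreeviz 1.x release, where `dtreeviz.trees.dtreeviz` accepts the fitted model followed by the training features and labels, and it assumes scikit-learn and Graphviz are installed. The function name `render_iris_tree`, the `max_depth` value, and the target name "species" are hypothetical choices made for illustration. Later major versions of dtreeviz changed the API, so check the version you have installed.

```python
def render_iris_tree(max_depth=3):
    """Train a small classifier on Iris and return the dtreeviz rendering."""
    # Third-party imports are kept inside the function so that defining it
    # does not require scikit-learn or dtreeviz to be installed.
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier
    from dtreeviz.trees import dtreeviz  # assumption: dtreeviz 1.x layout

    iris = load_iris()
    clf = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    clf.fit(iris.data, iris.target)

    # Training data is passed positionally; the keyword names below are
    # assumed from the 1.x API and may differ in other releases.
    return dtreeviz(
        clf,
        iris.data,
        iris.target,
        target_name="species",
        feature_names=iris.feature_names,
        class_names=list(iris.target_names),
        title="Decision Tree - Iris Data",
    )
```

Under the same assumptions, the returned object can be displayed with `render_iris_tree().view()` or written to disk with its `save` method.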
The decision tree generated by the dtreeviz library is better formatted and easier to interpret. Each decision node of the plot shows a stacked histogram of the feature used to split at that level, color-coded by class. The histogram makes it easy to see how the split divides the data. For example, at the first node, the split is at
petal_length=2.45: the records having
petal_length<=2.45 are predicted as setosa, while for
petal_length>2.45 the tree extends further.
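The counting behind such a stacked histogram can be sketched in plain Python. The records below are toy values made up for illustration, not actual Iris rows, and `split_counts` is a hypothetical helper rather than part of dtreeviz. It shows how a threshold of 2.45 separates one class cleanly while the other side still contains a mix that needs further splits.

```python
from collections import Counter

# Toy (petal_length, species) records, made up for illustration only.
records = [
    (1.4, "setosa"), (1.7, "setosa"), (1.3, "setosa"),
    (4.5, "versicolor"), (4.0, "versicolor"),
    (5.8, "virginica"), (5.1, "virginica"),
]


def split_counts(records, threshold):
    """Count class labels on each side of a `feature <= threshold` split."""
    left = Counter(label for value, label in records if value <= threshold)
    right = Counter(label for value, label in records if value > threshold)
    return left, right


left, right = split_counts(records, threshold=2.45)
print("petal_length <= 2.45:", dict(left))
print("petal_length >  2.45:", dict(right))
```

The left side holds a single class, so it can become a leaf; the right side holds two classes, which is why the tree keeps splitting there.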
The plots above were for a classification dataset, but dtreeviz can also visualize a decision tree regressor model. The Boston housing dataset is used here to demonstrate the regression case. The code snippet below visualizes a decision tree regressor trained on that dataset.
viz = dtreeviz(reg,
               title="Decision Tree - Boston housing",
               show_node_labels=True)
For regression decision tree plots, each node shows a scatterplot of the target variable against the feature used to split at that level. One can interpret the model by observing the dashed lines in the scatterplots.
- The vertical line in each scatterplot denotes the split point at that level (the same as the histogram split in the classification case).
- The horizontal dashed lines in each scatterplot are the target means for the left and right child nodes.
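The two horizontal lines described above are simply the mean of the target on each side of the split. A minimal sketch in plain Python, using toy (feature, target) pairs made up for illustration and a hypothetical helper `split_means`:

```python
from statistics import mean

# Toy (feature, target) pairs, made up for illustration only.
points = [(2.0, 10.0), (3.0, 14.0), (4.0, 12.0), (7.0, 30.0), (8.0, 34.0)]


def split_means(points, threshold):
    """Return the mean target on the left and right side of the split."""
    left = [target for feature, target in points if feature <= threshold]
    right = [target for feature, target in points if feature > threshold]
    return mean(left), mean(right)


left_mean, right_mean = split_means(points, threshold=5.0)
print("mean target where feature <= 5.0:", left_mean)
print("mean target where feature >  5.0:", right_mean)
```

A large gap between the two means indicates that the split separates low and high target values well.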
Some interesting features of dtreeviz:
- Orientation: By default, decision tree plots run from top to bottom. One can change this to a left-to-right orientation using the
orientation='LR' parameter of the dtreeviz function.
- Remove Scatterplots or Histograms: For a decision tree model with large depth, the scatterplots or histograms can make the plot very large. One can leave them out of the plot using the
fancy=False parameter of the dtreeviz function.
- Feature Importance: To get the feature importance plot, one can use the
This article discussed how dtreeviz can be used to plot decision tree classification and regression models. Data scientists and analysts can use it to understand their decision tree models by observing the set of rules that lead to a prediction.
There are certain limitations: a decision tree with large depth is very difficult to interpret. Also, dtreeviz generates its plots only as SVG.
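The depth limitation is easy to quantify. A binary tree of depth d has at most 2^d leaves and 2^d - 1 decision nodes, each of which would carry its own histogram or scatterplot. The sketch below computes those upper bounds; `max_tree_size` is a hypothetical helper, and real trees are usually smaller because not every branch grows to full depth.

```python
def max_tree_size(depth):
    """Upper bounds on decision nodes and leaves for a binary tree."""
    leaves = 2 ** depth
    decision_nodes = leaves - 1
    return decision_nodes, leaves


for depth in (3, 5, 10):
    decision_nodes, leaves = max_tree_size(depth)
    print(f"depth {depth}: up to {decision_nodes} decision nodes, {leaves} leaves")
```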
 Dtreeviz GitHub repository: https://github.com/parrt/dtreeviz
Thank You for Reading