Handling Continuous Attributes in Decision Trees



MLShot of the Day — Brush up ML concepts in less than 5 mins


Discretization of Continuous Attributes for training optimal Decision trees


A Crash Course on Decision Trees and Splitting Measures:

  • Decision Trees and the ensemble methods built on them, such as Random Forests, XGBoost, and CatBoost, are widely used in Machine Learning competitions.
  • Training a Decision Tree for a classification problem involves recursively splitting the data into smaller subsets until each node contains data belonging to a single class.
  • Different measures (Information Gain, Gini Index, Gain Ratio) are used to determine the best possible split at each node of the decision tree.

Splitting Measures for growing Decision Trees:

  • Recursively growing a tree involves selecting an attribute and a test condition that divide the data at a given node into smaller, purer subsets.
  • The measures used for determining the best split compute the degree of impurity of the child nodes.
  • The Gain (G) of a split is the impurity of the parent node minus the weighted average impurity of its child nodes. The higher the Gain, the better the split.
  • Let pₖ be the proportion of records belonging to class k at a given node. The impurity measures are given by:
[Figure: formulas for the impurity measures. Image by the Author]
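
Since the formulas themselves were given as images, here is a minimal Python sketch of the textbook definitions, which may differ in notation from the author's figures: entropy as -Σ pₖ log₂ pₖ, the Gini index as 1 - Σ pₖ², and Gain as the parent's impurity minus the size-weighted impurity of the children. The function names and the toy labels are illustrative assumptions, not part of the original post.

```python
from collections import Counter
from math import log2


def class_proportions(labels):
    """Return p_k, the fraction of records belonging to each class k."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def entropy(labels):
    """Entropy: -sum_k p_k * log2(p_k). Zero for a pure node."""
    return -sum(p * log2(p) for p in class_proportions(labels))


def gini(labels):
    """Gini index: 1 - sum_k p_k^2. Zero for a pure node."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def gain(parent, children, impurity=entropy):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted


# Made-up labels for illustration: a perfectly mixed parent node
# and one candidate split into two mostly pure children.
parent = ["yes"] * 5 + ["no"] * 5
children = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]

print("entropy:", entropy(parent))
print("gini:", gini(parent))
print("gain (entropy):", gain(parent, children))
print("gain (gini):", gain(parent, children, impurity=gini))
```

With entropy as the impurity function, the value returned by `gain` is the Information Gain of the split.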

The curious case of Continuous Attributes:

The computation of splitting measures assumes that each attribute takes a finite (read: discrete) set of values. This raises the question: how are continuous-valued attributes handled in decision trees?

Take some time to think about it (not long, though; it's an ML shot).

The test condition for a continuous-valued attribute can be expressed using a comparison operator (≥, ≤) against a threshold. Alternatively, the attribute can be discretized into a finite set of range buckets. Note that a comparison-based test condition gives a binary split, whereas range buckets give a multiway split.
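
Both options can be sketched in a few lines of Python. The sketch below assumes one common way of choosing the comparison threshold: sort the values, try the midpoint between each pair of adjacent distinct values, and keep the one with the highest entropy-based Gain. The post does not specify how the threshold or the bucket edges are chosen, so `best_threshold`, `to_bucket`, the edges `[30, 50]`, and the toy data are all illustrative assumptions.

```python
from bisect import bisect_right
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy of a list of class labels: -sum_k p_k * log2(p_k)."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, gain) for the best binary test `value <= threshold`.

    Candidates are midpoints between adjacent distinct sorted values.
    Returns (None, 0.0) if no candidate improves on the parent node.
    This simple version rebuilds both children for every candidate.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    parent_impurity = entropy(labels)
    best = (None, 0.0)
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # equal values cannot be separated by a threshold
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        split_gain = (parent_impurity
                      - len(left) / n * entropy(left)
                      - len(right) / n * entropy(right))
        if split_gain > best[1]:
            best = ((low + high) / 2, split_gain)
    return best


def to_bucket(value, edges):
    """Index of the range bucket containing `value`, given sorted edges.

    With edges [30, 50]: bucket 0 is value < 30, bucket 1 is
    30 <= value < 50, and bucket 2 is value >= 50.
    """
    return bisect_right(edges, value)


# Made-up data for illustration only.
ages = [22, 25, 31, 38, 45, 52, 60]
buys = ["no", "no", "no", "yes", "yes", "yes", "yes"]

threshold, split_gain = best_threshold(ages, buys)
print("binary split: age <=", threshold, "gain:", split_gain)
print("multiway buckets:", [to_bucket(a, [30, 50]) for a in ages])
```

The comparison test yields two children (`age <= threshold` and `age > threshold`), while the bucket index behaves like a discrete attribute with one child per range.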

