# Handling Continuous Attributes in Decision Trees

## A Crash Course on Decision Trees and Splitting Measures:

• Decision Trees and their ensemble variants (Random Forests, XGBoost, CatBoost) are widely used in Machine Learning competitions.
• Training a Decision Tree for a classification problem involves recursively splitting the data into smaller subsets until each node contains data belonging to a single class.
• Different measures (Information Gain, Gini Index, Gain ratio) are used for determining the best possible split at each node of the decision tree.
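
The recursive procedure described above can be sketched in a few lines. This is a minimal illustration rather than a reference implementation: it assumes categorical attributes only, uses the Gini Index as the splitting measure (any of the measures listed could be swapped in), and the names `grow_tree`, `partition`, and `gini` are hypothetical helpers introduced here.

```python
from collections import Counter, defaultdict


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def partition(rows, labels, attr):
    """Group rows and labels by the value each row takes for `attr`."""
    parts = defaultdict(lambda: ([], []))
    for row, y in zip(rows, labels):
        parts[row[attr]][0].append(row)
        parts[row[attr]][1].append(y)
    return parts


def grow_tree(rows, labels, attrs):
    """Recursively split until a node is pure or no attributes remain."""
    # Leaf: predict the majority class of the records that reached this node.
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]

    # Choose the attribute whose children have the lowest size-weighted impurity.
    def weighted_impurity(attr):
        return sum(len(ys) / len(labels) * gini(ys)
                   for _, ys in partition(rows, labels, attr).values())

    best = min(attrs, key=weighted_impurity)
    rest = [a for a in attrs if a != best]
    return {best: {value: grow_tree(sub_rows, sub_labels, rest)
                   for value, (sub_rows, sub_labels)
                   in partition(rows, labels, best).items()}}


# Made-up toy data: "outlook" separates the classes perfectly, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]
tree = grow_tree(rows, labels, ["outlook", "windy"])
```

On this toy data the tree splits once on `outlook` and stops, because both child nodes are already pure.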

## Splitting Measures for growing Decision Trees:

• Recursively growing a tree involves selecting an attribute and a test condition that divides the data at a given node into smaller but purer subsets.
• The measures used for determining the best split compute the degree of impurity of the child nodes.
• Gain is the impurity of the parent node minus the size-weighted impurity of the child nodes. The higher the Gain (G), the better the split.
• Let pₖ be the proportion of records belonging to class k at a given node. The impurity measures are given by:
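
For reference, the two most commonly used impurity measures, entropy and the Gini Index, have the following standard textbook definitions in terms of pₖ. The symbols I (impurity), N (records at the parent), and Nⱼ (records in child j) are notation introduced here for illustration.

$$\text{Entropy} = -\sum_{k} p_k \log_2 p_k \qquad\qquad \text{Gini} = 1 - \sum_{k} p_k^2$$

$$G = I(\text{parent}) - \sum_{j} \frac{N_j}{N}\, I(\text{child}_j)$$

A minimal Python sketch of these definitions follows. The function names are hypothetical, and the data is a made-up four-record example.

```python
from collections import Counter
from math import log2


def class_proportions(labels):
    """Return p_k, the fraction of records in each class k."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def entropy(labels):
    """Entropy: -sum_k p_k * log2(p_k)."""
    return -sum(p * log2(p) for p in class_proportions(labels))


def gini(labels):
    """Gini Index: 1 - sum_k p_k^2."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def gain(parent, children, impurity=entropy):
    """Parent impurity minus the size-weighted impurity of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted


parent = ["a", "a", "b", "b"]
perfect_split = [["a", "a"], ["b", "b"]]   # both children are pure
useless_split = [["a", "b"], ["a", "b"]]   # children look like the parent

info_gain = gain(parent, perfect_split)                 # entropy-based Gain
gini_gain = gain(parent, perfect_split, impurity=gini)  # Gini-based Gain
no_gain = gain(parent, useless_split)
```

With entropy as the impurity measure, Gain is what is usually called Information Gain. A split that separates the classes perfectly recovers all of the parent's impurity, while a split whose children mirror the parent's class mix gains nothing.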

## The curious case of Continuous Attributes:

The splitting measures above are computed over a finite number of child nodes, which implicitly assumes that an attribute takes a finite (read: discrete) set of values. This raises the question: how are continuous-valued attributes handled in decision trees?

## Take some time to think about it (not long though, it's an ML shot)

The test condition for a continuous-valued attribute can be expressed in one of two ways. It can use a comparison operator (≥, ≤) against a threshold value, or the attribute can be discretized into a finite set of range buckets. It is important to note that a comparison-based test condition gives us a binary split, whereas range buckets give us a multiway split.
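
Both kinds of test condition can be sketched briefly. The text does not say how the threshold is chosen; the sketch below assumes one common approach, which is to try the midpoint between each pair of adjacent distinct values and keep the one with the highest entropy-based Gain. The function names, the bucket edges, and the data are all hypothetical.

```python
from bisect import bisect_right
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def split_gain(labels, groups):
    """Parent entropy minus size-weighted entropy of the non-empty groups."""
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups if g)


def best_threshold(values, labels):
    """Binary split: return (threshold, gain) for the test `value <= threshold`.

    Candidates are midpoints between adjacent distinct sorted values.
    Returns (None, 0.0) if no candidate has positive gain.
    """
    distinct = sorted(set(values))
    best = (None, 0.0)
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        g = split_gain(labels, [left, right])
        if g > best[1]:
            best = (t, g)
    return best


def bucket_split(values, labels, edges):
    """Multiway split: group labels by range bucket.

    `edges` must be sorted. With edges [e1, e2] the buckets are
    x < e1, e1 <= x < e2, and x >= e2.
    """
    groups = [[] for _ in range(len(edges) + 1)]
    for x, y in zip(values, labels):
        groups[bisect_right(edges, x)].append(y)
    return groups


values = [1.0, 2.0, 3.0, 4.0]
labels = ["a", "a", "b", "b"]

threshold, g = best_threshold(values, labels)        # binary split
buckets = bucket_split(values, labels, [1.5, 3.5])   # three-way split
bucket_gain = split_gain(labels, buckets)
```

Here the binary search settles on the threshold 2.5, which separates the two classes exactly. The three hypothetical buckets leave one mixed bucket in the middle, so the multiway split achieves a lower Gain on this data; with different edges it could do as well.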
