
## MLShot of the Day: Brush up on ML concepts in less than 5 minutes

# Handling Continuous Attributes in Decision Trees

## Discretization of Continuous Attributes for Training Optimal Decision Trees

## A Crash Course on Decision Trees and Splitting Measures:

- Decision Trees and the ensembles built on them, such as Random Forests, XGBoost, and CatBoost, are popular in Machine Learning competitions.
- Training a Decision Tree for a **classification problem** involves recursively splitting the data into smaller subsets until each node contains data belonging to a single class (or a stopping criterion, such as a maximum depth, is reached).
- Different measures (Information Gain, Gini Index, Gain Ratio) are used to determine the best possible split at each node of the decision tree.

## Splitting Measures for growing Decision Trees:

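- Recursively growing a tree involves selecting an **attribute** and **a test condition** that divide the data at a given node into **smaller but purer subsets**.
- The measures used to determine the best split compute the degree of impurity of the child nodes.
- Comparing the impurity of the child nodes with that of the parent node gives the Gain: the parent's impurity minus the size-weighted average impurity of its children. **The higher the Gain (G), the better the split.**
- Let pₖ be the proportion of records belonging to class k at a given node. Two common impurity measures are Entropy = −Σₖ pₖ log₂ pₖ (the basis of Information Gain) and Gini = 1 − Σₖ pₖ². Gain Ratio divides the Information Gain by the entropy of the partition itself (the split information), which penalizes splits into many small subsets.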
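To make the measures above concrete, here is a minimal Python sketch. It is not code from the original article: the helper names `entropy`, `gini`, `gain`, and `gain_ratio` are hypothetical, and it assumes class labels are passed as plain Python lists, with each child node given as the list of labels that falls into it.

```python
import math
from collections import Counter


def class_proportions(labels):
    """Return p_k, the fraction of records belonging to each class k."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def entropy(labels):
    """Entropy = -sum_k p_k * log2(p_k)."""
    return -sum(p * math.log2(p) for p in class_proportions(labels))


def gini(labels):
    """Gini = 1 - sum_k p_k^2."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def gain(parent, children, impurity=entropy):
    """Parent impurity minus the size-weighted impurity of the (non-empty) children."""
    n = len(parent)
    weighted = sum(len(child) / n * impurity(child) for child in children if child)
    return impurity(parent) - weighted


def gain_ratio(parent, children):
    """Information Gain divided by the split information of the partition."""
    n = len(parent)
    split_info = -sum(len(c) / n * math.log2(len(c) / n) for c in children if c)
    return gain(parent, children, entropy) / split_info if split_info > 0 else 0.0


if __name__ == "__main__":
    parent = ["yes", "yes", "no", "no"]
    children = [["yes", "yes"], ["no", "no"]]  # a perfectly pure split
    print(gain(parent, children))          # 1.0
    print(gain(parent, children, gini))    # 0.5
    print(gain_ratio(parent, children))    # 1.0
```

Passing the impurity function as a parameter keeps a single `gain` routine for both Entropy and Gini, which mirrors how the text treats Gain as independent of the particular impurity measure.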

## The curious case of Continuous Attributes:

Notice that the splitting measures are computed over the outcomes of a test condition, which implicitly assumes that an attribute takes a finite (read: discrete) set of values. This raises the question: **how are continuous-valued attributes handled in decision trees?**

## Take some time to think about it (not long though... it's an ML shot)

The test condition for a continuous-valued attribute A can be expressed using a **comparison operator**, for example A ≤ v versus A > v for some threshold v, where v is chosen by evaluating candidate thresholds and keeping the one with the highest Gain. Alternatively, the continuous-valued attribute can be discretized into a **finite set of range buckets** (for example v₁ ≤ A < v₂). Note that a comparison-based test condition gives a **binary split**, whereas range buckets give a **multiway split**.
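Both options can be sketched in Python. This is a minimal illustration, not the article's own code: the function names `best_binary_split` and `bucket_split` are hypothetical, the threshold search assumes the common convention of testing the midpoint between each pair of adjacent distinct sorted values, and the range buckets are assumed to be equal-width (other discretization schemes, such as equal-frequency bins, plug in the same way). Information Gain based on entropy is used as the splitting measure.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy = -sum_k p_k * log2(p_k) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def split_gain(parent, children):
    """Information Gain: parent entropy minus size-weighted child entropy."""
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children if ch)


def best_binary_split(values, labels):
    """Binary split 'A <= v' vs 'A > v': try the midpoint between every pair of
    adjacent distinct sorted values and keep the threshold with the highest gain."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    best_v, best_gain = None, -math.inf
    for (a, _), (b, _) in zip(pairs, pairs[1:]):
        if a == b:
            continue  # no threshold can separate identical values
        v = (a + b) / 2
        left = [y for x, y in pairs if x <= v]
        right = [y for x, y in pairs if x > v]
        g = split_gain(labels, [left, right])
        if g > best_gain:
            best_v, best_gain = v, g
    return best_v, best_gain


def bucket_split(values, labels, n_buckets):
    """Multiway split: discretize into n_buckets equal-width ranges, one child per bucket,
    and return the resulting Information Gain."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_buckets or 1.0  # guard against a constant attribute
    buckets = [[] for _ in range(n_buckets)]
    for x, y in zip(values, labels):
        i = min(int((x - lo) / width), n_buckets - 1)  # the maximum goes in the last bucket
        buckets[i].append(y)
    return split_gain(labels, buckets)


if __name__ == "__main__":
    ages = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
    label = ["no", "no", "no", "yes", "yes", "yes"]
    print(best_binary_split(ages, label))   # (6.5, 1.0)
    print(bucket_split(ages, label, 2))     # 1.0
```

The brute-force loop rescans the data for every candidate threshold, which keeps it readable but costs O(n²); practical implementations sort once and update the class counts incrementally as the threshold moves from one candidate to the next.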
