How to Beat the Heck Out of XGBoost with LightGBM: Comprehensive Tutorial

XGBoost vs. LightGBM

When LightGBM (LGBM) was released, it came with ground-breaking changes to the way decision trees are grown.

Both XGBoost and LightGBM are ensemble algorithms. They combine many decision trees, used as weak learners, to capture complex, non-linear patterns.

In XGBoost (and many other libraries), decision trees were traditionally built one level at a time (level-wise, or depth-wise, growth):

Image from LGBM documentation

This growth strategy tends to produce unnecessary nodes and leaves, because the nodes on each level keep being split until max_depth is reached. The result is higher model complexity and longer training time.
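To make level-wise growth concrete, here is a minimal, hypothetical sketch in plain Python (not XGBoost's actual code). It assumes a regression tree on a single feature with squared-error loss; the helper names `sse`, `best_split`, `partition` and `grow_level_wise` are illustrative only. Every node on the current level is split as long as the split helps at all, until `max_depth` is reached.

```python
def sse(ys):
    """Squared-error loss of a leaf that predicts the mean of its targets."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def best_split(xs, ys):
    """Return (gain, threshold) of the best split, or (0.0, None) if no split helps."""
    parent = sse(ys)
    best_gain, best_thr = 0.0, None
    for thr in sorted(set(xs))[:-1]:  # splitting at the largest value would leave one side empty
        left = [y for x, y in zip(xs, ys) if x <= thr]
        right = [y for x, y in zip(xs, ys) if x > thr]
        gain = parent - sse(left) - sse(right)
        if gain > best_gain:
            best_gain, best_thr = gain, thr
    return best_gain, best_thr


def partition(xs, ys, thr):
    """Split a node's samples into (left_xs, left_ys) and (right_xs, right_ys)."""
    left = [(x, y) for x, y in zip(xs, ys) if x <= thr]
    right = [(x, y) for x, y in zip(xs, ys) if x > thr]
    return ([x for x, _ in left], [y for _, y in left]), ([x for x, _ in right], [y for _, y in right])


def grow_level_wise(xs, ys, max_depth):
    """Level-wise growth: split every node of the current level, one level at a time."""
    level, leaves, gains = [(xs, ys)], [], []
    for _ in range(max_depth):
        next_level = []
        for node_xs, node_ys in level:
            gain, thr = best_split(node_xs, node_ys)
            if thr is None:  # nothing to gain: this node stays a leaf
                leaves.append((node_xs, node_ys))
                continue
            gains.append(gain)  # the split is made even when its gain is tiny
            next_level.extend(partition(node_xs, node_ys, thr))
        level = next_level
    leaves.extend(level)  # nodes still open at max_depth become leaves
    return leaves, gains


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 10, 10, 10, 11]
leaves, gains = grow_level_wise(xs, ys, max_depth=2)
print(len(leaves), gains)  # 3 [210.125, 0.75]
```

The second split reduces the loss by only 0.75, compared with 210.125 for the first, yet level-wise growth makes it simply because the tree has not reached `max_depth` yet.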

In contrast, LightGBM takes a leaf-wise approach:

Image from LGBM documentation

At each step, the tree splits only the most promising leaf (the one with the largest loss reduction, or delta loss), and growth stops once a fixed number of leaves is reached. With the number of leaves held fixed, leaf-wise growth tends to reach a lower loss than level-wise growth. (If this doesn’t make sense to you, don’t sweat: it won’t prevent you from using LGBM effectively.)
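A matching sketch of leaf-wise growth, under the same toy assumptions (single feature, squared-error loss, plain Python, hypothetical helper names). A priority queue holds every current leaf keyed by its best achievable loss reduction; the leaf at the top is split next, and growth stops once `num_leaves` leaves exist or no split helps. The parameter name mirrors LightGBM's `num_leaves`, but this is an illustration, not LightGBM's implementation.

```python
import heapq
import itertools


def sse(ys):
    """Squared-error loss of a leaf that predicts the mean of its targets."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def best_split(xs, ys):
    """Return (gain, threshold) of the best split, or (0.0, None) if no split helps."""
    parent = sse(ys)
    best_gain, best_thr = 0.0, None
    for thr in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= thr]
        right = [y for x, y in zip(xs, ys) if x > thr]
        gain = parent - sse(left) - sse(right)
        if gain > best_gain:
            best_gain, best_thr = gain, thr
    return best_gain, best_thr


def partition(xs, ys, thr):
    """Split a node's samples into (left_xs, left_ys) and (right_xs, right_ys)."""
    left = [(x, y) for x, y in zip(xs, ys) if x <= thr]
    right = [(x, y) for x, y in zip(xs, ys) if x > thr]
    return ([x for x, _ in left], [y for _, y in left]), ([x for x, _ in right], [y for _, y in right])


def grow_leaf_wise(xs, ys, num_leaves):
    """Leaf-wise growth: always split the leaf with the largest loss reduction."""
    tie_breaker = itertools.count()  # keeps heapq from comparing lists when gains are equal
    heap = []

    def push(node_xs, node_ys):
        gain, thr = best_split(node_xs, node_ys)
        # heapq is a min-heap, so the negated gain puts the best leaf on top
        heapq.heappush(heap, (-gain, next(tie_breaker), thr, node_xs, node_ys))

    push(xs, ys)
    gains = []
    # Stop at the leaf budget, or when even the best leaf cannot be split usefully.
    while len(heap) < num_leaves and heap[0][2] is not None:
        neg_gain, _, thr, node_xs, node_ys = heapq.heappop(heap)
        gains.append(-neg_gain)
        left, right = partition(node_xs, node_ys, thr)
        push(*left)
        push(*right)
    leaves = [(entry[3], entry[4]) for entry in heap]
    return leaves, gains


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 10, 10, 30, 30]
leaves, gains = grow_leaf_wise(xs, ys, num_leaves=3)
print(len(leaves), [round(g, 2) for g in gains])  # 3 [1066.67, 133.33]
```

Here the first split isolates the two 30s, and the remaining leaf budget goes to whichever leaf promises the largest loss reduction, regardless of its depth.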

This is one of the main reasons LGBM crushed XGBoost in terms of speed when it first came out.

Image from LGBM documentation

Above is a benchmark comparing XGBoost with traditional decision trees against LGBM with leaf-wise trees (first and last columns) on datasets of roughly 500k to 13M samples. It shows LGBM training orders of magnitude faster than XGBoost.

LGBM also uses histogram binning of continuous features, which speeds up training even further compared with traditional gradient boosting. Binning numeric values greatly reduces the number of split points a decision tree has to consider, and it removes the need to sort feature values when searching for splits, which is computationally expensive.
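A simplified sketch of histogram-based split finding in plain Python, under stated assumptions. Bin edges are taken at evenly spaced ranks of the sorted values, once, before training (LightGBM's real bin construction is more involved). Each node then builds per-bin sums of gradients and hessians in a single pass and scans only the bin boundaries. The gain is the standard second-order gradient-boosting form G_L²/(H_L+λ) + G_R²/(H_R+λ) − G²/(H+λ), with the constant factor and minimum-gain penalty dropped. All function names are hypothetical.

```python
from bisect import bisect_right


def make_bin_edges(values, max_bin):
    """Pick up to max_bin - 1 cut points at evenly spaced ranks (done once, before training)."""
    ordered = sorted(values)
    n = len(ordered)
    return sorted({ordered[(i * n) // max_bin] for i in range(1, max_bin)})


def to_bins(values, edges):
    """Replace each raw value with a small integer bin index in 0..len(edges)."""
    return [bisect_right(edges, v) for v in values]


def best_histogram_split(bins, grads, hess, n_bins, reg_lambda=1.0):
    """Find the best split by scanning per-bin gradient sums: no sorting needed."""
    g_hist, h_hist = [0.0] * n_bins, [0.0] * n_bins
    for b, g, h in zip(bins, grads, hess):  # one pass over the samples
        g_hist[b] += g
        h_hist[b] += h

    def score(g, h):
        return g * g / (h + reg_lambda)

    g_total, h_total = sum(g_hist), sum(h_hist)
    best_gain, best_bin = 0.0, None
    g_left = h_left = 0.0
    for b in range(n_bins - 1):  # only n_bins - 1 candidate split points
        g_left += g_hist[b]
        h_left += h_hist[b]
        gain = (score(g_left, h_left)
                + score(g_total - g_left, h_total - h_left)
                - score(g_total, h_total))
        if gain > best_gain:
            best_gain, best_bin = gain, b
    return best_gain, best_bin  # samples with bin <= best_bin go left


# 1,000 distinct values would mean 999 candidate thresholds; 16 bins leave only 15.
values = [float(i) for i in range(1000)]
targets = [0.0 if v < 500 else 1.0 for v in values]
edges = make_bin_edges(values, max_bin=16)
bins = to_bins(values, edges)
# Squared-error loss at an initial prediction of 0: gradient = pred - y, hessian = 1.
grads = [0.0 - y for y in targets]
hess = [1.0] * len(targets)
gain, split_bin = best_histogram_split(bins, grads, hess, n_bins=len(edges) + 1)
print(len(edges) + 1, split_bin)  # 16 7
```

The split found at bin 7 separates values below 500 from the rest, exactly as a search over all 999 raw thresholds would, while only 15 candidates were evaluated.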

Inspired by LGBM, XGBoost later introduced histogram binning as well (tree_method="hist"). It gave a massive speed-up, but still not enough to match LGBM:

Image from LGBM documentation. Histogram-binning comparison: second and third columns.
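In practice, both libraries expose these choices as parameters. The dictionaries below are a hedged configuration sketch rather than tuned recommendations: the parameter names are real XGBoost and LightGBM parameters, and the values are their documented defaults, except `tree_method`, which is set explicitly to select XGBoost's histogram method.

```python
# Configuration sketch: real parameter names, library-default values except
# tree_method, which explicitly turns on XGBoost's histogram binning.
xgb_params = {
    "tree_method": "hist",       # histogram-based split finding instead of "exact"
    "max_bin": 256,              # maximum number of bins per continuous feature
    "grow_policy": "depthwise",  # level-wise growth; "lossguide" grows leaf-wise instead
    "max_depth": 6,              # level-wise trees stop growing at this depth
}

lgb_params = {
    "max_bin": 255,     # LightGBM always bins continuous features
    "num_leaves": 31,   # leaf-wise growth stops at this many leaves
    "max_depth": -1,    # -1 means no explicit depth limit
}
```

These would be passed as, for example, `xgboost.train(xgb_params, dtrain)` with `dtrain` an `xgboost.DMatrix`, and `lightgbm.train(lgb_params, train_set)` with `train_set` a `lightgbm.Dataset`.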

We will continue exploring the differences in the coming sections.

