Kaggler’s Guide to LightGBM Hyperparameter Tuning with Optuna in 2021


Hyperparameters that control the tree structure

If you are not familiar with decision trees, check out this legendary video by StatQuest.

In LGBM, the most important parameter to control the tree structure is num_leaves. As the name suggests, it sets the maximum number of leaves in a single tree. A leaf is a terminal node of the tree, the place where the ‘actual decision’ (the prediction) is made.

The next is max_depth. The higher max_depth is, the more levels the tree has, which makes it more complex and more prone to overfitting. Set it too low, and you will underfit. Even though it sounds hard, it is the easiest parameter to tune: just choose a value between 3 and 12 (this range tends to work well on Kaggle for most datasets).

Tuning num_leaves can also be easy once you determine max_depth. There is a simple formula given in the LGBM documentation: a tree with max_depth levels can hold at most 2^(max_depth) leaves, so num_leaves should not exceed that value. With max_depth between 3 and 12, this upper limit ranges from 2^3 = 8 to 2^12 = 4096.
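The relationship between the two parameters is simple enough to check by hand. Here is a minimal sketch; the helper name max_num_leaves is made up for illustration and is not part of LightGBM.

```python
def max_num_leaves(max_depth: int) -> int:
    """Upper limit on num_leaves for a tree restricted to max_depth levels."""
    if max_depth < 1:
        raise ValueError("max_depth must be a positive integer")
    # A binary tree can at most double its leaf count with every extra level.
    return 2 ** max_depth


# Print the limit for the shallow and deep ends of the 3 to 12 range.
for depth in (3, 7, 12):
    print(f"max_depth={depth}: num_leaves <= {max_num_leaves(depth)}")
```

This only gives the ceiling. Values well below it are usually the ones worth searching.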

However, because LGBM grows trees leaf-wise, num_leaves has a bigger impact on learning than max_depth. This means you need to specify a more conservative search range, like (20, 3000). That’s what I mostly do.
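As a rough sketch, the two ranges could be written as a search-space function like the one below. It only assumes that the trial object has a suggest_int(name, low, high) method, which is the call an Optuna trial accepts. The names suggest_tree_params and StubTrial are hypothetical, and capping num_leaves at 2^(max_depth) after sampling is an added assumption, not something the ranges above require.

```python
def suggest_tree_params(trial):
    """Sample tree-structure parameters from a trial-like object."""
    max_depth = trial.suggest_int("max_depth", 3, 12)
    num_leaves = trial.suggest_int("num_leaves", 20, 3000)
    # Assumption: keep the leaf count within what the sampled depth allows.
    num_leaves = min(num_leaves, 2 ** max_depth)
    return {"max_depth": max_depth, "num_leaves": num_leaves}


class StubTrial:
    """Hypothetical stand-in for a real trial; returns preset values."""

    def __init__(self, values):
        self.values = values

    def suggest_int(self, name, low, high):
        value = self.values[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")
        return value


# With max_depth=5 the cap is 2^5 = 32, so the sampled 300 is reduced to 32.
params = suggest_tree_params(StubTrial({"max_depth": 5, "num_leaves": 300}))
print(params)
```

Inside a real Optuna objective, the same function would receive the trial that the study passes in, and the returned dictionary would be merged into the model parameters.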

Another important structural parameter for a tree is min_data_in_leaf. Its magnitude also affects whether you overfit: larger values keep the tree from carving out tiny, noisy leaves, while values that are too large lead to underfitting. In simple terms, min_data_in_leaf specifies the minimum number of training observations that must end up in a leaf.

For example, suppose a candidate split checks whether one feature is greater than, let’s say, 13. Setting min_data_in_leaf to 100 means the split is made only if at least 100 training observations land in each of the two resulting leaves, that is, at least 100 with values above 13 and at least 100 with values at or below it. This is the gist in my lay terms.
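One simplified way to picture the rule: a candidate split is kept only when every resulting leaf holds at least min_data_in_leaf observations. The function below is an illustration of that idea only, not how LightGBM implements it internally, and the name split_is_allowed is made up.

```python
def split_is_allowed(values, threshold, min_data_in_leaf):
    """Return True if splitting on `value > threshold` leaves enough rows per leaf."""
    right = sum(1 for v in values if v > threshold)  # rows with value > threshold
    left = len(values) - right                       # rows with value <= threshold
    return left >= min_data_in_leaf and right >= min_data_in_leaf


# Toy feature with 200 rows holding the values 0..199.
feature = list(range(200))

# Only 14 rows (0..13) fall at or below 13, so a minimum of 100 blocks the split.
print(split_is_allowed(feature, 13, 100))
# A minimum of 10 is satisfied on both sides (14 and 186 rows).
print(split_is_allowed(feature, 13, 10))
```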

The optimal value for min_data_in_leaf depends on the number of training samples and on num_leaves. For large datasets, set a value in the hundreds or thousands.

Check out this section of the LGBM documentation for more details.


