Hyperparameters that control the tree structure
If you are not familiar with decision trees, check out this legendary video by StatQuest.
In LGBM, the most important parameter for controlling the tree structure is num_leaves. As the name suggests, it controls the number of leaves in a single tree. A leaf is a terminal node of the tree, where the 'actual decision' (the final prediction) happens.
The next is max_depth. The higher max_depth, the more levels the tree has, which makes it more complex and prone to overfitting. Set it too low, and you will underfit. Even though it sounds hard, it is the easiest parameter to tune: just choose a value between 3 and 12 (this range tends to work well on Kaggle for most datasets).
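To make the 3-to-12 rule of thumb concrete, here is a minimal random-search sketch over that range. The key "max_depth" is a real LightGBM parameter name; the model-fitting and scoring step is omitted and only marked as a placeholder, so this shows the search space itself rather than a full tuning pipeline.

```python
import random

# Minimal random-search sketch for max_depth over the article's
# recommended 3-12 range. Fitting/scoring a LightGBM model for each
# candidate is omitted (placeholder comment below).
random.seed(0)

def sample_params():
    # randint is inclusive on both ends, so this draws from {3, ..., 12}
    return {"max_depth": random.randint(3, 12)}

candidates = [sample_params() for _ in range(5)]
# Each candidate dict would be passed to LightGBM and cross-validated here.
assert all(3 <= p["max_depth"] <= 12 for p in candidates)
```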
num_leaves also becomes easy to set once you determine max_depth. The LGBM documentation gives a simple formula for the upper limit: num_leaves should not exceed 2^(max_depth). With max_depth between 3 and 12, the cap on num_leaves therefore falls between 2^3 = 8 and 2^12 = 4096.
num_leaves impacts learning in LGBM more than max_depth does, so it pays to search a more conservative range such as (20, 3000); that's what I mostly do.
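The cap and the conservative search range above can be sketched in a few lines. This is an illustrative helper, not part of any library's API: `max_num_leaves` is just the 2^max_depth bound from the LightGBM docs, and `conservative_range` clamps the article's suggested (20, 3000) range to that bound.

```python
# A depth-d binary tree has at most 2**d leaves, so num_leaves should
# not exceed 2**max_depth (per the LightGBM documentation).
def max_num_leaves(max_depth: int) -> int:
    """Theoretical cap on leaves for a tree of the given depth."""
    return 2 ** max_depth

assert max_num_leaves(3) == 8      # lower end of the 3-12 depth range
assert max_num_leaves(12) == 4096  # upper end

def conservative_range(max_depth: int, lo: int = 20, hi: int = 3000):
    """Clamp the article's conservative (20, 3000) range to the cap."""
    cap = max_num_leaves(max_depth)
    return (min(lo, cap), min(hi, cap))

# For max_depth = 7 the cap is 2**7 = 128, so the usable search
# range shrinks from (20, 3000) to (20, 128).
assert conservative_range(7) == (20, 128)
```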
Another important structural parameter for a tree is min_data_in_leaf. Its magnitude is also correlated with whether you overfit or not. In simple terms, min_data_in_leaf specifies the minimum number of training observations a leaf must contain.
For example, if a split checks whether a feature is greater than, let's say, 13, then setting min_data_in_leaf to 100 means the resulting leaf is only kept if at least 100 training observations have that feature greater than 13. That is the gist, in my lay terms.
The optimal value for min_data_in_leaf depends on the number of training samples and on num_leaves. For large datasets, set a value in the hundreds or thousands.
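Putting the three tree-structure parameters together, here is an illustrative parameter dictionary. All three keys are real LightGBM parameter names; the specific values are sample choices consistent with the ranges suggested above, not tuned results.

```python
# Sample tree-structure settings following the article's rules of thumb.
# Values are illustrative, not tuned.
params = {
    "max_depth": 8,           # pick from the 3-12 range
    "num_leaves": 200,        # well under the 2**8 = 256 cap
    "min_data_in_leaf": 500,  # hundreds-to-thousands for large datasets
}

# Sanity checks tying the values back to the rules of thumb:
assert 3 <= params["max_depth"] <= 12
assert params["num_leaves"] <= 2 ** params["max_depth"]
```

With lightgbm installed, a dictionary like this would be passed to `lgb.train(params, train_set)` or unpacked into the scikit-learn style `LGBMClassifier(**params)`.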
Check out this section of the LGBM documentation for more details.