Three Crazily Simple Recipes to Fight Overfitting in Deep Learning Models
These three basic ideas should be put in place in any machine learning modeling experiment.
Overfitting is considered one of the biggest challenges in modern deep learning applications. Conceptually, overfitting occurs when a model generates a hypothesis that is so tailored to a specific dataset that it cannot adapt to new data. A useful analogy is to think of overfitting as hallucinations in the model. Essentially, a model hallucinates, or overfits, when it infers incorrect hypotheses from a dataset. A lot has been written about overfitting since the early days of machine learning, so I won’t presume to have any clever ways to explain it. However, I would like to use this post to present three practical ways to think about overfitting in deep learning models.
One of the aspects that makes overfitting so challenging is that it is hard to generalize across different deep learning techniques. Convolutional neural networks tend to develop overfitting patterns that are different from the ones observed in recurrent neural networks, which in turn are different from those of generative models, and that pattern can be extrapolated to any class of deep learning models. Somewhat ironically, the propensity to overfit has grown along with the computational capacity of deep learning models. As deep learning agents can generate complex hypotheses at virtually no cost, the propensity to overfit increases.
Three Simple Strategies to Fight Overfitting
While there are no silver bullets to prevent overfitting, practical experience has shown that some simple, almost common-sense rules help prevent this phenomenon in deep learning applications. From the dozens of best practices that have been published to prevent overfitting, there are three fundamental ideas that encompass most of them.
The Data/Hypothesis Ratio
Overfitting typically occurs when a model produces too many hypotheses without the corresponding data to validate them. As a result, deep learning applications should try to keep a decent ratio between the test datasets and the hypotheses that should be evaluated. However, this is not always an option.
There are many deep learning algorithms, such as inductive learning, that rely on constantly generating new and sometimes more complex hypotheses. In those scenarios, there are some statistical techniques that can help estimate the correct number of hypotheses needed to optimize the chances of finding one close to correct. While this approach does not provide an exact answer, it can help to maintain a statistically balanced ratio between the number of hypotheses and the composition of the dataset. Harvard professor Leslie Valiant brilliantly explains this concept in his book Probably Approximately Correct.
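To make the data/hypothesis ratio concrete, here is a minimal sketch of the classic sample-size bound from PAC learning theory. It assumes a finite hypothesis class and a learner that returns a hypothesis consistent with the training data; deep networks do not satisfy those assumptions, so treat it as an illustration of the trend rather than a rule for sizing real datasets. The function name `pac_sample_bound` and its parameters are my own choices for this example.

```python
import math


def pac_sample_bound(num_hypotheses: int, epsilon: float, delta: float) -> int:
    """Number of samples that is sufficient for a consistent learner.

    Assumes a finite hypothesis class of size `num_hypotheses`.
    Uses the textbook bound m >= (ln|H| + ln(1/delta)) / epsilon, where
    epsilon is the tolerated error and delta the tolerated failure probability.
    """
    if num_hypotheses < 1:
        raise ValueError("num_hypotheses must be at least 1")
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil((math.log(num_hypotheses) + math.log(1.0 / delta)) / epsilon)


# The required data grows only logarithmically with the number of hypotheses,
# but it does grow: more candidate hypotheses need more data to validate them.
small_class = pac_sample_bound(num_hypotheses=1_000, epsilon=0.1, delta=0.05)
large_class = pac_sample_bound(num_hypotheses=1_000_000, epsilon=0.1, delta=0.05)
print(small_class, large_class)
```

The takeaway matches the argument above: every time the space of candidate hypotheses grows, the amount of data needed to trust the chosen one grows with it.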
Favoring Simpler Hypotheses
A conceptually trivial but technically difficult idea to prevent overfitting in deep learning models is to continuously generate simpler hypotheses. Of course! Simple is always better, isn’t it? But what is a simpler hypothesis in the context of deep learning algorithms? If we need to reduce it to a quantitative factor, I would say that the complexity of a deep learning hypothesis is directly proportional to its number of attributes.
Simpler hypotheses tend to be easier to evaluate, both computationally and cognitively, than others with a large number of attributes. As a result, simpler models are typically less prone to overfit than complex ones. Great! Now the next obvious headache is to figure out how to generate simpler hypotheses in deep learning models. A not-so-obvious technique is to attach some form of penalty to an algorithm based on its estimated complexity. That mechanism tends to favor simpler, approximately accurate hypotheses over more complex and sometimes more accurate ones that could fall apart when new datasets appear.
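One common way to implement such a complexity penalty is L2 regularization (known as weight decay in deep learning), which adds the squared size of the weights to the training loss. The sketch below is my own illustration, not something prescribed by this post: it uses a small polynomial regression in NumPy instead of a neural network so it runs anywhere, and the data, the degree, and the penalty values are all arbitrary assumptions.

```python
import numpy as np


def polynomial_features(x: np.ndarray, degree: int) -> np.ndarray:
    """Columns are 1, x, x^2, ..., x^degree."""
    return np.vander(x, degree + 1, increasing=True)


def fit_ridge(features: np.ndarray, targets: np.ndarray, penalty: float) -> np.ndarray:
    """Minimize ||features @ w - targets||^2 + penalty * ||w||^2 in closed form."""
    n_features = features.shape[1]
    gram = features.T @ features + penalty * np.eye(n_features)
    return np.linalg.solve(gram, features.T @ targets)


# Synthetic data (an assumption for this example): a noisy sine wave.
rng = np.random.default_rng(0)
x_train = rng.uniform(-1.0, 1.0, 15)
y_train = np.sin(np.pi * x_train) + rng.normal(0.0, 0.2, 15)
X_train = polynomial_features(x_train, degree=9)

# Same model family, two penalty strengths.
weights_weak = fit_ridge(X_train, y_train, penalty=1e-6)
weights_strong = fit_ridge(X_train, y_train, penalty=1.0)

norm_weak = np.linalg.norm(weights_weak)
norm_strong = np.linalg.norm(weights_strong)
print(f"weight norm with weak penalty:   {norm_weak:.3f}")
print(f"weight norm with strong penalty: {norm_strong:.3f}")
```

The stronger penalty trades a slightly worse fit on the training data for smaller weights, which is exactly the preference for simpler, approximately accurate hypotheses described above.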
The Bias/Variance Balance
Bias and variance are two key sources of prediction error in deep learning models. Conceptually, bias is the difference between the average prediction of our model and the correct value we are trying to predict. A model with high bias pays very little attention to the training data and oversimplifies the problem. This typically leads to high error on both training and test data. Variance, in contrast, refers to the variability of the model's prediction for a given data point, that is, how much the prediction changes when the model is trained on a different sample of data. A model with high variance pays a lot of attention to the training data and does not generalize to data it hasn’t seen before. As a result, such models perform very well on training data but have high error rates on test data.
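These two definitions can be checked numerically. The sketch below is a toy experiment of my own, assuming a known target function (a sine wave), Gaussian noise, and polynomial models as stand-ins for a simple and a complex hypothesis. It retrains each model on many noisy copies of the training set and measures the squared bias and the variance of the predictions.

```python
import numpy as np


def true_function(x: np.ndarray) -> np.ndarray:
    """Ground truth used only for this simulation (an assumption)."""
    return np.sin(np.pi * x)


def bias_variance(degree: int, n_trials: int = 200, noise_std: float = 0.3, seed: int = 0):
    """Estimate squared bias and variance of a polynomial fit of the given degree."""
    rng = np.random.default_rng(seed)
    x_train = np.linspace(-1.0, 1.0, 25)
    x_test = np.linspace(-1.0, 1.0, 50)
    predictions = np.empty((n_trials, x_test.size))
    for trial in range(n_trials):
        # Each trial sees the same inputs with freshly sampled noise.
        y_train = true_function(x_train) + rng.normal(0.0, noise_std, x_train.size)
        coefficients = np.polyfit(x_train, y_train, degree)
        predictions[trial] = np.polyval(coefficients, x_test)
    mean_prediction = predictions.mean(axis=0)
    # Bias: gap between the average prediction and the correct value.
    bias_squared = float(np.mean((mean_prediction - true_function(x_test)) ** 2))
    # Variance: spread of the predictions across training sets.
    variance = float(np.mean(predictions.var(axis=0)))
    return bias_squared, variance


simple_bias, simple_variance = bias_variance(degree=1)
complex_bias, complex_variance = bias_variance(degree=9)
print(f"degree 1: bias^2={simple_bias:.4f}, variance={simple_variance:.4f}")
print(f"degree 9: bias^2={complex_bias:.4f}, variance={complex_variance:.4f}")
```

In this setup the straight line should show the larger bias, because it cannot follow the sine wave no matter how much data it sees, while the degree 9 polynomial should show the larger variance, because it chases the noise in each training set.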
How are bias and variance related to overfitting? In super simple terms, the art of generalization can be summarized as reducing the bias of a model without increasing its variance. A good practice in deep learning models is to regularly compare the produced hypotheses against test datasets and evaluate the results. If the hypotheses continue making the same mistakes, then we have a big bias issue and we need to tweak or replace the algorithm. If instead there is no clear pattern to the mistakes, the problem is variance and we need more data.
Most of the best practices related to preventing overfitting in deep learning systems can be summarized as some combination of the aforementioned strategies. The simple nature of the previous arguments can help with designing effective strategies to prevent overfitting in deep learning programs without requiring sophisticated tools or frameworks.
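A rough way to automate that check is to compare the error on the training data with the error on held-out data. The helper below is a hypothetical heuristic, not a standard API: the name `diagnose_fit` and both thresholds are assumptions you would tune for your own problem, and it uses the train/validation gap as a proxy for the pattern of mistakes described above.

```python
def diagnose_fit(
    train_error: float,
    validation_error: float,
    acceptable_error: float = 0.10,  # assumed target error, problem specific
    max_gap: float = 0.05,           # assumed tolerated train/validation gap
) -> str:
    """Suggest the likely problem and the remedy named in the text."""
    if train_error > acceptable_error:
        # The model is wrong even on data it has seen: systematic mistakes.
        return "high bias: tweak or replace the algorithm"
    if validation_error - train_error > max_gap:
        # Good on seen data, poor on unseen data: the model does not generalize.
        return "high variance: gather more data"
    return "ok"


print(diagnose_fit(train_error=0.30, validation_error=0.32))
print(diagnose_fit(train_error=0.02, validation_error=0.25))
print(diagnose_fit(train_error=0.05, validation_error=0.07))
```

Running this after every training experiment is cheap, and it turns the bias/variance question into a routine check instead of an afterthought.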