Don’t forget about the “scientist” in your Data Scientist title (Part 2)

Testing multiple factors at the same time without introducing confounding variables

If you missed part 1, head over there for a full introduction to experimental design.

As a quick refresher, experimental design is a technique that allows researchers to evaluate how a number of different factors affect a system [1]. We discussed how to avoid confounders and walked through some examples, but we didn't really talk about how to test more than one factor at the same time.

The beauty of factorial design is that it is very easy to extend.

Let's build on the example of a hypothetical deep learning model from the previous post. The specifics of what each level contains shouldn't matter, so you don't need to know the deep learning concepts to follow along.

To evaluate what would happen if the number of neurons in the model were increased, it's as simple as adding a factor to the experiment. Let's look at what happens if the number of neurons is increased to either 32 or 64. This gives three levels: level 1 is the baseline of 16, level 2 is 32, and level 3 is 64. To fully measure the impact of this change, create additional experiments that test each of these hyperparameter configurations. Below is an overview of how this looks, with each box representing an experiment.

Going from a 2×1 design to a 2×3 design adds four test scenarios. It looks simple when displayed visually, but in addition to testing what happens if the number of layers is increased, we now create scenarios to test each of the different neuron counts. The important point here is that we need to test every level of one factor against every level of the other factor. When an experiment is outlined as a 2×3 factorial design, the math is easy: 2 × 3 = 6 scenarios need to be tested to fully evaluate each option. By benchmarking every scenario on the same performance metric, it's then straightforward to see which of the six performed best.
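As a minimal sketch of that "every level against every level" rule, the 2×3 design can be enumerated in Python with `itertools.product`. The neuron counts come from the example above; the layer levels are placeholder labels (`"baseline"` and `"increased"`) because their exact values aren't given in this post, and the `factors` dictionary is just one illustrative way to describe a design.

```python
from itertools import product

# Levels for each factor. The neuron counts come from the example;
# the layer levels are hypothetical placeholder labels.
factors = {
    "layers": ["baseline", "increased"],
    "neurons": [16, 32, 64],
}

# Full factorial design: cross every level of one factor with every
# level of the other, producing one dict per experiment.
scenarios = [dict(zip(factors, combo)) for combo in product(*factors.values())]

for i, scenario in enumerate(scenarios, start=1):
    print(i, scenario)
```

Each printed dictionary corresponds to one box in the overview, i.e., one experiment to run and score on the benchmark metric.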

That was too easy, so why stop there? Let's go crazy and add another factor.

To really get an idea of how extensible factorial design (the name for the specific type of experimental design we're discussing here) is, let's play out a scenario where we also want to test different optimizers. Initially, Adam was the baseline optimizer, but let's measure the change in the performance metric if SGD is used instead. To do this, the design goes from 2×3 to 2×3×2, so our previous design with 6 scenarios becomes one in which 12 scenarios will be tested.
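Continuing the same hypothetical sketch, adding the optimizer factor amounts to one more entry in the dictionary. The helper below (`full_factorial` is a name chosen for illustration, not a library function) regenerates the grid, and the count goes from 6 to 12. The layer labels remain placeholders.

```python
from itertools import product


def full_factorial(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


factors = {
    "layers": ["baseline", "increased"],  # hypothetical labels
    "neurons": [16, 32, 64],
}
print(len(full_factorial(factors)))  # 6

# Adding a factor is one more dictionary entry; the design becomes 2x3x2.
factors["optimizer"] = ["adam", "sgd"]
design = full_factorial(factors)
print(len(design))  # 12
```

Nothing else in the code changes: every previous scenario is simply repeated once per optimizer.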

At the risk of being redundant, you can see that adding another factor with two levels requires us to evaluate each of the previous scenarios against every level of the new factor. At this point, it should be clear how extensible this framework is. However, it's also apparent how much one additional factor can add to the overall workload: because the number of scenarios is the product of the number of levels of each factor, every new factor multiplies the work rather than adding to it.
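A quick way to see the workload growth is to note that the number of runs in a full factorial design is the product of the number of levels of each factor. The sketch below tracks that count as each factor from this post is added. The three-level fourth factor at the end is purely hypothetical and not part of the original example.

```python
from math import prod

# Number of levels per factor, in the order the factors were introduced.
levels = {"layers": 2, "neurons": 3, "optimizer": 2}

runs = 1
history = []
for name, n_levels in levels.items():
    runs *= n_levels  # each new factor multiplies the run count
    history.append(runs)
    print(f"after adding {name!r}: {runs} runs")

# Hypothetical fourth factor with three levels (not in the original example).
print(f"with a 3-level fourth factor: {prod(levels.values()) * 3} runs")
```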