How To Find Causal Inference When A/B Tests Are Unsuitable




Data science is all the rage at the moment, and quite rightly so. Econometrics, the brother of data science (statistics being the father), sometimes feels a bit left out. Despite the two fields sharing the large majority of their techniques and procedures, some really useful econometric methods often get forgotten in modern data science.

An example of some of my journal-published econometric work can be seen here.

Data science focuses mostly on prediction; determining causal inference is often not considered important. Prediction techniques are fantastic as long as the underlying systems and relationships do not change, but in the real world they frequently do. Knowing causal relationships also helps drive policy and decision making.

Econometrics, despite having a reputation for being incredibly boring (while somehow its brother gets called the sexiest job of the 21st century), is often quite an artistic science.

Because, by definition, econometrics studies the economy and human interactions within it, A/B tests are often hard to come by, as they are ethically challenging. We can’t just start playing with people’s lives on a macro scale or people might get a bit miffed. Imagine we just started randomly allocating university places. Therefore, econometrics has come up with some backdoor approaches which can have some great uses.

In this article, I’ll talk through one of the most interesting, yet simple, econometric techniques — regression discontinuity design. I’ll use an example from Carpenter & Dobkin (2011), who use the technique to analyse the legal drinking age and the effects lowering it would have on mortality and the economy.

As a Scotsman, that’s something close to my heart.

I’ll also code it up in Python.

Regression discontinuity design

In econometrics (as well as statistics and epidemiology), regression discontinuity design is a quasi-experimental technique that attempts to elicit causal inference from data where an A/B test is not appropriate. This is often done in hindsight using already-existing data. It works by comparing observations just above and just below a threshold around an ‘intervention point’. However, it remains impossible to claim explicit causal inference with this method, as it cannot rule out confounding from latent or unconsidered variables. That said, studies have shown that RCTs (A/B tests) and RDDs don’t produce too-dissimilar results (within 0.07 standard deviations after appropriate adjustments are made).

The most commonly cited example is one of scholarships and GPAs.

An example of the average treatment effect we are looking for. Source: https://www.rpubs.com/muntasir_masum/rdd

Consider a high school where the top x percentage of students (sorted by GPA) receive a merit scholarship and we want to know the effect the scholarship had on these individuals. We cannot simply compare the individuals who received the scholarships and those who didn’t as there is a clear selection bias. The better individuals are more likely to get the scholarship and are therefore more likely to excel in the future (it’s not wild to assume that future success is predicted by GPA).

The scholarship might not have had any effect at all; the students who received it may simply have been set to do better anyway. So in the graph above, take C to be the scholarship cutoff point: everyone above it gets a scholarship and those below it do not.

What RDD does is take the points (or students) that sit around the cutoff point and compare them: for example, the students anywhere in the range 78–82%. Given the random nature of exam marking and other individual random variables, this group is considered to be roughly equal in underlying ability. By comparing individuals within this group, some of whom received the treatment (the scholarship) and some of whom did not, we can deduce the effect the scholarship had and estimate a causal effect (as mentioned, this is not true causal inference; it is termed the local average treatment effect).
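To make the idea concrete, here is a minimal sketch on simulated scholarship data (the numbers — a cutoff at 80%, a true scholarship effect of +5 points, and the noise scale — are all my own assumptions, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate 5,000 students whose GPA determines the scholarship (cutoff at 80%).
n = 5000
gpa = rng.uniform(50, 100, n)
scholarship = (gpa >= 80).astype(int)

# Future outcome depends on underlying ability (proxied by GPA),
# plus a true scholarship effect of +5 points, plus random noise.
outcome = 0.8 * gpa + 5 * scholarship + rng.normal(0, 2, n)

# Naive comparison is biased: it bundles the scholarship effect
# together with the ability gap between the two groups.
naive = outcome[scholarship == 1].mean() - outcome[scholarship == 0].mean()

# RDD-style comparison: only students within a narrow bandwidth (78-82%)
# around the cutoff, where underlying ability is roughly comparable.
band = (gpa >= 78) & (gpa <= 82)
rdd = (outcome[band & (scholarship == 1)].mean()
       - outcome[band & (scholarship == 0)].mean())

print(f"naive difference: {naive:.1f}")
print(f"RDD estimate:     {rdd:.1f}")
```

The naive gap comes out several times larger than the true +5 effect, while the within-bandwidth comparison lands much closer to it (a small bias from the slope within the band remains, which is why real RDD studies fit a regression on each side rather than simply comparing means).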

Should we lower the minimum drinking age?

In their paper, Carpenter & Dobkin (2011) use this exact technique to analyse what the effects would be if, in the USA, the drinking age were lowered from 21. Considering we start drinking whisky at about eight years old in Scotland, I’m all in favour of slashing it so our brothers and sisters across the pond can enjoy the same fruits of life as us (you might not want to take our life expectancy though). Let’s see if the data agrees.

RDD is a great tool to use here as there is already a specific cutoff in place — one’s 21st birthday. An individual is effectively the same person the day before and the day after their 21st birthday, apart from the fact they can now enjoy life. In more scientific terms, and in the words of the original authors: “If nothing other than the legal regime changes discretely at age 21, then discrete mortality rates at age 21 can plausibly be attributed to the drinking age”.

The authors compared three different types of death: death by motor vehicle accident, death by internal causes, and death by suicide.

As can be seen, there is a clear discontinuity in the lines of best fit in the first and last groups.

Source: The Minimum Legal Drinking Age and Public Health (nih.gov)

To estimate the discontinuity and to test whether it is statistically significant they use the following equation:

y = β0 + β1MLDA + β2Birthday + f(age) + ε

“y” is the age-specific mortality rate, with a data point for each year and month.

“MLDA” is a dummy variable that takes on a value of 1 for observations aged 21 and older, and 0 otherwise.

“f(age)” is a quadratic polynomial which is fully interacted with the “MLDA” dummy.

“Birthday” is a dummy variable for the month in which the decedent’s 21st birthday falls and is intended to absorb the pronounced effect of birthday celebrations on mortality rates.

“ε” is of course the unobserved error term as always.

As a result, the parameter of interest in this model is β1.
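As a rough sketch of how that right-hand side could be assembled in Python (the exact variable coding in the paper may differ; the 48-month window and all column names here are my own assumptions):

```python
import numpy as np
import pandas as pd

# Hypothetical monthly data: 48 months centred on the 21st birthday,
# so "age" is measured in years relative to turning 21.
months = np.arange(-24, 24)
df = pd.DataFrame({"age": months / 12.0})

# MLDA dummy: 1 at and after the 21st birthday, 0 before.
df["mlda"] = (months >= 0).astype(int)

# Birthday dummy: flags only the month of the 21st birthday itself,
# absorbing the spike from birthday celebrations.
df["birthday"] = (months == 0).astype(int)

# f(age): a quadratic polynomial fully interacted with the MLDA dummy,
# i.e. separate linear and quadratic terms on each side of the cutoff.
df["age2"] = df["age"] ** 2
df["mlda_age"] = df["mlda"] * df["age"]
df["mlda_age2"] = df["mlda"] * df["age2"]

X = df[["mlda", "birthday", "age", "age2", "mlda_age", "mlda_age2"]]
```

With the design matrix built this way, the coefficient on the “mlda” column is β1, the jump in mortality at the cutoff.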

The authors then run a linear regression with the above equation and present the following table of results.

Source: The Minimum Legal Drinking Age and Public Health (nih.gov)

The results in the table above are consistent with the graphical evidence and show a statistically significant 8.7 percent increase in overall mortality when people turn 21 (8.06 additional deaths per 100,000 person-years from a base of 93.07 deaths corresponds to 8.06/93.07 = 0.087, or an 8.7 percent increase).

Overall, the visual evidence in the above image and the corresponding regression estimates in the table provide strong evidence that the minimum legal drinking age has a significant effect on mortality from suicides, motor vehicle accidents, and alcohol overdoses at age 21.

Damn, looks like it’s a good idea to not let kids drink.

How to do it yourself

I’ll now briefly show you how to run a simplified version of this experiment yourself in Python. It’s extremely simple. First, I created some fake data by reverse-engineering the equation the authors used for their regression. The scales are a bit different, but that doesn’t matter. This is what Scotland’s numbers might look like this year, after finally qualifying for the European football championships after 23 years of hurt…

A scatter plot of fictional data created by author
The head of the dataframe used for this analysis created by Author

I used the same 48-month period and also centred the age on the 21st birthday, so the cutoff sits at zero.

We can then simply run a linear regression as we normally would. We can use statsmodels to get a more ‘econometric’-style output: sklearn produces the R-squared but doesn’t report coefficient p-values, which we would like to know in this case.

link to my github — code by author.

The output is therefore below:

Output from regression run by author.

And, not surprisingly, the regression has recovered almost exactly the numbers I used when I made the dataset (5 for age, and 50 for turning 21). I did add a random error term, but apparently not enough of one, because the R-squared is still very high at 0.98.

As can be seen, the p-values for both age and turning 21 are strongly significant, so we can reject the null hypothesis that turning 21 does not affect the mortality rate (of course we can; I made the data this way).

Therefore, as you can see, it’s effectively the same as running a normal linear regression, with extra care taken in specifying the equation and controlling for relationships. The authors of the original paper go to much greater lengths to control for other age-varying factors, but this is just a simplified example of how to do it.

Takeaways

If you can do an A/B test, then do it. But if you can’t, then an RDD is your best bet. If you correctly specify the equations and don’t break any of the assumptions, you shouldn’t have any problems. Be careful of pitfalls like the dummy-variable trap or severe multicollinearity.

As mentioned, this method does not deduce true causal inference. For example, we don’t know for sure all the variables that play into this equation. The prolonged wait American teens endure before touching the booze might itself play a part in deaths soon after turning 21. So be careful when making bold statements.

And finally, to sum up, the findings of Carpenter & Dobkin (2011) suggest that drinks would have to be charged at 15 dollars each to make up the economic costs of lowering the drinking age to 18 (using approximately 8 million dollars as the statistical value of a life).

I for one am not in favour of paying $15 a drink (even though I pay not far off that in London now anyway).

So apologies young Americans, you’re going to have to wait until you’re 21, it’s not good for you anyway.
