# 4 Python Packages to Learn Causal Analysis

## Learn cause and effect analysis with these packages

Causal analysis is a field within experimental statistics that aims to establish cause-and-effect relationships. Using statistical algorithms to infer causality from a dataset under strict assumptions is called exploratory causal analysis (ECA).

ECA, in turn, is a way to establish causation through more controlled experimentation rather than through correlation alone. We often need to reason about the counterfactual: what would have happened under different circumstances. The problem is that we can never observe the counterfactual itself; we can only estimate the causal effect.
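To make the gap between correlation and causal effect concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the hidden confounder `z`, the true effect of 1.0, and the treatment probabilities. In a simulation we can generate both potential outcomes for every unit, which real data never allows.

```python
import random

random.seed(0)

n = 20_000
treated_outcomes, control_outcomes, true_effects = [], [], []
for _ in range(n):
    z = random.random() < 0.5                   # hidden confounder (assumed)
    y0 = 1.0 + 2.0 * z + random.gauss(0, 1)     # outcome without treatment
    y1 = y0 + 1.0                               # outcome with treatment (true effect = 1)
    d = random.random() < (0.8 if z else 0.2)   # the confounder also drives treatment
    true_effects.append(y1 - y0)
    if d:
        treated_outcomes.append(y1)             # only y1 is observed for treated units
    else:
        control_outcomes.append(y0)             # only y0 is observed for control units

true_ate = sum(true_effects) / n
naive_diff = (sum(treated_outcomes) / len(treated_outcomes)
              - sum(control_outcomes) / len(control_outcomes))

print(f"true average effect: {true_ate:.2f}")
print(f"naive treated-minus-control difference: {naive_diff:.2f}")
```

The naive difference comes out well above the true effect because treated units are more likely to have `z` set, which raises their outcome regardless of treatment.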

Causal analysis is its own field of study within data science because it is inherently different from prediction with machine learning models. An ML model can predict outcomes from patterns in the existing data, but it cannot tell us what would happen under conditions the data never covered.

# 1. Causalinference

Causalinference is a Python package that provides various statistical methods for causal analysis. It is a simple package that is well suited to learning basic causal analysis. The main features of this package include:

• Propensity score estimation and subclassification
• Improvement of covariate balance through trimming
• Estimation of treatment effects
• Assessment of overlap in covariate distributions

A longer explanation of each term is available on the package’s web page.

Let’s try out the Causalinference package. For starters, we need to install the package.

`pip install causalinference`

After the installation finishes, we will implement a causal model. We will use the random data generator that ships with the causalinference package.

```python
from causalinference import CausalModel
from causalinference.utils import random_data

# Y is the outcome, D is treatment status, and X is the independent variable
Y, D, X = random_data()
causal = CausalModel(Y, D, X)
```

The CausalModel class would analyze the data. We would need to do a few more steps to acquire important information from the model. First, let’s get the statistical summary.

`print(causal.summary_stats)`

By using the `summary_stats` attribute, we would acquire all the basic information of the dataset.

The main part of causal analysis is estimating the treatment effect. The simplest way to do this is with the Ordinary Least Squares (OLS) method.

```python
causal.est_via_ols()
print(causal.estimates)
```

ATE, ATC, and ATT stand for Average Treatment Effect, Average Treatment Effect on the Controls, and Average Treatment Effect on the Treated, respectively. Using this information, we could assess whether the treatment has an effect compared to the control.
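The three quantities differ only in which units they average over. The sketch below uses a tiny hypothetical table in which both potential outcomes are known for every unit. Real data never provides that, which is why these quantities have to be estimated. The numbers are invented for illustration.

```python
# Each row: (treated?, outcome without treatment y0, outcome with treatment y1)
# Hypothetical values; in real data only one of y0 / y1 is ever observed.
units = [
    (True, 2.0, 5.0),
    (True, 1.0, 3.0),
    (False, 4.0, 5.0),
    (False, 3.0, 3.0),
]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

ate = mean(y1 - y0 for _, y0, y1 in units)            # averaged over everyone
att = mean(y1 - y0 for d, y0, y1 in units if d)       # averaged over treated units
atc = mean(y1 - y0 for d, y0, y1 in units if not d)   # averaged over control units

print(ate, att, atc)
```

In this toy table the treated units happen to benefit more than the controls would have, so the ATT exceeds the ATE, which in turn exceeds the ATC.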

We could also estimate the propensity score, which is the probability of treatment conditional on the independent variables.

```python
causal.est_propensity_s()
print(causal.propensity)
```

Using the propensity score method, we could assess the probability of the treatment given the independent variables.
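As a rough illustration of what a propensity score is, the sketch below estimates it for a single discrete covariate by taking the share of treated units in each group. This is a simplification chosen for readability, not the package’s method (which fits a model over the covariates), and the records are hypothetical.

```python
from collections import defaultdict

# (covariate x, treatment d): hypothetical records with one discrete covariate
records = [
    ("young", 1), ("young", 0), ("young", 0), ("young", 0),
    ("old", 1), ("old", 1), ("old", 1), ("old", 0),
]

counts = defaultdict(lambda: [0, 0])   # x -> [number treated, total]
for x, d in records:
    counts[x][0] += d
    counts[x][1] += 1

# Propensity score: estimated P(d = 1 | x) for each covariate value
propensity = {x: treated / total for x, (treated, total) in counts.items()}
print(propensity)
```

Here older units are three times as likely to be treated as younger ones, so a raw comparison of treated and control outcomes would mix the treatment effect with the age effect.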

There are still many methods you could explore and learn from. I suggest you visit the causalinference web page and learn further.

# 2. Causallib

Causallib is a Python package for causal analysis developed by IBM. The package provides a causal analysis API modeled on the Scikit-Learn API, so complex learning models can be plugged in and used through the familiar fit-and-predict pattern.

A strength of the Causallib package is the number of example notebooks available for learning.

Then, let’s try to use the causallib package for our learning. First, we need to install the package.

`pip install causallib`

After that, we will use an example dataset from the causallib package and estimate the causal effect with a model from Scikit-Learn.

```python
from sklearn.linear_model import LogisticRegression
from causallib.estimation import IPW
from causallib.datasets import load_nhefs

data = load_nhefs()
ipw = IPW(LogisticRegression())
ipw.fit(data.X, data.a)
potential_outcomes = ipw.estimate_population_outcome(data.X, data.a, data.y)
effect = ipw.estimate_effect(potential_outcomes[1], potential_outcomes[0])
```

The above code loads data from a follow-up study on the effect of smoking on health. We passed a Logistic Regression model to the IPW estimator, which uses it to model treatment assignment when assessing the causal effect.

Let’s check what happens to the treatment’s potential outcome and effect.

`print(potential_outcomes)`

Checking the potential outcomes, we can see that the average weight change if everyone had quit smoking (1) is 5.38 kg, while the average weight change if everyone had continued smoking (0) is 1.71 kg.

This gives a difference of around 3.67 kg (5.38 - 1.71). Since the treatment here is quitting smoking, we could conclude that quitting increases weight gain by around 3.67 kg on average; equivalently, continuing to smoke is associated with about 3.67 kg less weight gain.
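The weighting idea behind an inverse-probability-weighted estimate can be sketched in plain Python. This is a minimal illustration under stated assumptions, not causallib’s implementation: the records are hypothetical, there is a single binary covariate, and the propensity is estimated as the treated share within each covariate group rather than with logistic regression.

```python
# Hypothetical records: (covariate x, treatment a, outcome y)
records = [
    (0, 1, 6.0), (0, 0, 2.0), (0, 0, 2.0), (0, 0, 2.0),
    (1, 1, 8.0), (1, 1, 8.0), (1, 1, 8.0), (1, 0, 4.0),
]

# Propensity e(x) = P(a = 1 | x), estimated as the treated share per group
propensity = {}
for x in {x for x, _, _ in records}:
    group = [a for xi, a, _ in records if xi == x]
    propensity[x] = sum(group) / len(group)

def weighted_outcome(arm):
    """Inverse-probability-weighted mean outcome for one treatment arm."""
    num = den = 0.0
    for x, a, y in records:
        if a != arm:
            continue
        p_arm = propensity[x] if arm == 1 else 1 - propensity[x]
        num += y / p_arm   # rarer assignments get larger weights
        den += 1 / p_arm
    return num / den

potential_outcomes = {arm: weighted_outcome(arm) for arm in (0, 1)}
effect = potential_outcomes[1] - potential_outcomes[0]

treated = [y for _, a, y in records if a == 1]
control = [y for _, a, y in records if a == 0]
naive = sum(treated) / len(treated) - sum(control) / len(control)

print(potential_outcomes, effect, naive)
```

In this toy data the effect is 4 within each covariate group. The weighted estimate recovers it, while the unweighted difference in means is inflated because treated units are concentrated in the group with higher outcomes.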

# 3. Causalimpact

Causalimpact is a Python package for estimating the causal effect of an intervention on a time series. The analysis compares the behavior of the series before and after the intervention.

Causalimpact analyzes a response time series (e.g., clicks or a drug response) together with a control time series (a related series that was not affected by the intervention) using a Bayesian structural time-series model. The model predicts the counterfactual (what would have happened had the intervention never occurred), and we then compare that prediction with the observed result.

Let’s start to use the package by installing it.

`pip install causalimpact`

After installing the package, let’s create simulated data. We will create an example dataset with 100 observations in which an intervention effect begins at timepoint 71.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima_process import arma_generate_sample
from causalimpact import CausalImpact

np.random.seed(1)

# Simulate a control series x1 and a response y that depends on it
x1 = arma_generate_sample(ar=[0.999], ma=[0.9], nsample=100) + 100
y = 1.2 * x1 + np.random.randn(100)

# Add an intervention effect of +10 from timepoint 71 onward
y[71:100] = y[71:100] + 10
data = pd.DataFrame(np.array([y, x1]).T, columns=["y", "x1"])

pre_period = [0, 69]
post_period = [71, 99]
```

Above, we created a dependent variable (y) and an independent variable (x1). Usually we would have more than one independent variable, but let’s stick with the current data. To run the analysis, we need to specify the periods before and after the intervention.

```python
impact = CausalImpact(data, pre_period, post_period)
impact.run()
impact.plot()
```

The plot above gives us three sets of information. The top panel shows the actual data and a counterfactual prediction for the post-treatment period. The middle panel shows the difference between actual data and counterfactual predictions, which is the pointwise causal effect. The bottom panel is a plot of the cumulative effect of the intervention, where we accumulate the pointwise contributions from the middle panel.
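The three panels correspond to three simple quantities. The sketch below computes them on a tiny hypothetical series, with one important substitution: the counterfactual comes from an ordinary least-squares line fitted on the pre-intervention period, not from the Bayesian structural time-series model the package uses. The data and the intervention index are assumptions for illustration.

```python
from itertools import accumulate

# Hypothetical series: control x and response y; the intervention starts at index 4
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 4.0, 6.0, 8.0, 13.0, 15.0]
start = 4

# Fit y = intercept + slope * x on the pre-intervention period only
x_pre, y_pre = x[:start], y[:start]
x_bar = sum(x_pre) / len(x_pre)
y_bar = sum(y_pre) / len(y_pre)
slope = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x_pre, y_pre))
         / sum((a - x_bar) ** 2 for a in x_pre))
intercept = y_bar - slope * x_bar

# Top panel: counterfactual prediction for the post-intervention period
counterfactual = [intercept + slope * a for a in x[start:]]
# Middle panel: pointwise effect = actual minus counterfactual
pointwise = [actual - pred for actual, pred in zip(y[start:], counterfactual)]
# Bottom panel: cumulative effect = running sum of the pointwise effects
cumulative = list(accumulate(pointwise))

print(counterfactual, pointwise, cumulative)
```

Unlike this sketch, the package also reports uncertainty intervals around each quantity, which is what makes it possible to judge whether an effect is distinguishable from noise.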

If we want to gain information from each data point, we could use the following code.

`impact.inferences`

Also, a summary result is acquired via the following code.

`impact.summary()`

The summary lets us assess whether the intervention had a causal effect. If you want a more detailed report, you could use the following code.

`impact.summary(output = 'report')`

# 4. DoWhy

DoWhy is a Python package that provides state-of-the-art causal analysis with a simple API and complete documentation.

According to the documentation page, DoWhy performs causal analysis in four steps:

1. Model a causal inference problem using assumptions we create,
2. Identify an expression for the causal effect under the assumption,
3. Estimate the expression using statistical methods,
4. Verify the validity of the estimate.

Let’s try to initiate a causal analysis with the DoWhy package. First, we must install the DoWhy package by running the following code.

`pip install dowhy`

After that, as a sample dataset, we will use a simulated dataset generated by the DoWhy package.

```python
from dowhy import CausalModel
import dowhy.datasets

# Load some sample data
data = dowhy.datasets.linear_dataset(
    beta=10,
    num_common_causes=5,
    num_instruments=2,
    num_samples=10000,
    treatment_is_binary=True)
```

First, given a graph and the assumptions we make, we can build the causal model.

```python
# Create a causal model from the data and given graph.
model = CausalModel(
    data=data["df"],
    treatment=data["treatment_name"],
    outcome=data["outcome_name"],
    graph=data["gml_graph"])
model.view_model()
```

Next, we need to identify the causal effect with the following code.

```python
# Identify the causal effect
identified_estimand = model.identify_effect()
```

Having identified a causal effect, we then need to estimate how strong it is statistically.

```python
estimate = model.estimate_effect(identified_estimand,
                                 method_name="backdoor.propensity_score_matching")
```

Lastly, the causal effect estimate is based on statistical estimation from the data, but the causality itself does not come from the data; rather, it rests on the assumptions we made earlier. We need to check the validity of those assumptions with a robustness check.

```python
refute_results = model.refute_estimate(identified_estimand, estimate,
                                       method_name="random_common_cause")
```
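The intuition behind a random-common-cause check is that adding a covariate unrelated to both treatment and outcome should leave a sound estimate roughly unchanged. The sketch below illustrates that idea in plain Python. It is not DoWhy’s implementation: it swaps propensity score matching for a simple stratified estimator, and the data-generating process (one binary confounder `w`, a true effect of 10) is an assumption chosen to mirror the `beta=10` setting above.

```python
import random

random.seed(1)

# Simulated data: confounder w drives both treatment t and outcome y; true effect is 10
rows = []
for _ in range(20_000):
    w = random.random() < 0.5
    t = random.random() < (0.7 if w else 0.3)
    y = 10.0 * t + 5.0 * w + random.gauss(0, 1)
    rows.append({"w": w, "t": t, "y": y})

def stratified_effect(rows, covariates):
    """Size-weighted average of treated-minus-control mean outcomes within each stratum."""
    strata = {}
    for r in rows:
        strata.setdefault(tuple(r[c] for c in covariates), []).append(r)
    total = 0.0
    for group in strata.values():
        treated = [r["y"] for r in group if r["t"]]
        control = [r["y"] for r in group if not r["t"]]
        diff = sum(treated) / len(treated) - sum(control) / len(control)
        total += diff * len(group)
    return total / len(rows)

estimate = stratified_effect(rows, ["w"])

# Refutation: add a random covariate unrelated to everything and re-estimate
for r in rows:
    r["noise"] = random.random() < 0.5
refuted = stratified_effect(rows, ["w", "noise"])

print(f"original estimate: {estimate:.2f}, with random common cause: {refuted:.2f}")
```

If the second number moved substantially away from the first, that would suggest the estimate is sensitive to the adjustment set and the modeling assumptions deserve another look.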

With that, we completed the causal analysis and could use the information to decide whether there is a causal effect from the treatment or not.

DoWhy documentation offers vast learning material; you should visit the web page to learn further.

# Conclusion

Causal analysis is a field within experimental statistics that aims to establish cause-and-effect relationships. It is a distinct field within data science and needs its own learning material.

In this article, I have outlined 4 Python packages you could use for Causal Analysis learning. They are:

1. Causalinference
2. Causallib
3. Causalimpact
4. DoWhy

I hope it helps!
