Lecture 8: Causes and models

Models

  • We can use statistical models to estimate associations. Examples:
    • A grouping variable is associated with the treatment effect in a Randomized Controlled Trial.
    • Age is associated with disease in a cross-sectional study.
    • Sex is associated with different occupations in panel data.

Models and causality

  • Causality is often assumed in our statistical analyses
  • We assume that a dependent variable is caused by the independent variable.

“For every unit increase in the independent variable we observe an increase of y-units in the dependent variable”

Models and causes, and models of causes

  • However, to make causal claims about an association we need to connect our statistical model to a scientific model.
  • The scientific model is constructed based on knowledge that is not in the data.

A graphical model of cause

  • Directed Acyclic Graphs (DAGs) can be used to express a causal model.
  • The model is
    • Directed: effects flow in one direction
    • Acyclic: effects cannot travel backwards
    • Contains no information on magnitudes or functional forms
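As a sketch, a DAG like this can be written down and queried in R with the dagitty package (an assumption: dagitty is not used elsewhere in these slides):

```r
library(dagitty)

# A confounding structure: C causes both X and Y, and X causes Y
g <- dagitty("dag { C -> X; C -> Y; X -> Y }")

parents(g, "Y")              # the direct causes of Y in the graph: C and X
adjustmentSets(g, "X", "Y")  # dagitty identifies { C } as the adjustment set
```

Note that the graph encodes only the arrows: nothing about effect sizes or functional forms.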

Two variables with assumed causality

  • The treatment/exposure/independent variable causes the outcome/dependent variable.

Causes, models and study design

  • The randomized controlled trial (RCT) isolates the cause by evenly distributing other “effects” on the outcome across the groups through randomization
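A minimal sketch of why randomization works: treatment is assigned independently of a prognostic variable, so the unadjusted estimate is still close to the true effect (the sample size and effect size here are illustrative assumptions):

```r
set.seed(2)                # seed chosen for illustration
n <- 1000                  # number of participants
c <- rnorm(n)              # a prognostic variable we do not control
x <- rbinom(n, 1, 0.5)     # randomized treatment: independent of c by design
y <- 1 * x + c + rnorm(n)  # true treatment effect is 1

coef(lm(y ~ x))["x"]       # close to 1 even without adjusting for c
```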

Causes and confounders

  • A confounding variable affects both the treatment and the outcome; when treatment is not randomized, it biases the estimate of the causal effect.

Confounders in simulation

library(dplyr) # Provides the pipe (%>%)
library(broom)
library(gt)

set.seed(1)   # Setting the seed
n <- 50       # Number of observations
c <- rnorm(n) # Simulating the confounder

x <- c + rnorm(n) # The confounder "causes" x
y <- c + rnorm(n) # The confounder "causes" y

m <- lm(y ~ x) # The model

tidy(m) %>%
        gt() %>%
        fmt_auto()
term estimate std.error statistic p.value
(Intercept) −0.147 0.162 −0.908 0.368
x  0.436 0.129  3.388 0.001

Statistical control

  • We can account for a confounder in a statistical model by including it as a covariate.
  • We ask: what is the relationship between x and y within every level of the confounder c?
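The table below was presumably produced by refitting the earlier simulation with the confounder as a covariate; a sketch, repeating the simulation so the chunk runs on its own (the slides formatted the output with gt, here plain coefficients are printed instead):

```r
set.seed(1)       # Same seed as the confounder simulation
n <- 50
c <- rnorm(n)     # The confounder
x <- c + rnorm(n) # The confounder "causes" x
y <- c + rnorm(n) # The confounder "causes" y

m2 <- lm(y ~ x + c) # Statistical control: c enters as a covariate
round(summary(m2)$coefficients, 3)
```

Within levels of c, the association between x and y all but disappears.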

term estimate std.error statistic p.value
(Intercept) −0.156 0.132 −1.182 0.243
x  0.004 0.136  0.031 0.975
c  1.024 0.204  5.021 7.847 × 10−6

Should we control for everything?

  • In a regression model, a “control variable” can isolate the effect of treatment on outcome. However, this works only when the control variable is an actual confounder!

Another simulation

set.seed(1)   # Setting the seed
n <- 50       # Number of observations

x <- rnorm(n) # Nothing else is causing x
y <- rnorm(n) # Nothing else is causing y

c <- x + y + rnorm(n) # The collider is caused by x and y

m1 <- lm(y ~ x)     # The model without the collider
m2 <- lm(y ~ x + c) # The model with the collider

tidy(m1) %>%
        gt() %>%
        fmt_auto() %>%
        tab_caption("Model estimates without collider")
Model estimates without collider
term estimate std.error statistic p.value
(Intercept)  0.122 0.139  0.875 0.386
x −0.046 0.168 −0.271 0.788
tidy(m2) %>%
        gt() %>%
        fmt_auto() %>%
        tab_caption("Model estimates with collider")
Model estimates with collider
term estimate std.error statistic p.value
(Intercept)  0.14  0.096  1.462 0.15
x −0.573 0.136 −4.227 1.081 × 10−4
c  0.537 0.072  7.41  1.955 × 10−9

A collider can be part of the “design” - Selection bias
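A sketch of selection on a collider: x and y are independent, but analyzing only the units whose collider value passes a threshold (say, those who made it into the sample) induces a spurious negative association. The threshold and effect sizes here are illustrative assumptions:

```r
set.seed(1)
n <- 1000
x <- rnorm(n)
y <- rnorm(n)         # Independent of x
s <- x + y + rnorm(n) # Selection score: a collider of x and y

coef(lm(y ~ x))["x"]                 # Full sample: near 0
coef(lm(y ~ x, subset = s > 0))["x"] # Selected sample: clearly negative
```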

Post-treatment bias

  • In a scenario where we are interested in the total effect of a treatment on an outcome we might have…

Post-treatment bias

  • Controlling for a mediator removes some (or all) of the effect of the treatment on the outcome.
  • This is called post-treatment bias
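The point above can be sketched with a simulated mediator (the path coefficients of 0.5 are illustrative): the total effect of x on y is 0.5 × 0.5 = 0.25, and controlling for the mediator removes it:

```r
set.seed(1)
n <- 1000
x <- rnorm(n)
med <- 0.5 * x + rnorm(n) # The mediator is caused by the treatment
y <- 0.5 * med + rnorm(n) # x affects y only through the mediator

coef(lm(y ~ x))["x"]       # Total effect, about 0.25
coef(lm(y ~ x + med))["x"] # "Direct" effect, about 0: controlled away
```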

Statistical control

  • Using a statistical control without causal knowledge can bias your estimates!

Workshop

In this workshop we will try to simulate (with plausible estimates):

  • A confounder
  • A mediator
  • Selection bias
  • A collider