Lecture 8: Causes and models

Models

  • We can use statistical models to estimate associations. Examples:
    • A grouping variable is associated with the treatment effect in a Randomized Controlled Trial.
    • Age is associated with disease in a cross-sectional study.
    • Sex is associated with different occupations in panel data.

Models and causality

  • Causality is often assumed in our statistical analyses
  • We assume that a dependent variable is caused by the independent variable.

“For every unit increase in the independent variable we observe an increase of y-units in the dependent variable”

Models and causes, and models of causes

  • However, to make causal claims about an association we need to connect our statistical model to a scientific model.
  • The scientific model is constructed based on knowledge that is not in the data.

A graphical model of cause

  • Directed Acyclic Graphs (DAGs) can be used to express a causal model.
  • The model is
    • Directed: effects flow in one direction
    • Acyclic: effects cannot travel backwards
    • Contains no information on magnitudes or functional forms
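As a sketch, a DAG like this can be written down and queried in R with the dagitty package (an assumption: dagitty is not used elsewhere in these slides):

```r
library(dagitty)

# A confounding structure: C causes both X and Y, and X causes Y
g <- dagitty("dag { C -> X; C -> Y; X -> Y }")

parents(g, "Y")              # the direct causes of Y in the graph: C and X
adjustmentSets(g, "X", "Y")  # dagitty identifies { C } as the adjustment set
```

Note that the graph encodes only the arrows: nothing about effect sizes or functional forms.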

Two variables with assumed causality

  • The treatment/exposure/independent variable causes the outcome/dependent variable.

Causes, models and study design

  • The randomized controlled trial (RCT) isolates the cause by evenly distributing other “effects” on the outcome across the groups through randomization
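A minimal sketch of why randomization works: treatment is assigned independently of a prognostic variable, so the unadjusted estimate is still close to the true effect (the sample size and effect size here are illustrative assumptions):

```r
set.seed(2)                # seed chosen for illustration
n <- 1000                  # number of participants
c <- rnorm(n)              # a prognostic variable we do not control
x <- rbinom(n, 1, 0.5)     # randomized treatment: independent of c by design
y <- 1 * x + c + rnorm(n)  # true treatment effect is 1

coef(lm(y ~ x))["x"]       # close to 1 even without adjusting for c
```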

Causes and confounders

  • A confounding variable affects both the treatment and the outcome; when treatment is not randomized, it biases the estimate of the causal effect.

Confounders in simulation

library(dplyr) # Provides the pipe (%>%)
library(broom)
library(gt)

set.seed(1)   # Setting the seed
n <- 50       # Number of observations
c <- rnorm(n) # Simulating the confounder

x <- c + rnorm(n) # The confounder "causes" x
y <- c + rnorm(n) # The confounder "causes" y

m <- lm(y ~ x) # The model

tidy(m) %>%
        gt() %>%
        fmt_auto()
term estimate std.error statistic p.value
(Intercept) −0.147 0.162 −0.908 0.368
x  0.436 0.129  3.388 0.001

Statistical control

  • We can account for a confounder in a statistical model by including it as a covariate.
  • We ask: what is the relationship between x and y within every level of the confounder c?
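The table below was presumably produced by refitting the earlier simulation with the confounder as a covariate; a sketch, repeating the simulation so the chunk runs on its own (the slides formatted the output with gt, here plain coefficients are printed instead):

```r
set.seed(1)       # Same seed as the confounder simulation
n <- 50
c <- rnorm(n)     # The confounder
x <- c + rnorm(n) # The confounder "causes" x
y <- c + rnorm(n) # The confounder "causes" y

m2 <- lm(y ~ x + c) # Statistical control: c enters as a covariate
round(summary(m2)$coefficients, 3)
```

Within levels of c, the association between x and y all but disappears.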

term estimate std.error statistic p.value
(Intercept) −0.156 0.132 −1.182 0.243
x  0.004 0.136  0.031 0.975
c  1.024 0.204  5.021 7.847 × 10−6

Should we control for everything?

  • In a regression model, a “control variable” can isolate the effect of treatment on outcome. However, this works only when the control variable is an actual confounder!

Another simulation

set.seed(1)   # Setting the seed
n <- 50       # Number of observations

x <- rnorm(n) # Nothing else is causing x
y <- rnorm(n) # Nothing else is causing y

c <- x + y + rnorm(n) # The collider is caused by x and y

m1 <- lm(y ~ x)     # The model without the collider
m2 <- lm(y ~ x + c) # The model with the collider

tidy(m1) %>%
        gt() %>%
        fmt_auto() %>%
        tab_caption("Model estimates without collider")
Model estimates without collider
term estimate std.error statistic p.value
(Intercept)  0.122 0.139  0.875 0.386
x −0.046 0.168 −0.271 0.788
tidy(m2) %>%
        gt() %>%
        fmt_auto() %>%
        tab_caption("Model estimates with collider")
Model estimates with collider
term estimate std.error statistic p.value
(Intercept)  0.14  0.096  1.462 0.15
x −0.573 0.136 −4.227 1.081 × 10−4
c  0.537 0.072  7.41  1.955 × 10−9

A collider can be part of the “design” - Selection bias
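A sketch of selection on a collider: x and y are independent, but analyzing only the units whose collider value passes a threshold (say, those who made it into the sample) induces a spurious negative association. The threshold and effect sizes here are illustrative assumptions:

```r
set.seed(1)
n <- 1000
x <- rnorm(n)
y <- rnorm(n)         # Independent of x
s <- x + y + rnorm(n) # Selection score: a collider of x and y

coef(lm(y ~ x))["x"]                 # Full sample: near 0
coef(lm(y ~ x, subset = s > 0))["x"] # Selected sample: clearly negative
```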

Post-treatment bias

  • In a scenario where we are interested in the total effect of a treatment on an outcome we might have…

Post-treatment bias

  • Controlling for a mediator removes some (or all) of the effect of the treatment on the outcome.
  • This is called post-treatment bias
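The point above can be sketched with a simulated mediator (the path coefficients of 0.5 are illustrative): the total effect of x on y is 0.5 × 0.5 = 0.25, and controlling for the mediator removes it:

```r
set.seed(1)
n <- 1000
x <- rnorm(n)
med <- 0.5 * x + rnorm(n) # The mediator is caused by the treatment
y <- 0.5 * med + rnorm(n) # x affects y only through the mediator

coef(lm(y ~ x))["x"]       # Total effect, about 0.25
coef(lm(y ~ x + med))["x"] # "Direct" effect, about 0: controlled away
```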

Statistical control

  • Using a statistical control without causal knowledge can bias your estimates!

Workshop

In this workshop we will try to simulate (with plausible estimates):

  • A confounder
  • A mediator
  • Selection bias
  • A collider