Pre- to post-intervention designs

As discussed earlier, there are many candidate models for analyzing a pre- to post-intervention study.

In this workshop we will compare them using simulated data.

A simulation

We plan to do a study comparing two training protocols where the outcome is muscle hypertrophy (CSA of the thigh muscle measured with MRI, cm²).
The study is fully randomized, i.e. we do not have any confounding factors.
To simulate the study, let’s build a simulation function.

Code

library(tidyverse)

# n, number of participants, total sample size (even; first half is group A, second half group B)
# m, mean baseline measurement (cm^2)
# b0, between participant variation in baseline (SD)
# b1, between participant variation in change (SD)
# me, measurement error (SD of the error when measuring the true value)
# beta_A, beta_B, "fixed"/"population" effects in each group
# seed, the starting point of simulations

simulate_study <- function(n = 80, 
                           m = 82, 
                           b0 = 12, 
                           b1 = 6, 
                           me = 4, 
                           beta_A = 10, 
                           beta_B = 15, 
                           seed = 1) {
        
        set.seed(seed)

        ## Simulate true pre values
        true_pre <- rnorm(n, m, b0)
        meassured_pre <- rnorm(n, true_pre, sd = me)

        ## Average effects in two groups
        A <- rep(beta_A, n/2)
        B <- rep(beta_B, n/2)

        ## Between participant variation in post-values
        b <- rnorm(n, 0, b1)

        ## Group-wise post-intervention values
        true_post <- true_pre + c(A, B) + b

        # Measured post
        meassured_post <- rnorm(n, true_post, sd = me)

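        # Combine into data frame; the first n/2 rows are group A, the rest group B
        dat <- data.frame(id = 1:n, 
                          group = rep(c("A", "B"), each = n/2),
                          true_pre,            
                          meassured_pre, 
                          true_post, 
                          meassured_post)
        
        # Return the data frame
        dat
}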

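The simulate_study function returns a data frame with “true” observations and “measured” observations. Participants in the first half of the data frame (ids 1 to n/2) receive beta_A and the remaining participants receive beta_B.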

Questions (1):

To answer these questions, change the parameters of the simulation and investigate the results.

What is the effect of measurement error on the relationship between pre-intervention scores and change scores?
What are the estimates of change in each group (compare descriptive statistics with model-based estimates)?
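
As a starting point for the first question, the expected relationship can be worked out from the simulation itself. Measured change (post minus pre) contains the baseline measurement error with a negative sign, so within one group the covariance between measured baseline and measured change is -me^2, and the correlation is -me^2 / sqrt((b0^2 + me^2) * (b1^2 + 2 * me^2)). The sketch below compares this expression with a large simulated sample from a single group. It generates data inline, using the same steps as simulate_study, so that it runs on its own; the helper names expected_cor and simulated_cor are made up for this illustration.

```r
# Expected within-group correlation between measured baseline and measured change,
# derived from: pre = true + e1, post = true + beta + b + e2, change = post - pre
expected_cor <- function(b0, b1, me) {
  -me^2 / sqrt((b0^2 + me^2) * (b1^2 + 2 * me^2))
}

# Simulated correlation in one large group (same steps as simulate_study)
simulated_cor <- function(b0 = 12, b1 = 6, me = 4, n = 1e5, seed = 1) {
  set.seed(seed)
  true_pre  <- rnorm(n, 82, b0)
  true_post <- true_pre + 10 + rnorm(n, 0, b1)
  pre  <- rnorm(n, true_pre, sd = me)
  post <- rnorm(n, true_post, sd = me)
  cor(pre, post - pre)
}

# Compare theory and simulation over a range of measurement errors
me_values <- c(0, 2, 4, 8)
res <- data.frame(
  me        = me_values,
  expected  = sapply(me_values, function(x) expected_cor(12, 6, x)),
  simulated = sapply(me_values, function(x) simulated_cor(me = x))
)
print(round(res, 3))
```

With no measurement error the expected correlation is zero, and it becomes more negative as me grows relative to b0 and b1.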

To do:

Descriptive statistics
A t-test on change scores (post-pre)
A linear model on change scores lm(change ~ group)
An ANCOVA model accounting for baseline lm(change ~ pre + group) and lm(post ~ pre + group)
A varying-intercepts model using lme4 (lmer(csa ~ time * group + (1|id)))
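
One possible way through the to-do list is sketched below. To keep the snippet runnable on its own, the data are generated inline with the same steps and default values as simulate_study; in the workshop you would instead start from the output of simulate_study() and derive group and change from it. The column names pre, post, change and csa, and the model object names, are choices made for this sketch, and lme4 is assumed to be installed.

```r
library(lme4)

set.seed(1)
n <- 80
# Inline stand-in for simulate_study() with its default parameter values
group     <- rep(c("A", "B"), each = n / 2)
true_pre  <- rnorm(n, 82, 12)
true_post <- true_pre + ifelse(group == "A", 10, 15) + rnorm(n, 0, 6)
dat <- data.frame(id = factor(1:n), group = group,
                  pre  = rnorm(n, true_pre, sd = 4),
                  post = rnorm(n, true_post, sd = 4))
dat$change <- dat$post - dat$pre

# Descriptive statistics: mean and SD of change per group
print(aggregate(change ~ group, data = dat,
                FUN = function(x) c(mean = mean(x), sd = sd(x))))

# t-test and the corresponding linear model on change scores
print(t.test(change ~ group, data = dat, var.equal = TRUE))
m_change <- lm(change ~ group, data = dat)

# ANCOVA with change or post as the outcome
m_ancova_change <- lm(change ~ pre + group, data = dat)
m_ancova_post   <- lm(post ~ pre + group, data = dat)

# Long format for the mixed model: one row per participant and time point
long <- data.frame(id    = rep(dat$id, 2),
                   group = rep(dat$group, 2),
                   time  = factor(rep(c("pre", "post"), each = n),
                                  levels = c("pre", "post")),
                   csa   = c(dat$pre, dat$post))
m_lmer <- lmer(csa ~ time * group + (1 | id), data = long)

# Estimated difference between groups in change (B - A) from each model
estimates <- c(change = unname(coef(m_change)["groupB"]),
               ancova = unname(coef(m_ancova_post)["groupB"]),
               lmer   = unname(fixef(m_lmer)["timepost:groupB"]))
print(estimates)
```

In this balanced design the group coefficient from lm(change ~ group) and the time-by-group interaction from lmer coincide, and the two ANCOVA formulations return the same group coefficient, so the comparison of interest is between the change-score and the ANCOVA estimates.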

Repeated measurements at each time point

We have more resources for a follow-up study, in which we can measure each participant twice pre-intervention and twice post-intervention.
To simulate this aspect of the study, let’s update the simulation function.

Code

# n, number of participants, total sample size (even; first half is group A, second half group B)
# m, mean baseline measurement (cm^2)
# b0, between participant variation in baseline (SD)
# b1, between participant variation in change (SD)
# me, measurement error (SD of the error when measuring the true value)
# beta_A, beta_B, "fixed"/"population" effects in each group
# seed, the starting point of simulations
simulate_study2 <- function(n = 80, 
                            m = 82, 
                            b0 = 12, 
                            b1 = 6, 
                            me = 4, 
                            beta_A = 10, 
                            beta_B = 15, 
                            seed = 1) {
        
        set.seed(seed)

        ## Simulate true pre values
        true_pre <- rnorm(n, m, b0)
        meassured_pre1 <- rnorm(n, true_pre, sd = me)
        meassured_pre2 <- rnorm(n, true_pre, sd = me)

        ## Average effects in two groups
        A <- rep(beta_A, n/2)
        B <- rep(beta_B, n/2)

        ## Between participant variation in post-values
        b <- rnorm(n, 0, b1)

        ## Group-wise post-intervention values
        true_post <- true_pre + c(A, B) + b

        # Measured post
        meassured_post1 <- rnorm(n, true_post, sd = me)
        meassured_post2 <- rnorm(n, true_post, sd = me)

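        # Combine into data frame; the first n/2 rows are group A, the rest group B
        dat <- data.frame(id = 1:n, 
                          group = rep(c("A", "B"), each = n/2),
                          true_pre,            
                          meassured_pre1, 
                          meassured_pre2, 
                          true_post, 
                          meassured_post1, 
                          meassured_post2)
        
        # Return the data frame
        dat
}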

Questions (2):

To answer these questions, change the parameters of the simulation and investigate the results.

What is the effect of measurement error on the relationship between pre-intervention scores and change scores when we combine multiple measurements (averaging within participant and time point)?
What are the estimates of change in each group (compare descriptive statistics with model-based estimates)?
Can we recover the variation between participants (b0 and b1) in a varying effects model?

To do:

Descriptive statistics
A t-test on change scores (post-pre, aggregated data)
A linear model on change scores lm(change ~ group) (aggregated data)
An ANCOVA model accounting for baseline lm(change ~ pre + group) and lm(post ~ pre + group) (aggregated data)
A varying-effects model using lme4 (lmer(csa ~ time * group + (1 + time|id)))
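
A sketch of the aggregated and the varying-effects analyses follows, again with data generated inline (same steps and default values as simulate_study2) so that the snippet runs on its own. It assumes that lme4 is installed; the long-format layout and the object names are illustrative choices, and lmer may report convergence or singular-fit messages for some parameter settings.

```r
library(lme4)

set.seed(1)
n <- 80
# Inline stand-in for simulate_study2() with its default parameter values
group     <- rep(c("A", "B"), each = n / 2)
true_pre  <- rnorm(n, 82, 12)
true_post <- true_pre + ifelse(group == "A", 10, 15) + rnorm(n, 0, 6)

# Long format: two measurements per participant and time point
long <- data.frame(
  id    = factor(rep(1:n, times = 4)),
  group = rep(group, times = 4),
  time  = factor(rep(c("pre", "post"), each = 2 * n), levels = c("pre", "post")),
  csa   = c(rnorm(n, true_pre, 4), rnorm(n, true_pre, 4),
            rnorm(n, true_post, 4), rnorm(n, true_post, 4))
)

# Aggregated data: average the two measurements per participant and time point
agg  <- aggregate(csa ~ id + group + time, data = long, FUN = mean)
wide <- reshape(agg, idvar = c("id", "group"), timevar = "time",
                direction = "wide")
wide$change <- wide$csa.post - wide$csa.pre
m_change <- lm(change ~ group, data = wide)

# Varying intercepts and varying effects of time, fitted to all measurements
m_lmer <- lmer(csa ~ time * group + (1 + time | id), data = long)

# Random-effect SDs to compare with the simulation parameters
print(VarCorr(m_lmer))
```

In the VarCorr output, the intercept SD corresponds to b0 (variation in baseline), the timepost SD to b1 (variation in change) and the residual SD to me, which gives a direct way to approach the third question.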

Which model do we prefer?

The different study designs give us many alternatives for analysis. Which one is best? We could use the simulation functions to investigate this. The strategy would be something like this:

for i in 1:number_of_simulations {

        - Simulate data (with a new seed in each iteration, e.g. seed = i)
        - Fit multiple models, save results
}

Compare results between models

Questions (3):

How could we write up code that represents the above strategy?
Which model is the better one at capturing the true effects in the data?
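
One way to turn the strategy into code is sketched below. Because simulate_study calls set.seed() internally, each iteration needs its own seed; otherwise every simulated data set would be identical. The helper sim_once is a hypothetical, compact stand-in for simulate_study so that the snippet runs on its own, and only two of the candidate models are fitted to keep it short.

```r
# Hypothetical helper: one simulated data set with the simulate_study() defaults
sim_once <- function(seed, n = 80, me = 4) {
  set.seed(seed)
  group     <- rep(c("A", "B"), each = n / 2)
  true_pre  <- rnorm(n, 82, 12)
  true_post <- true_pre + ifelse(group == "A", 10, 15) + rnorm(n, 0, 6)
  data.frame(group = group,
             pre   = rnorm(n, true_pre, sd = me),
             post  = rnorm(n, true_post, sd = me))
}

n_sims  <- 500
results <- vector("list", n_sims)

for (i in seq_len(n_sims)) {
  # A new seed per iteration gives a new data set
  dat <- sim_once(seed = i)
  dat$change <- dat$post - dat$pre

  m_change <- lm(change ~ group, data = dat)
  m_ancova <- lm(post ~ pre + group, data = dat)

  # Save the estimated group difference (B - A) from each model
  results[[i]] <- data.frame(
    sim      = i,
    model    = c("change", "ancova"),
    estimate = unname(c(coef(m_change)["groupB"], coef(m_ancova)["groupB"]))
  )
}
results <- do.call(rbind, results)

# Mean and SD of the estimates per model; the true difference is
# beta_B - beta_A = 15 - 10 = 5
summary_tab <- aggregate(estimate ~ model, data = results,
                         FUN = function(x) c(mean = mean(x), sd = sd(x)))
print(summary_tab)
```

The same loop can be extended with further models (for example lmer) or rerun with other values of me to see how the comparison changes.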