We plan to do a study comparing two training protocols where the outcome is muscle hypertrophy, measured as the cross-sectional area (CSA) of the thigh muscle with MRI (cm²).
The study is fully randomized, i.e., we assume there are no confounding factors.
To simulate the study, let's build a simulation function.
```r
library(tidyverse)

# n, number of participants (total sample size)
# m, mean baseline measurement (cm^2)
# b0, between-participant variation in baseline (SD)
# b1, between-participant variation in change (SD)
# me, measurement error (SD of measured values around the true value)
# beta_A, beta_B, "fixed"/"population" effects in each group
# seed, the starting point of the simulation

simulate_study <- function(n = 80, m = 82, b0 = 12, b1 = 6, me = 4,
                           beta_A = 10, beta_B = 15, seed = 1) {

  set.seed(seed)

  ## Simulate true pre values
  true_pre <- rnorm(n, m, b0)
  measured_pre <- rnorm(n, true_pre, sd = me)

  ## Average effects in the two groups
  A <- rep(beta_A, n / 2)
  B <- rep(beta_B, n / 2)

  ## Between-participant variation in post values
  b <- rnorm(n, 0, b1)

  ## Group-wise post-intervention values
  true_post <- true_pre + c(A, B) + b

  ## Measured post values
  measured_post <- rnorm(n, true_post, sd = me)

  ## Combine into a data frame; a group indicator is included so the
  ## models below (e.g., lm(change ~ group)) can be fitted
  dat <- data.frame(id = 1:n,
                    group = rep(c("A", "B"), each = n / 2),
                    true_pre, measured_pre,
                    true_post, measured_post)

  return(dat)
}
```
The simulate_study function produces a data frame with "true" observations and "measured" observations.
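For example, with the default parameters (assuming the version above that explicitly returns the data frame):

```r
# First rows of a simulated data set with default parameters
dat <- simulate_study()
head(dat)
```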
Questions (1):
To answer these questions, change the parameters of the simulation and investigate the results.
What is the effect of measurement error on the relationship between pre-intervention scores and change scores?
What are the estimates of change in each group (compare descriptive statistics with model-based estimates)?
To do (see the code sketch after this list):
Descriptive statistics
A t-test on change scores (post-pre)
A linear model on change scores lm(change ~ group)
An ANCOVA model accounting for baseline: lm(change ~ pre + group) and lm(post ~ pre + group)
A varying-intercepts model using lme4 (lmer(csa ~ time * group + (1|id)))
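A minimal sketch of these analyses, assuming the simulate_study() above (which returns a group column); the renamed pre/post/change columns and the long-format reshaping are illustrative choices, not prescribed:

```r
library(tidyverse)
library(lme4)

dat <- simulate_study() |>
  mutate(pre = measured_pre,
         post = measured_post,
         change = post - pre)

# Descriptive statistics per group
dat |>
  group_by(group) |>
  summarise(mean_change = mean(change), sd_change = sd(change))

# Relationship between baseline and change (regression to the mean)
with(dat, cor(pre, change))

# t-test and linear model on change scores
t.test(change ~ group, data = dat)
lm(change ~ group, data = dat)

# ANCOVA, two parameterizations
lm(change ~ pre + group, data = dat)
lm(post ~ pre + group, data = dat)

# The varying-intercepts model needs long format: one row per measurement
dat_long <- dat |>
  pivot_longer(c(pre, post), names_to = "time", values_to = "csa") |>
  mutate(time = factor(time, levels = c("pre", "post")))

lmer(csa ~ time * group + (1 | id), data = dat_long)
```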
Repeated measurements at each time-point
We have more resources for a follow-up study; here we can measure twice at both pre- and post-intervention.
To simulate this aspect of the study, let’s update the simulation function.
```r
# n, number of participants (total sample size)
# m, mean baseline measurement (cm^2)
# b0, between-participant variation in baseline (SD)
# b1, between-participant variation in change (SD)
# me, measurement error (SD of measured values around the true value)
# beta_A, beta_B, "fixed"/"population" effects in each group
# seed, the starting point of the simulation

simulate_study2 <- function(n = 80, m = 82, b0 = 12, b1 = 6, me = 4,
                            beta_A = 10, beta_B = 15, seed = 1) {

  set.seed(seed)

  ## Simulate true pre values and two measurements per time-point
  true_pre <- rnorm(n, m, b0)
  measured_pre1 <- rnorm(n, true_pre, sd = me)
  measured_pre2 <- rnorm(n, true_pre, sd = me)

  ## Average effects in the two groups
  A <- rep(beta_A, n / 2)
  B <- rep(beta_B, n / 2)

  ## Between-participant variation in post values
  b <- rnorm(n, 0, b1)

  ## Group-wise post-intervention values
  true_post <- true_pre + c(A, B) + b

  ## Two measured post values
  measured_post1 <- rnorm(n, true_post, sd = me)
  measured_post2 <- rnorm(n, true_post, sd = me)

  ## Combine into a data frame with a group indicator
  dat <- data.frame(id = 1:n,
                    group = rep(c("A", "B"), each = n / 2),
                    true_pre, measured_pre1, measured_pre2,
                    true_post, measured_post1, measured_post2)

  return(dat)
}
```
Questions (2):
To answer these questions, change the parameters of the simulation and investigate the results.
What is the effect of measurement error on the relationship between pre-intervention scores and change scores when we combine multiple measurements (average over participant and time-point)?
What are the estimates of change in each group (compare descriptive statistics with model-based estimates)?
Can we recover the variation between participants (b0 and b1) in a varying effects model?
To do (see the code sketch after this list):
Descriptive statistics
A t-test on change scores (post-pre, aggregated data)
A linear model on change scores lm(change ~ group) (aggregated data)
An ANCOVA model accounting for baseline: lm(change ~ pre + group) and lm(post ~ pre + group) (aggregated data)
A varying-effects model using lme4 (lmer(csa ~ time * group + (1 + time|id)))
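A sketch of the aggregated and varying-effects analyses, assuming the simulate_study2() above; averaging the two measurements and the reshaping pattern are illustrative assumptions:

```r
library(tidyverse)
library(lme4)

dat2 <- simulate_study2()

# Aggregated data: average the two measurements at each time-point
dat2_agg <- dat2 |>
  mutate(pre = (measured_pre1 + measured_pre2) / 2,
         post = (measured_post1 + measured_post2) / 2,
         change = post - pre)

t.test(change ~ group, data = dat2_agg)
lm(change ~ group, data = dat2_agg)
lm(change ~ pre + group, data = dat2_agg)
lm(post ~ pre + group, data = dat2_agg)

# Varying-effects model on all single measurements (long format)
dat2_long <- dat2 |>
  pivot_longer(matches("measured_"),
               names_to = c("time", "replicate"),
               names_pattern = "measured_(pre|post)([12])",
               values_to = "csa") |>
  mutate(time = factor(time, levels = c("pre", "post")))

fit <- lmer(csa ~ time * group + (1 + time | id), data = dat2_long)

# The random-effect SDs should approximate b0 (intercept) and b1 (time
# slope), and the residual SD should approximate me
summary(fit)
```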
Which model do we prefer?
Using different study designs, we have many alternatives for analysis; which one is best? We could use the simulation functions to investigate this. The strategy would be something like this:
for i in 1:number_of_simulations:
  - simulate data
  - fit multiple models, save the results
compare results between models

A code sketch of this loop follows the questions below.
Questions (3):
How could we write up code that represents the above strategy?
Which model is best at capturing the true effects in the data?
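As a starting point for the first question, a hedged sketch of such a loop; the number of simulations, the stored coefficients, and the comparison against the true group difference (beta_B − beta_A = 5 with the defaults) are illustrative choices:

```r
library(tidyverse)

n_sim <- 100

results <- map_dfr(1:n_sim, function(i) {
  dat <- simulate_study(seed = i) |>
    mutate(change = measured_post - measured_pre)
  tibble(
    sim = i,
    change_score = coef(lm(change ~ group, data = dat))["groupB"],
    ancova = coef(lm(measured_post ~ measured_pre + group, data = dat))["groupB"]
  )
})

# Compare bias and precision against the true difference of 5 cm^2
results |>
  pivot_longer(-sim, names_to = "model", values_to = "estimate") |>
  group_by(model) |>
  summarise(bias = mean(estimate - 5), sd = sd(estimate))
```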