Sampling, effects and power

Revisiting power

  • Statistical power is the long-run ability of a study design to detect a true effect.
  • In frequentist statistics, the true effect is fixed and we want to use a sample to estimate it.
  • Since we do not know the true effect, the goal of a power analysis is to reach sufficient power for an assumed effect (Lakens 2022).
  • This assumption can be based on a hypothesized true effect, or on an effect that is considered meaningful (see the sketch below).
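
As a minimal sketch of this logic, base R’s power.t.test() can solve for the sample size needed to reach a target power given an assumed effect. The assumed effect (\(d = 0.5\)) and the 80% power target below are illustrative assumptions, not recommendations.

Code
# Solve for the sample size giving 80% power for an assumed
# one-sample standardized effect d = delta / sd = 0.5.
power.t.test(delta = 0.5, sd = 1, 
             sig.level = 0.05, power = 0.80, 
             type = "one.sample")

Under these assumptions the required sample size is roughly n = 34.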

A simulation experiment

We will now build a simulation study to investigate:

  • How often will we find a “true” population effect in studies of different sizes?
  • What is the relationship between effect size, power and statistical significance?
  • What is the effect of sample size on the precision of estimates (confidence intervals)?

Our simulation will investigate a known population effect of 0.5. In the one-sample case, the standardized effect size is \(d = \frac{mean}{SD}\).

Code
library(tidyverse)
set.seed(1)


# Create a population to sample from with a known effect
# In standardized terms, the population effect is 0.5 = mean / sd.
population <- rnorm(10^6, 5, 10) 

# Calculate the population effect size 
pop.es <- mean(population) / sd(population)


results_total <- list()

# 0. Outer for-loop: repeat the full set of studies 1000 times
for(i in 1:1000) {
  
  # 1. Sample from the population with sample sizes 10 to 100
  
  # 1.1 Create a vector of sample sizes
  sample.sizes <- seq(from = 10, to = 100, by = 10)
  # 1.2 Create a list to store results
  results_sub <- list()
  
  # 1.3 Inside a nested for-loop, perform sampling with each sample size
  for(j in seq_along(sample.sizes)) {
    
    samp <- sample(population, sample.sizes[j], replace = FALSE)
    
    # 2. Fit an intercept-only model; the intercept estimates the mean
    m <- lm(y ~ 1, data = data.frame(y = samp))
    
    # 2.1 Store results from each model as a data frame in a list
    results_sub[[j]] <- data.frame(mean = coef(summary(m))[1, 1],       # Estimated mean
                                   se = coef(summary(m))[1, 2],         # Standard error
                                   pval = coef(summary(m))[1, 4],       # p-value
                                   ci.lwr = confint(m)[1],              # Lower confidence limit
                                   ci.upr = confint(m)[2],              # Upper confidence limit
                                   effect.size = mean(samp) / sd(samp), # Observed effect size
                                   sample.size = sample.sizes[j])       # Sample size
  }
  
  # 2.2 Combine all data frames from each sample size into a single data frame
  results_total[[i]] <- bind_rows(results_sub)
  
  # 3. Repeat 1 and 2 1000 times, performing 1000 studies with each
  # sample size from 10 to 100.
}

# Combine all results
results_total <- bind_rows(results_total)

What is the relationship between sample size and finding a true effect?

  • As the sample size increases, we are more likely to conduct a study that detects a true effect.
  • We simply count the proportion of “studies” that declare a statistically significant effect at \(p<0.05\).
Code
# Count numbers of studies with p < 0.05
results_total %>%
  filter(pval < 0.05) %>%
  group_by(sample.size) %>%
  summarise(n = n(), 
            prop = n / 1000) %>%
  ggplot(aes(sample.size, 100 * prop)) + 
  geom_line() + 
  geom_point(shape = 21, fill = "lightblue", size = 3) + 
  labs(x = "Sample size", 
       y = "Percentage of studies rejecting the null-hypothesis", 
       title = "Statistical power and sample size", 
       subtitle = "Statistical significance set to 0.05, standardized effect-size 0.5") +
  theme_bw()

  • The relationship between power and sample size can be used in cost-benefit analyses of future studies.
  • The “cost” of a study can be, for example, economic or ethical.
  • In most cases the power calculation can be done without simulations, using e.g. the pwr package.
Code
library(pwr)

sample.sizes <- seq(from = 10, to = 100, by = 10)
results.pwr <- list()

for(i in 1:length(sample.sizes)) {
  
  pwr_analysis <- pwr.t.test(type = "one.sample", d = 5/10, sig.level = 0.05, n = sample.sizes[i])
  

  results.pwr[[i]] <- data.frame(sample.size = sample.sizes[i], 
                                 prop = pwr_analysis$power)
  
  
  }

results.pwr <- bind_rows(results.pwr)

# Count numbers of studies with p < 0.05
results_total %>%
  filter(pval < 0.05) %>%
  group_by(sample.size) %>%
  summarise(n = n(), 
            prop = n / 1000) %>%
  ggplot(aes(sample.size, 100 * prop)) + 
  geom_line() + 
  geom_point(shape = 21, fill = "lightblue", size = 3) + 
  
  geom_line(data = results.pwr, color = "red") +
  
  labs(x = "Sample size", 
       y = "Percentage of studies rejecting the null-hypothesis", 
       title = "Statistical power and sample size", 
       subtitle = "Statistical significance set to 0.05, standardized effect-size 0.5.") +
  theme_bw()

Power analysis using simulation (blue circles) and analytical calculation (red line)

Effect size and statistical significance

  • A standardized (observed) effect size can be calculated from each study. What is the relationship between effect sizes, sample sizes and p-values?

  • The p-value is directly related to the observed effect size; the sample size determines what p-value a specific effect size produces (see the worked example below).
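
To make this concrete: in a one-sample t-test, \(t = d \times \sqrt{n}\) with \(n - 1\) degrees of freedom, so the p-value follows directly from the observed effect size and the sample size. A minimal sketch (the sample sizes below are illustrative):

Code
# The same observed effect (d = 0.5) gives very different p-values
# at different sample sizes, since t = d * sqrt(n).
d <- 0.5
for (n in c(10, 50, 100)) {
  t_stat <- d * sqrt(n)
  p_val <- 2 * pt(-abs(t_stat), df = n - 1)
  cat("n =", n, " t =", round(t_stat, 2), " p =", signif(p_val, 2), "\n")
}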

Code
library(cowplot)


plotA <- results_total %>%
  ggplot(aes(effect.size, pval, 
             color = as.factor(sample.size))) + geom_point() +
  labs(color = "Sample size") +
  
    geom_hline(yintercept = 0.05, color = "red") + 
  annotate("text", x = 1.5, y = 0.05 + 0.1, 
           color = "red",
           label = "P-value = 0.05", 
           size = 2.5) +
  
  geom_hline(yintercept = 0.001, color = "blue") + 
  annotate("text", x = 1.5, y = 0.001 + 0.1, 
           color = "blue",
           label = "P-value = 0.01", 
           size = 2.5) +
  
  labs(title = "Effect size vs. p-values", 
       x = "Effect size", 
       y = "P-value") + 

  theme_bw() + 
  theme(legend.position = "none")



plotB <- results_total %>%
  ggplot(aes(effect.size, -log(pval), 
             color = as.factor(sample.size))) + geom_point() +
  labs(color = "Sample size", 
       title = " ", 
       x = "Effect size", 
       y = "-log(p-value)") +
  
  geom_hline(yintercept = -log(0.05), color = "red") + 
  annotate("text", x = -0.2, y = -log(0.05) + 1, 
           color = "red",
           label = "P-value = 0.05", 
           size = 2.5) +
  
  geom_hline(yintercept = -log(0.001), color = "blue") + 
  annotate("text", x = -0.2, y = -log(0.001) + 1, 
           color = "blue",
           label = "P-value = 0.01", 
           size = 2.5) +
  scale_x_continuous(breaks = c(0, 0.5, 1, 2)) +
  
  
  theme_bw()

plot_grid(plotA, plotB, ncol = 2, rel_widths = c(0.8, 1))

  • The observed effect size is an estimate of the population effect size. In our case, the population effect size is \(\frac{mean}{SD} = \frac{5}{10} = 0.5\).
  • How well do we estimate effect sizes?
Code
library(ggtext)

results_total %>%
  ggplot(aes(sample.size, effect.size, color = pval)) + 
  
  geom_point(position = position_jitter(), alpha = 0.5) +
    geom_hline(yintercept = pop.es, lty = 1, size = 2, color = "red") +
  labs(x = "Sample size", 
       y = "Observed standardized effect-size", 
       color = "P-value", 
           title = "Observed effect-sizes per sample size") +
  annotate("richtext", x = 150, y = pop.es, 
           color = "red", 
           hjust = 1,
           label = "Population effect-size") +
  theme_bw()

Effect of sample size on precision

  • Precision can be regarded as the ability to estimate an effect with some degree of certainty.
  • The confidence interval (CI) is constructed to capture the true population effect at a given rate (e.g. in 95% of repeated studies).
  • The width of the CI depends on the sample size:

\[95\%~ \text{CI: Estimate} \pm \operatorname{t}_{\text{critical}} \times \operatorname{SE}\]

For \(n = 10\) (9 degrees of freedom):

\[95\%~ \text{CI: Estimate} \pm 2.26 \times \operatorname{SE}\]
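
The critical t-value can be checked directly in R; with \(n - 1 = 9\) degrees of freedom, the two-sided 95% critical value is approximately 2.26:

Code
# Two-sided 95% critical t-value at 9 degrees of freedom (about 2.26)
qt(0.975, df = 9)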

Code
results_total %>%
  ggplot(aes(sample.size, ci.upr-ci.lwr, fill = as.factor(sample.size))) +
  geom_point(position = position_jitter(width = 0.8),
             shape = 21, alpha = 0.4) +
  labs(x = "Sample size", 
       y = "Range of confidence intervals (Upper - Lower)", 
       title = "Confidence interval width and sample size") + 
  theme_bw() +
  theme(legend.position = "none")

  • To decrease the width of the confidence interval by half, we need to increase the sample size about 3-4 times, since the standard error scales with \(1/\sqrt{n}\) and the critical t-value also shrinks slightly with larger samples (see the check below).
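
As a quick analytical check under our simulation settings (population SD of 10), the expected CI width is \(2 \times \operatorname{t}_{\text{critical}} \times \text{SD} / \sqrt{n}\), which closely reproduces the simulated averages in the table below. The small remaining difference arises because the sample SD slightly underestimates the population SD on average.

Code
# Expected 95% CI width for the mean with SD = 10:
# about 14 at n = 10, about 4 at n = 100.
n <- seq(from = 10, to = 100, by = 10)
round(2 * qt(0.975, df = n - 1) * 10 / sqrt(n), 1)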
Code
library(knitr)

results_total %>% 
  mutate(ci_width = ci.upr-ci.lwr) %>%
  group_by(sample.size) %>%
  summarise(ci_width = mean(ci_width)) %>%
  kable(col.names = c("Sample size", "95% CI width"), 
        digits = c(1, 1))
Sample size   95% CI width
         10           14.0
         20            9.3
         30            7.4
         40            6.4
         50            5.7
         60            5.1
         70            4.7
         80            4.4
         90            4.2
        100            4.0
  • The confidence interval covers the true population parameter in 95% of repeated studies. With only 1000 simulated studies per sample size, the observed coverage fluctuates around, rather than hitting exactly, 95% (see the quick calculation below).
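
How large should these fluctuations be? The estimated coverage is a proportion based on 1000 studies, so its standard error follows from the binomial distribution:

Code
# Standard error (in percentage points) of an estimated 95% coverage
# based on 1000 studies: about 0.7, so observed coverage roughly
# between 93.6% and 96.4% (plus/minus 2 SE) is unsurprising.
100 * sqrt(0.95 * 0.05 / 1000)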
Code
results_total %>%
  mutate(covered = if_else(ci.lwr <= 5 & ci.upr >= 5, "yes", "no")) %>%
  filter(covered == "yes") %>%
  group_by(sample.size) %>%
  summarise(prop = 100 * (n() / 1000)) %>%
  kable(col.names = c("Sample size", "95% CI coverage (%)"), 
        digits = c(1, 1))
Sample size   95% CI coverage (%)
         10                  95.1
         20                  95.1
         30                  95.1
         40                  95.1
         50                  95.2
         60                  94.1
         70                  94.5
         80                  94.8
         90                  94.0
        100                  94.6
  • For a given coverage level, confidence intervals miss the true value at the same rate regardless of the sample size.
  • However, with a larger sample size we could increase the coverage of the interval while maintaining precision.
Code
results_total %>%
  mutate(covered = if_else(ci.lwr <= 5 & ci.upr >= 5, "yes", "no"), 
         interval = rep(1:1000, each = 10)) %>% # Study id: 1000 studies x 10 sample sizes
  filter(interval <= 100) %>%                   # Plot the first 100 studies
  
  ggplot(aes(mean, factor(interval), color = covered, alpha = covered)) + 
  geom_errorbarh(aes(xmin = ci.lwr, 
                     xmax = ci.upr)) + 
  
  # Intervals that miss the true mean ("no", the first factor level)
  # are drawn fully opaque; covering intervals are faded.
  scale_alpha_manual(values = c(1, 0.3)) +
  
  geom_vline(xintercept = 5, color = "red") +
  
  facet_wrap(~ sample.size) + 
  
  labs(title = "Confidence intervals from 100 studies", 
       subtitle = "Red intervals miss the population average") +
  theme_bw() +
  
  theme(axis.text.y = element_blank(), 
        axis.ticks.y = element_blank(), 
        legend.position = "none", 
        axis.title = element_blank())

What if there is no effect?

  • Studies that examine a population effect that is close to zero will falsely reject the null hypothesis at a rate of 5% if the \(\alpha\) level is set to 0.05.
  • The “power” of such studies is 5% regardless of the sample size.
Code
library(tidyverse)
set.seed(1)


# Create a population to sample from with a known effect
# In standardized terms, the population effect is 0 = mean / sd.
population <- rnorm(10^6, 0, 10) 

# Calculate the population effect size 
pop.es <- mean(population) / sd(population)


results_total_null <- list()

# 0. Outer for-loop: repeat the full set of studies 1000 times
for(i in 1:1000) {
  
  # 1. Sample from the population with sample sizes 10 to 100
  
  # 1.1 Create a vector of sample sizes
  sample.sizes <- seq(from = 10, to = 100, by = 10)
  # 1.2 Create a list to store results
  results_sub <- list()
  
  # 1.3 Inside a nested for-loop, perform sampling with each sample size
  for(j in seq_along(sample.sizes)) {
    
    samp <- sample(population, sample.sizes[j], replace = FALSE)
    
    # 2. Fit an intercept-only model; the intercept estimates the mean
    m <- lm(y ~ 1, data = data.frame(y = samp))
    
    # 2.1 Store results from each model as a data frame in a list
    results_sub[[j]] <- data.frame(mean = coef(summary(m))[1, 1],       # Estimated mean
                                   se = coef(summary(m))[1, 2],         # Standard error
                                   pval = coef(summary(m))[1, 4],       # p-value
                                   ci.lwr = confint(m)[1],              # Lower confidence limit
                                   ci.upr = confint(m)[2],              # Upper confidence limit
                                   effect.size = mean(samp) / sd(samp), # Observed effect size
                                   sample.size = sample.sizes[j])       # Sample size
  }
  
  # 2.2 Combine all data frames from each sample size into a single data frame
  results_total_null[[i]] <- bind_rows(results_sub)
  
  # 3. Repeat 1 and 2 1000 times, performing 1000 studies with each
  # sample size from 10 to 100.
}

# Combine all results
results_total_null <- bind_rows(results_total_null)
Code
# Count numbers of studies with p < 0.05
results_total_null %>%
  filter(pval < 0.05) %>%
  group_by(sample.size) %>%
  summarise(n = n(), 
            prop = n / 1000) %>%
  ggplot(aes(sample.size, 100 * prop)) + 
  geom_hline(yintercept = 5, lty = 2, color = "grey50") +
  geom_line() + 
  geom_point(shape = 21, fill = "lightblue", size = 3) + 
  labs(x = "Sample size", 
       y = "Percentage of studies rejecting the null-hypothesis", 
       title = "Statistical power and sample size", 
       subtitle = "Statistical significance set to 0.05, standardized effect-size 0") +
  scale_y_continuous(limits = c(0, 100)) +
  theme_bw()

  • P-values from studies of a population zero-effect are uniformly distributed between 0 and 1.
Code
results_total_null %>%
  ggplot(aes(pval)) + 
  geom_histogram(aes(y = 100 * after_stat(count) / 1000), 
                 binwidth = 0.05, boundary = 0, color = "gray30", fill = "white") + 
  labs(x = "P-value", 
       y = "Percentage of studies", 
       title = "Distribution of p-values from studies of a population zero-effect") +
  scale_y_continuous(limits = c(0, 10), expand = c(0, 0)) +
  facet_wrap(~ sample.size) +
  theme_bw()

  • Be aware of p-values just below 0.05 (e.g. \(p = 0.047\)): in certain cases, such a p-value can be just as probable, or even more probable, when sampling from the zero-effect population as when sampling from a population with a true effect. An example:
Code
results_total %>%
  filter(sample.size == 50, 
         pval <= 0.05) %>%
    ggplot(aes(pval)) + 
  geom_histogram(aes(y = 100 * after_stat(count) / 1000), 
                 binwidth = 0.005, boundary = 0, color = "gray30", fill = "darkolivegreen4") +
  
  
    geom_histogram(data = results_total_null %>%
                          filter(sample.size == 50, 
                                  pval <= 0.05), 
  aes(y = 100 * after_stat(count) / 1000), 
                 binwidth = 0.005, boundary = 0, color = "gray30", fill = "darkblue") +
  
    labs(x = "P-value", 
       y = "Percentage of studies", 
       title = "Distribution of p-values from studies of true zero-effect and true d = 0.5", 
       subtitle = "Sample size: n = 50") +
  annotate("text", x = 0.03, y = 5, label = "Population effect: 0", color = "darkblue") +
  annotate("text", x = 0.03, y = 6, label = "Population effect: 0.5", color = "darkolivegreen4") +
  coord_cartesian(ylim=c(0,10)) +
  theme_bw()

References

Lakens, Daniël. 2022. “Sample Size Justification.” Collabra: Psychology 8 (1): 33267. https://doi.org/10.1525/collabra.33267.