Fisher | Neyman-Pearson |
---|---|
1. State \(H_0\) | 1. State \(H_0\) and \(H_1\) |
2. Specify test statistic | 2. Specify \(\alpha\) (e.g. 5%) and \(\beta\) |
3. Collect data, calculate test statistic and \(p\)-value | 3. Specify test statistics and critical value |
4. Reject \(H_0\) if \(p\) is small | 4. Collect data, calculate test statistic, determine \(p\) |
| | 5. Reject \(H_0\) in favour of \(H_1\) if \(p < \alpha\) |
Decision | State of the world: \(H_0\) true | State of the world: \(H_0\) false |
---|---|---|
Accept \(H_0\) | Correct decision | Type II error |
Reject \(H_0\) | Type I error | Correct decision |
The \(p\)-value is the probability of obtaining a value of the test statistic \(t\) at least as extreme as the one observed, under the condition that the null hypothesis is true (sometimes written loosely as \(p(t|H_0)\)).
We assume that the null is true and we calculate how often a result such as the one obtained, or even more extreme, would occur as a result of chance.
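As a minimal sketch of this definition, the block below computes an exact one-sided \(p\)-value for a binomial test. The coin-toss data (15 heads in 20 tosses) and the function name `binomial_p_value` are hypothetical and chosen only for illustration; they do not come from the studies cited in these notes.

```python
from math import comb


def binomial_p_value(successes: int, trials: int, p_null: float = 0.5) -> float:
    """One-sided p-value: P(X >= successes | H0), with X ~ Binomial(trials, p_null)."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical data: 15 heads in 20 tosses; H0 says the coin is fair (p = 0.5).
p_one_sided = binomial_p_value(15, 20)
print(f"one-sided p = {p_one_sided:.4f}")  # prints: one-sided p = 0.0207
```

The sum runs over the observed count and every more extreme count, which is exactly the "as extreme or more extreme" part of the definition.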
The \(\alpha\)-level is the Type I error rate, the probability of rejecting \(H_0\) when it is actually true.
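One way to see what this rate means is to simulate many experiments in which \(H_0\) is true and count how often it is rejected. The sketch below assumes a two-sided one-sample z-test with known standard deviation 1; the sample size, number of experiments and seed are arbitrary illustrative choices.

```python
import random
from statistics import NormalDist


def simulate_type_i_rate(alpha=0.05, n=30, n_experiments=10_000, seed=1):
    """Fraction of simulated experiments that reject H0 although H0 is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    rejections = 0
    for _ in range(n_experiments):
        # Data are drawn under H0: population mean 0, standard deviation 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (sum(sample) / n) * n**0.5  # z statistic with known sigma = 1
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_experiments


rate = simulate_type_i_rate()
print(f"rejection rate under H0: {rate:.3f}")
```

With enough simulated experiments the rejection rate should settle close to the chosen \(\alpha\).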
It is common practice to combine the two approaches in the analysis of scientific experiments. Examples:
Judged against the original frameworks, mixing Fisher's approach with Neyman-Pearson's may lead to abuse of null-hypothesis significance testing (NHST).
\[\alpha=0.05,~\beta=0.2,~d = 0.4\]
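For these values, a rough a priori sample size can be sketched as follows. The notes do not state the design, so this assumes a two-sided comparison of two independent groups of equal size and uses the normal approximation \(n = 2\left((z_{1-\alpha/2} + z_{1-\beta})/d\right)^2\) per group; the function name `n_per_group` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(alpha: float, beta: float, d: float) -> int:
    """Approximate sample size per group (normal approximation, two-sided test)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the Type I error rate
    z_beta = z.inv_cdf(1 - beta)        # quantile matching power = 1 - beta
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)


n = n_per_group(alpha=0.05, beta=0.2, d=0.4)
print(n)  # prints: 99
```

An exact calculation based on the t distribution would require slightly more participants per group than this approximation suggests.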
Statistical power is influenced by:
Large sample sizes can make even small effects statistically significant. Example (Lee 2010):
Objective: To examine the association of different amounts of physical activity with long-term weight changes among women consuming a usual diet.
Design: Prospective cohort study, following 34,079 healthy, US women (mean age, 54.2 years) from 1992–2007. At baseline, 36-, 72-, 96-, 120-, 144- and 156-months’ follow-up, women reported their physical activity and body weight.
(Lee 2010)
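The dependence of significance on sample size can be sketched with a fixed, small standardized effect. The block below assumes two equal groups, a normal approximation, and an observed difference exactly equal to \(d = 0.05\); these numbers are illustrative and are not taken from Lee (2010).

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(d: float, n_per_group: int) -> float:
    """Two-sided p-value for an observed standardized difference d (z approximation)."""
    z = d * sqrt(n_per_group / 2)  # z statistic for two equal groups
    return 2 * (1 - NormalDist().cdf(z))


# Same tiny effect, increasing sample size.
for n in (100, 1_000, 10_000):
    print(f"n per group = {n:>6}: p = {two_sided_p(0.05, n):.4f}")
```

The effect size never changes across the rows; only the sample size does, and with it the \(p\)-value.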
Journal of Physiology: For a given conclusion to be assessed, the exact p-values must be stated to three significant figures even when ‘no statistical significance’ is being reported. These should be stated in the main text, figures and their legends and tables. The only exception to this is if p is less than 0.0001, in which case ‘<’ is permitted. Trend statements are not permitted (i.e. ‘x increased, but was not significant’). Where there are many comparisons, a table of p values may be appropriate.
All groups regained weight after randomization by a mean of 5.5 kg in the self-directed, 5.2 kg in the interactive technology–based, and 4.0 kg in the personal-contact group… Those in the personal-contact group regained a mean of 1.2 kg less than those in the interactive technology–based group (95% CI, 2.1-0.3 kg; P=.008).
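The reported confidence interval and \(p\)-value carry closely related information. As a rough consistency check, the sketch below recovers an approximate \(p\)-value from the quoted estimate and 95% CI, assuming a symmetric interval based on a normal approximation. The original analysis may have used a different method, so small discrepancies from the reported P=.008 are to be expected. The function name `p_from_ci` is hypothetical.

```python
from statistics import NormalDist


def p_from_ci(estimate: float, lower: float, upper: float, level: float = 0.95) -> float:
    """Approximate two-sided p-value implied by a symmetric normal-theory CI."""
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se = (upper - lower) / (2 * z_crit)  # standard error implied by the CI width
    z = estimate / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Values quoted above: difference 1.2 kg, 95% CI 0.3 to 2.1 kg.
p = p_from_ci(1.2, 0.3, 2.1)
print(f"approximate p = {p:.3f}")
```

Because the interval excludes zero, the implied \(p\)-value has to fall below .05, in line with the reported result.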
Questions adopted from (Dienes 2008).
If you want to quantify the probability of a population parameter value, given the data and your prior understanding → Bayesian statistics!
If you want to quantify the relative evidence in favour of a hypothesis over another based on your data → Likelihood inference!