Null Hypothesis Significance Testing (NHST) is a procedure for testing the null hypothesis.
It's a binary decision:
- Reject or retain the null hypothesis
- I.e. either retain the null hypothesis, or reject it in favor of the alternative hypothesis.
Before starting any experiment (i.e. test), two hypotheses are set up:
- The null hypothesis <math>H_0</math>: there is no relationship between X and Y (nothing is happening, no effect). For example:
- in a regression: slope = 0
- versus the alternative hypothesis <math>H_A</math>: there is some relationship between X and Y. For example:
- r > 0
- in a regression: slope > 0
The game of NHST is to start with the assumption that the Null hypothesis is true.
During an experiment, we compare an experimental group with a control group:
- <math>H_0</math>: Null Hypothesis (no difference between the groups)
- <math>H_A</math>: Alternative Hypothesis (there is a difference between the groups)
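The “difference” is measured by a test statistic.
The p-value is the probability of obtaining a test statistic at least as extreme as the one observed, assuming the null hypothesis is true. A small p-value means the observed difference would be unlikely if chance alone were at work; it is not the probability that the difference is real.
Typical alternative hypotheses:
- The new ad placement produces more click-throughs
- This treatment produces better outcomes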
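To make the p-value definition concrete, here is a minimal sketch of a two-sided permutation test on the difference in means between an experimental and a control group. A permutation test is only one way to obtain a p-value, chosen here because it needs nothing beyond the standard library. The function name `permutation_p_value` and the click-through numbers are hypothetical, for illustration only.

```python
import random
from statistics import fmean


def permutation_p_value(experimental, control, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Under H0 the group labels are exchangeable, so we shuffle them and count how
    often a shuffled difference is at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(fmean(experimental) - fmean(control))
    pooled = list(experimental) + list(control)
    n_exp = len(experimental)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:n_exp]) - fmean(pooled[n_exp:]))
        if diff >= observed:
            extreme += 1
    # Add 1 to numerator and denominator so the estimate is never exactly 0.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical daily click-through counts (illustration only, not real data).
new_placement = [54, 61, 58, 65, 59, 63, 60, 62]
old_placement = [50, 52, 55, 49, 53, 51, 56, 54]
print(permutation_p_value(new_placement, old_placement))  # well below .05
```

The small result says that, if the placement made no difference, shuffled labels would almost never produce a gap this large; it does not say how likely the new placement is to be better.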
The test is known:
- as directional (one-tailed test) if the alternative hypothesis predicts the direction of the relationship between X and Y (positive vs. negative)
- otherwise as non-directional (two-tailed test)
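The difference between the two kinds of test shows up in how the p-value is computed from the same statistic. A minimal sketch, assuming a z statistic whose null distribution is standard normal; `p_value_from_z` and the value `z = 1.8` are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def p_value_from_z(z, tail="two-sided"):
    """p-value for a z statistic under H0 (standard normal reference distribution).

    tail="greater":   H_A predicts a positive effect (directional, one-tailed)
    tail="less":      H_A predicts a negative effect (directional, one-tailed)
    tail="two-sided": H_A only predicts some difference (non-directional)
    """
    if tail == "greater":
        return 1.0 - STANDARD_NORMAL.cdf(z)
    if tail == "less":
        return STANDARD_NORMAL.cdf(z)
    if tail == "two-sided":
        return 2.0 * (1.0 - STANDARD_NORMAL.cdf(abs(z)))
    raise ValueError(f"unknown tail: {tail!r}")


z = 1.8  # hypothetical observed statistic
print(round(p_value_from_z(z, "greater"), 4))    # one-tailed: about 0.036, below .05
print(round(p_value_from_z(z, "two-sided"), 4))  # two-tailed: about 0.072, above .05
```

The same data can be "significant" one-tailed and not two-tailed, which is why the direction has to be fixed before looking at the data.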
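Example: setting up a non-directional test for a regression slope:
- <math>H_0</math>: slope = 0
- <math>H_A</math>: slope ≠ 0
- Assume the null hypothesis is true (slope = 0). Then a single sample should rarely produce a very high or very low estimated slope B; such an extreme B would give a low p-value (e.g. p < .05).
- Conduct a study
- Calculate the statistics:
- the regression coefficient B and its standard error (SE)
- the t-value, t = B / SE, and its p-value
- If B is very high or very low relative to its SE (p < alpha), reject the null hypothesis.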
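The steps above can be sketched for a simple linear regression. This is a minimal, standard-library-only sketch: `slope_test`, `t_pdf` and `t_two_sided_p` are hypothetical helpers, the two-sided p-value comes from integrating the Student's t density numerically (Simpson's rule) rather than from a statistics library, and the data are made up.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_norm) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_two_sided_p(t, df, steps=10_000):
    """Two-sided p-value 1 - P(-|t| < T < |t|), via Simpson's rule on [0, |t|]."""
    a = abs(t)
    h = a / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * central)


def slope_test(x, y):
    """Test H0: slope = 0 against H_A: slope != 0 for y = a + B*x + error."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                        # estimated slope B
    a = y_bar - b * x_bar                # estimated intercept
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    df = n - 2
    se = math.sqrt(sse / df / sxx)       # standard error of B
    t = b / se                           # B relative to its chance variation
    return b, se, t, t_two_sided_p(t, df)


# Hypothetical data, for illustration only.
b, se, t, p = slope_test([1, 2, 3, 4, 5], [1, 3, 2, 5, 4])
print(b, se, t, p)  # B = 0.8, SE ≈ 0.346, t ≈ 2.31, p ≈ 0.104: retain H0 at alpha = .05
```

With only five points, a fairly steep slope is still not extreme enough relative to its SE to reject the null hypothesis.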
Four possible outcomes
Four boxes, four outcomes: either the null is true or it is false, and either we retain it or we reject it.
| | Retain H0 | Reject H0 |
| H0 true | Correct | Type I error |
| H0 false | Type II error | Correct |
False alarm (Type I error): claiming that something works when in fact it doesn't. This happens: an initial result may look good and suggest that it works, but as you do more research, with more representative samples and better assessments, you may find out that it doesn't.
Miss (Type II error): there really is an effect, and you just missed it. This can happen for many reasons: poor assessment, not enough subjects, a sample that is not random or representative.
Never draw a conclusion from a single study, because any one study is prone to these errors.
| | Do not reject <math>H_0</math> | Reject <math>H_0</math> |
| <math>H_0</math> is true | Correct decision (<math>1 - \alpha</math>) | Type I error (<math>\alpha</math>) |
| <math>H_0</math> is false | Type II error (<math>\beta</math>) | Correct decision (<math>1 - \beta</math>): the power of the test |
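The probabilities in the table can be checked by simulation. A minimal sketch, assuming a two-sided one-sample z-test of <math>H_0</math>: mean = 0 with a known standard deviation; the helper names `rejection_rate` and `analytic_power` and the settings (n = 25, sigma = 1, true mean 0.5) are hypothetical. When <math>H_0</math> is true the rejection rate estimates <math>\alpha</math>; when it is false the rejection rate estimates the power <math>1 - \beta</math>.

```python
import random
from statistics import NormalDist, fmean

STANDARD_NORMAL = NormalDist()


def rejection_rate(true_mean, n=25, sigma=1.0, alpha=0.05, n_sims=10_000, seed=1):
    """Fraction of simulated studies that reject H0: mean = 0 (two-sided z-test)."""
    rng = random.Random(seed)
    z_crit = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        z = fmean(sample) / (sigma / n ** 0.5)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_sims


def analytic_power(true_mean, n=25, sigma=1.0, alpha=0.05):
    """Probability that the same z-test rejects H0, computed from the normal distribution."""
    z_crit = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = true_mean * n ** 0.5 / sigma
    return (1 - STANDARD_NORMAL.cdf(z_crit - shift)) + STANDARD_NORMAL.cdf(-z_crit - shift)


print(rejection_rate(0.0))   # H0 true: Type I error rate, close to alpha = 0.05
print(rejection_rate(0.5))   # H0 false: power, close to the analytic value
print(analytic_power(0.5))   # about 0.705, so beta is about 0.295
```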
Statistics / Equation
Most NHST statistics are essentially ratios: what you observed relative to what you would expect just due to chance.
The denominator is the standard error: how much sampling error to expect just due to chance.
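The one-sample t statistic is a compact example of such a ratio. A minimal sketch; `one_sample_t` is a hypothetical helper and the sample values are made up.

```python
import math
from statistics import mean, stdev


def one_sample_t(sample, mu0=0.0):
    """t = (observed mean - mean expected under H0) / standard error."""
    n = len(sample)
    observed_minus_expected = mean(sample) - mu0
    standard_error = stdev(sample) / math.sqrt(n)  # sampling error expected by chance
    return observed_minus_expected / standard_error


print(one_sample_t([2, 4, 6, 8], mu0=3))  # about 1.55
```

The numerator is the signal (how far the data are from what <math>H_0</math> predicts) and the denominator is the noise (how far they would typically be from it by chance alone).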
Biased by sample size
With a really large sample (big N), we will get a significant result almost all the time, even for a trivially small effect, because the standard error in the denominator shrinks as N grows. See the t-value formula.
If NHST results are reported, it is important to also report the effect size.
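A minimal sketch of why the effect size belongs next to the p-value. It assumes a one-sample test, where t = d·√N with d = (mean − μ0) / SD (Cohen's d for one sample), and uses the normal approximation to the t distribution, which is only reasonable for large N. The helper names and the numbers (mean 100.3 vs 100, SD 15) are hypothetical.

```python
import math
from statistics import NormalDist


def cohens_d_one_sample(sample_mean, sample_sd, mu0=0.0):
    """Effect size: distance from mu0 in standard-deviation units (does not depend on N)."""
    return (sample_mean - mu0) / sample_sd


def approx_two_sided_p(d, n):
    """t = d * sqrt(n) for a one-sample test; p from the normal approximation (large n)."""
    t = d * math.sqrt(n)
    return 2 * (1 - NormalDist().cdf(abs(t)))


d = cohens_d_one_sample(sample_mean=100.3, sample_sd=15.0, mu0=100.0)  # hypothetical: d = 0.02
for n in (100, 1_000, 10_000, 100_000):
    # The effect size stays at 0.02 while the p-value falls below any cut-off.
    print(n, round(d, 3), approx_two_sided_p(d, n))
```

A tiny effect with a huge N and a large effect with a modest N can give the same p-value; only the effect size tells them apart.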
Arbitrary decision rule
- The cut-off value (alpha) is arbitrary
- p < .05 is considered standard but still arbitrary
- Problems arise when p is close to .05 but not less than .05
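The decision rule itself is a single comparison, which is what makes it brittle near the cut-off. A minimal sketch; `nhst_decision` is a hypothetical helper, and p < alpha as the rejection condition follows the convention above.

```python
def nhst_decision(p_value, alpha=0.05):
    """Binary NHST decision: reject H0 if p < alpha, otherwise retain it."""
    return "reject H0" if p_value < alpha else "retain H0"


for p in (0.049, 0.050, 0.051):
    print(p, nhst_decision(p))  # 0.049 is rejected; 0.050 and 0.051 are retained
```

Nearly identical evidence leads to opposite decisions, and a different (equally arbitrary) alpha can flip the decision again.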
Yokel local test
- Many researchers use NHST because it’s the only approach they know
- NHST encourages weak hypothesis testing
If p, then q. Not q. Therefore, not p. (A valid argument: modus tollens.)
If the null hypothesis is correct, then these data cannot occur. The data have occurred. Therefore, the null hypothesis is false. (Also valid.)
If the null hypothesis is correct, then these data are highly unlikely. These data have occurred. Therefore, the null hypothesis is highly unlikely. (The reasoning NHST relies on; it is not valid.)
If a person plays football, then he or she is probably not a professional player. This person is a professional player. Therefore, he or she probably does not play football. (Same form as the previous argument, with an obviously false conclusion.)
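The probabilistic argument fails because a small P(data | <math>H_0</math>) does not imply a small P(<math>H_0</math> | data); getting from one to the other requires Bayes' theorem and a prior. A minimal sketch with hypothetical numbers: 90% of tested treatments do nothing, a "significant" result occurs 5% of the time under <math>H_0</math> (alpha) and 50% of the time under <math>H_A</math> (power). `posterior_h0` is a hypothetical helper.

```python
def posterior_h0(prior_h0, p_data_given_h0, p_data_given_ha):
    """Bayes' theorem: P(H0 | data) from the prior and the two likelihoods."""
    evidence = prior_h0 * p_data_given_h0 + (1 - prior_h0) * p_data_given_ha
    return prior_h0 * p_data_given_h0 / evidence


# Hypothetical numbers, for illustration only.
print(round(posterior_h0(prior_h0=0.9, p_data_given_h0=0.05, p_data_given_ha=0.5), 3))  # 0.474
```

Data that are "highly unlikely" under the null (5%) still leave the null with roughly a 47% chance of being true under these assumptions.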
Alternatives to null hypothesis significance testing:
- Model comparison
- Bayesian inference