Hypothesis Testing Steps

Formulate Hypothesis

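Null hypothesis H0: the statement that there is no effect or no difference, e.g. the average weight of group 1 is equal to the average weight of group 2.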

The opposite of the null hypothesis is the alternative hypothesis Ha or research hypothesis H1.

Determine one-sided or two-sided hypothesis

You can have a non-directional research hypothesis, e.g. the average weight of group 1 is DIFFERENT from the average weight of group 2. This is a two-tailed test.

Or you can have a directional research hypothesis, e.g. the average weight of group 1 is HIGHER than the average weight of group 2. This is a one-tailed test.
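The difference between the two kinds of test shows up in how the p-value is computed. The following is a minimal sketch, assuming the test statistic follows a standard normal distribution under H0 (a z-test); the function name p_value_from_z and the observed value 1.8 are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist


def p_value_from_z(z, tail="two-sided"):
    """Return the p-value of a z statistic under a standard normal null distribution."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two-sided":
        # Non-directional: extreme values in either direction count against H0.
        return 2 * (1 - std_normal.cdf(abs(z)))
    if tail == "greater":
        # Directional (group 1 HIGHER): only the upper tail counts against H0.
        return 1 - std_normal.cdf(z)
    if tail == "less":
        # Directional (group 1 LOWER): only the lower tail counts against H0.
        return std_normal.cdf(z)
    raise ValueError("tail must be 'two-sided', 'greater' or 'less'")


z = 1.8  # hypothetical observed z statistic
p_two_tailed = p_value_from_z(z, "two-sided")
p_one_tailed = p_value_from_z(z, "greater")
print(p_two_tailed, p_one_tailed)
```

With this hypothetical value the one-tailed p-value (about 0.036) is half the two-tailed p-value (about 0.072), so the same data can be significant at 5% under a directional hypothesis and not under a non-directional one. This is why the direction has to be chosen before looking at the data.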

Choose Significance Level

In biology, α is conventionally chosen to be 5%, or 0.05.

The significance level is the probability of a Type I error that you are willing to accept. It is also known as the level of risk.

Pick correct test

Determine which test fits your data and your hypothesis, e.g. a t-test for comparing the average of two groups.

Calculate test statistic

Calculate the test statistic from your sample using the formula for the chosen test.

This is also known as your observed value, or obtained value.
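As an illustration of this step, here is a minimal sketch that computes the observed value for a comparison of two group averages. It assumes Welch's t statistic as the chosen formula, which is only one possible choice; the function name welch_t_statistic and the weights are made up for the example.

```python
from math import sqrt
from statistics import mean, variance


def welch_t_statistic(sample_1, sample_2):
    """Welch's t: difference in sample means divided by its standard error."""
    n1, n2 = len(sample_1), len(sample_2)
    # statistics.variance is the sample variance (divides by n - 1).
    standard_error = sqrt(variance(sample_1) / n1 + variance(sample_2) / n2)
    return (mean(sample_1) - mean(sample_2)) / standard_error


# Hypothetical weights for two groups.
group_1 = [72.0, 75.0, 78.0, 74.0, 76.0]
group_2 = [70.0, 71.0, 73.0, 69.0, 72.0]

t_observed = welch_t_statistic(group_1, group_2)
print(round(t_observed, 3))  # about 3.266
```

The observed value only becomes meaningful once it is compared with the distribution of the test statistic under H0.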

Lookup Test Distribution

For the chosen test, determine the parameters (e.g. degrees of freedom). Graph it.

Determine the area under the curve that corresponds to results that are consistent with chance, i.e. with H0. This depends on the significance level you picked. If you picked 5%, then shade in the central 95% for a two-tailed test. The interval on the x-axis under the shaded area is the acceptance (non-rejection) region, and the unshaded tails are the rejection region. The edges of the interval are the critical values.
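The critical values can also be computed rather than read off a chart. This is a minimal sketch, assuming the test distribution is the standard normal (a z-test); a t or chi-square distribution would need its own quantile function and its degrees of freedom.

```python
from statistics import NormalDist

alpha = 0.05  # chosen significance level
std_normal = NormalDist()  # assumed test distribution under H0

# Two-tailed test: alpha is split between the two tails,
# leaving the central 95% of the area between the critical values.
lower_critical = std_normal.inv_cdf(alpha / 2)
upper_critical = std_normal.inv_cdf(1 - alpha / 2)

# One-tailed test (upper tail): all of alpha sits in one tail.
one_tailed_critical = std_normal.inv_cdf(1 - alpha)

print(lower_critical, upper_critical, one_tailed_critical)
```

For α = 0.05 this gives approximately -1.96 and 1.96 for the two-tailed test and 1.645 for the one-tailed test. A test statistic outside these values falls in a tail of the distribution.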

Calculate p-value

Compare your test statistic to the distribution chart or table.

You can check this against the critical value, or determine the percentage of values that are more extreme. This is your p-value.

The p-value is the probability of getting a result at least as extreme as yours if the null hypothesis is true, i.e. by chance alone.

Reject or fail to reject H0 based on the p-value.

The p-value is compared to the significance level (also called the level of risk, or the Type I error rate). If it is lower, the null hypothesis is rejected: the results are considered unlikely to be due to chance alone, and the research hypothesis is supported. Otherwise you fail to reject the null hypothesis, which does not prove that it is true.
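The steps can be tied together in a short sketch. It assumes the simplest possible case, a two-tailed one-sample z-test with a known population standard deviation; the function name one_sample_z_test, the sample, the null mean of 70.0 and the standard deviation of 2.0 are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean


def one_sample_z_test(sample, null_mean, known_sd, alpha=0.05):
    """Two-tailed one-sample z-test; returns (z, p_value, reject_h0)."""
    # Test statistic: distance of the sample mean from the H0 mean, in standard errors.
    z = (mean(sample) - null_mean) / (known_sd / sqrt(len(sample)))
    # p-value: share of the null distribution at least as extreme as z, both tails.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Decision: reject H0 only if the p-value is below the significance level.
    return z, p_value, p_value < alpha


# Hypothetical weights; H0 says the population mean is 70.0.
weights = [71.2, 69.8, 73.5, 72.1, 70.9, 74.0, 72.6, 71.7]
z, p_value, reject_h0 = one_sample_z_test(weights, null_mean=70.0, known_sd=2.0)
print(z, p_value, reject_h0)
```

For this made-up sample z is about 2.79 and the p-value is well below 0.05, so H0 is rejected at the 5% level.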


Type I Errors

Incorrect rejection of a true H0. At a 5% significance level we'd expect this to happen in 5% of the tests where H0 is actually true.
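The 5% figure can be checked by simulation. This is a minimal sketch, assuming a two-tailed z-test on samples drawn from a population where H0 is true by construction (mean 0, standard deviation 1); the function name, number of trials, sample size and seed are arbitrary choices for the example.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate_type_i_error_rate(trials=4000, n=20, alpha=0.05, seed=0):
    """Fraction of tests that reject H0 even though H0 is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    false_rejections = 0
    for _ in range(trials):
        # Draw a sample from the H0 population: mean 0, standard deviation 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (sum(sample) / n) / (1.0 / sqrt(n))
        if abs(z) > z_crit:
            false_rejections += 1  # H0 rejected although it is true
    return false_rejections / trials


type_i_rate = simulate_type_i_error_rate()
print(type_i_rate)
```

The printed rate should land close to 0.05, with some run-to-run scatter if the seed is changed.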

Type I errors at Khan Academy.

Type II Errors

Incorrectly failing to reject a false H0. The chance of this falls as the sample size grows, so a larger sample size is better.

Power of test

Power is 1 - chance_of_type_II_error.

Power is how often a test will correctly reject the null when a real difference exists.
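Power can be computed directly in simple cases. The sketch below assumes a two-tailed one-sample z-test with a known standard deviation; the function name z_test_power and the numbers (a true difference of 1.0, a standard deviation of 2.0, sample sizes of 10 and 50) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(true_difference, sd, n, alpha=0.05):
    """Probability that a two-tailed z-test rejects H0 for a given true difference."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Under the alternative, the test statistic is shifted by this many standard errors.
    shift = true_difference / (sd / sqrt(n))
    # Chance that the shifted statistic lands beyond either critical value.
    upper_tail = 1 - std_normal.cdf(z_crit - shift)
    lower_tail = std_normal.cdf(-z_crit - shift)
    return upper_tail + lower_tail


power_small = z_test_power(true_difference=1.0, sd=2.0, n=10)
power_large = z_test_power(true_difference=1.0, sd=2.0, n=50)
print(power_small, power_large)
```

With these made-up numbers the power rises from about 0.35 at n = 10 to about 0.94 at n = 50, so the chance of a Type II error drops from about 0.65 to about 0.06. When the true difference is zero the function returns α, since every rejection is then a Type I error.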