Categories are mutually exclusive. e.g. red/blue/green or healthy/unhealthy

Used for categorical data.

With two categories this can also be done with the binomial distribution, but χ-squared is easier.

e.g. picking either red or blue coloured marbles from a bag 20 times.

You have a frequency count of each category. e.g. 15 red, 12 blue

Requires large enough frequency counts. Rule of thumb: expected count of at least 5 in each category.

H_{0} specifies the expected frequency of each category.

$$ \chi^2 = \sum\limits_{i=1}^n \frac{(O_i - E_i)^2}{E_i} $$

where n is category count

E_{i} is the expected frequency under H_{0} of category i

O_{i} is the observed frequency of category i

Calculate m = df = n-1

where n is category count
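A minimal sketch of the calculation above, using the 15 red / 12 blue counts. The 50/50 split under H_{0} is an assumption made for illustration, and the `erfc` shortcut for the p-value only holds when df = 1.

```python
import math

# Observed counts from the marble example.
observed = {"red": 15, "blue": 12}
# ASSUMPTION: H0 says red and blue are equally likely.
h0_probs = {"red": 0.5, "blue": 0.5}

total = sum(observed.values())
expected = {k: total * p for k, p in h0_probs.items()}

# Pearson statistic: sum over categories of (O - E)^2 / E.
chi2 = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
df = len(observed) - 1

# For df = 1 only: chi-square is a squared N(0,1),
# so P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))
print(chi2, df, p_value)
```

A large p-value here means the counts are consistent with H_{0}. For df > 1 the tail probability needs the χ-square distribution itself (tables or a statistics library).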

$$ \chi_m^2 = \sum\limits_{i=1}^m (Z_i)^2 $$

where Z_{i} is an independent random variable from N(0,1)

e.g. m = 2 means you take 2 independent samples from N(0,1), square each one, and add the squares.
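The definition can be checked by simulation. This is only a sketch: the seed, the number of draws and the helper name `chi_square_draw` are arbitrary choices. The sample mean should land near m, which is the mean of a χ-square distribution with m degrees of freedom.

```python
import random

def chi_square_draw(m, rng):
    """One draw from chi-square with m df: sum of m squared N(0,1) samples."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(m))

rng = random.Random(0)  # fixed seed so the run is repeatable
m = 2
draws = [chi_square_draw(m, rng) for _ in range(20000)]

sample_mean = sum(draws) / len(draws)
print(sample_mean)  # should be close to m
```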

χ-square distribution at Khan Academy.

χ-square distribution at Wikipedia. Look at the distribution graph.

2 x 2 table with observation counts by category. e.g. placebo/drug and improved/not_improved

calculate marginal totals as subtotals. r_{1}, r_{2}, c_{1}, c_{2}

calculate total observations. total = r_{1} + r_{2} = c_{1} + c_{2}

calculate the expected count of each cell. E_{i,j} = r_{i} * c_{j} / total, e.g. E_{1,1} = r_{1} * c_{1} / total

degrees of freedom = (rows-1) * (cols-1)

Calculate χ-square statistic:

$$ \chi^2 = \sum\limits_{i} \sum\limits_{j} \frac{(O_{i,j}-E_{i,j})^2}{E_{i,j}} $$

compare against the χ-square distribution with that many degrees of freedom.
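The steps above can be sketched as follows, using illustrative drug/placebo counts of 5, 7, 9 and 12. The variable names are made up for this example.

```python
# Illustrative 2 x 2 counts: rows = drug / placebo, cols = improved / not improved.
observed = [[5, 7],
            [9, 12]]

# Marginal totals and grand total.
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count of each cell under H0: row total * column total / total.
expected = [[r * c / total for c in col_totals] for r in row_totals]

# Pearson statistic summed over every cell.
chi2 = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(len(observed))
    for j in range(len(observed[0]))
)
df = (len(observed) - 1) * (len(observed[0]) - 1)
print(expected, chi2, df)
```

The statistic comes out very small here because the observed counts are almost exactly what the marginal totals predict.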

Alternative test statistic (likelihood ratio, also known as the G-test)

$$ L^2 = 2 \sum\limits_{i} \sum\limits_{j} O_{i,j} \ln{(\frac{O_{i,j}}{E_{i,j}})} $$

For large sample size L^{2} and χ^{2} converge.
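A small sketch comparing the two statistics on the same illustrative 2 x 2 counts (5, 7, 9, 12). It assumes every observed count is greater than zero, since ln(0) is undefined.

```python
import math

observed = [[5, 7], [9, 12]]  # illustrative counts, all > 0

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)
expected = [[r * c / total for c in col_totals] for r in row_totals]

# Pair each observed count with its expected count.
cells = [(observed[i][j], expected[i][j]) for i in range(2) for j in range(2)]

chi2 = sum((o - e) ** 2 / e for o, e in cells)      # Pearson statistic
l2 = 2 * sum(o * math.log(o / e) for o, e in cells)  # likelihood-ratio statistic
print(chi2, l2)
```

Both values are compared against the same χ-square distribution with (rows-1) * (cols-1) degrees of freedom.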

all expected counts (E) are greater than 1

no more than 20% of expected counts are less than 5
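The two rules of thumb above are easy to check mechanically. A sketch, where the function name `chi_square_counts_ok` is hypothetical and the expected counts are passed as a flat list:

```python
def chi_square_counts_ok(expected):
    """Check both rules of thumb on a flat list of expected counts."""
    all_above_one = all(e > 1 for e in expected)
    share_below_five = sum(e < 5 for e in expected) / len(expected)
    return all_above_one and share_below_five <= 0.20

print(chi_square_counts_ok([5.09, 6.91, 8.91, 12.09]))  # True
print(chi_square_counts_ok([0.8, 6.0, 7.0, 9.0]))       # False: one count is not > 1
print(chi_square_counts_ok([3.0, 4.0, 9.0, 12.0]))      # False: 50% are below 5
```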

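2x2 small sample correction (the "N-1" χ-square), still needs all E > 1

$$ \chi^2 = \frac{N-1}{N} \sum\limits_{i} \sum\limits_{j} \frac{(O_{i,j}-E_{i,j})^2}{E_{i,j}} $$

where N is total observations. The factor is below 1, so the corrected statistic is slightly smaller than the uncorrected one.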

Fisher's exact test: used with small counts.

p-value = P(observed)/2 + sum(P of every other scenario that is equally or less likely)

The /2 on P(observed) is the mid-p correction.

Calculate probabilities of all scenarios under H_{0} (e.g. drug has no effect)

All scenarios are done by fixing subtotals (marginal totals), and varying a single count (e.g. drug takers improved).

| | improved | not improved | total |
|---|---|---|---|
| drug | 5 | 7 | 12 |
| placebo | 9 | 12 | 21 |
| total | 14 | 19 | 33 |

probability of above scenario is:

$$ \frac{{12 \choose 5}{21 \choose 9}}{{33 \choose 14}} $$
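A sketch of the whole calculation with the marginal totals fixed at 12, 21 and 14. The helper `table_probability` is a made-up name. The mid-p rule coded here is one reading of these notes: half the observed probability plus every other table that is equally or less likely.

```python
from math import comb

def table_probability(a, r1, r2, c1):
    """Probability of top-left count a when all marginal totals are fixed."""
    return comb(r1, a) * comb(r2, c1 - a) / comb(r1 + r2, c1)

r1, r2, c1 = 12, 21, 14  # drug total, placebo total, improved total
a_observed = 5           # drug takers who improved

p_observed = table_probability(a_observed, r1, r2, c1)

# All scenarios: vary the single count a over its feasible range.
a_min = max(0, c1 - r2)
a_max = min(r1, c1)
probs = {a: table_probability(a, r1, r2, c1) for a in range(a_min, a_max + 1)}

# Mid-p: P(observed)/2 plus the other tables that are equally or less likely.
mid_p = p_observed / 2 + sum(
    p for a, p in probs.items() if a != a_observed and p <= p_observed
)
print(p_observed, mid_p)
```

The observed table happens to be the most likely one for these totals, so every other table counts toward the p-value and the result is large.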

Odds ratio = odds of improvement with drug / odds of improvement without drug

Odds of improvement with drug = 5:7

Odds of improvement without drug = 9:12

Response Ratio = % responded with drug / % responded with placebo

$$ = \frac{5}{12} / \frac{9}{21} $$
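Both ratios can be worked out exactly from the table counts. A sketch using `fractions.Fraction` so the results stay as exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

drug_improved, drug_not = 5, 7
placebo_improved, placebo_not = 9, 12

# Odds ratio: (5:7) divided by (9:12).
odds_ratio = Fraction(drug_improved, drug_not) / Fraction(placebo_improved, placebo_not)

# Response ratio: proportion improved on drug / proportion improved on placebo.
response_ratio = (
    Fraction(drug_improved, drug_improved + drug_not)
    / Fraction(placebo_improved, placebo_improved + placebo_not)
)

print(odds_ratio, float(odds_ratio))          # 20/21, about 0.95
print(response_ratio, float(response_ratio))  # 35/36, about 0.97
```

Both values sit just below 1, meaning the drug group improved at almost the same rate as the placebo group.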