# Categorical Tests

Categories are mutually exclusive, e.g. red/blue/green or healthy/unhealthy.

# χ-squared test

Used for categorical data.

With two categories the same test can be done with the binomial distribution, but χ-squared is easier.

e.g. picking either red or blue coloured marbles from a bag 20 times.

You have a frequency count of each category, e.g. 12 red, 8 blue.

Requires reasonably large frequency counts (rule of thumb: more than 5 expected in each category).

H0 specifies the expected frequency of each category.

## Statistic

$$\chi^2 = \sum\limits_{i=1}^n \frac{(O_i - E_i)^2}{E_i}$$

where $n$ is the number of categories,

$E_i$ is the expected frequency of category $i$ under H0,

$O_i$ is the observed frequency of category $i$ (what was actually counted).
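A minimal sketch of the statistic in Python. The counts are hypothetical (20 draws, 14 red and 6 blue), H0 is assumed to be a 50/50 bag, and the function name `chi_squared_statistic` is made up for illustration.

```python
def chi_squared_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 20 draws, 14 red and 6 blue.
observed = [14, 6]
# Assumed H0: red and blue are equally likely, so 10 of each are expected.
expected = [10, 10]

statistic = chi_squared_statistic(observed, expected)
print(statistic)  # (14-10)^2/10 + (6-10)^2/10
```

The larger the statistic, the further the observed counts are from what H0 predicts.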

## Distribution

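Under H0 the statistic approximately follows a χ-squared distribution with m degrees of freedom.

Calculate m = df = n - 1, where n is the number of categories.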

$$\chi_m^2 = \sum\limits_{i=1}^m (Z_i)^2$$

where each $Z_i$ is an independent random variable from N(0,1)

e.g. m = 2 means you take 2 independent samples from N(0,1), square each one, and add the squares.
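The definition can be checked by simulation. The sketch below assumes m = 2, a sample of 20000 draws, and an arbitrary fixed seed; the helper name `sample_chi_squared` is made up for illustration.

```python
import random


def sample_chi_squared(m, rng):
    """Draw one value: the sum of m squared independent N(0,1) samples."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(m))


rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability
draws = [sample_chi_squared(2, rng) for _ in range(20000)]

# The mean of a chi-squared distribution equals its degrees of freedom,
# so this should come out close to 2.
mean = sum(draws) / len(draws)
print(round(mean, 2))
```

A histogram of `draws` should resemble the m = 2 curve of the χ-squared density.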

See the chi-squared distribution article on Wikipedia for a graph of the distribution at different degrees of freedom.

# Pearson's two-sample χ-squared test

## Contingency table

2 x 2 table with observation counts by category, e.g. placebo/drug by improved/not improved.

## Procedure

Calculate the marginal totals (row and column subtotals): r1, r2, c1, c2.

Calculate the total number of observations.

Calculate the expected count for each cell, e.g. $E_{1,1} = r_1 c_1 / \text{total}$.

Degrees of freedom = (rows - 1) * (cols - 1), which is 1 for a 2 x 2 table.

Calculate the χ-squared statistic:

$$\chi^2 = \sum\limits_{i} \sum\limits_{j} \frac{(O_{i,j}-E_{i,j})^2}{E_{i,j}}$$

Compare the statistic against the χ-squared distribution with those degrees of freedom.
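The procedure can be sketched in Python. The table reuses the drug/placebo counts from the Fisher's exact test section of these notes; the function names are made up for illustration, and the p-value helper uses the closed form for the χ-squared upper tail that holds only when df = 1.

```python
import math


def expected_counts(table):
    """Expected cell counts from marginal totals: E[i][j] = r_i * c_j / total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def pearson_chi_squared(table):
    """Return (statistic, degrees of freedom) for a table of counts."""
    expected = expected_counts(table)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


def p_value_df1(statistic):
    """Upper tail of the chi-squared distribution, valid for df = 1 only."""
    return math.erfc(math.sqrt(statistic / 2))


# Rows: drug, placebo. Columns: improved, not improved.
table = [[5, 7], [9, 12]]
statistic, df = pearson_chi_squared(table)
p_value = p_value_df1(statistic)
print(statistic, df, p_value)
```

For these counts the statistic is very small and the p-value is close to 1, so the data give no reason to reject H0.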

### Likelihood ratio χ-squared test

Alternative test statistic:

$$L^2 = 2 \sum\limits_{i} \sum\limits_{j} O_{i,j} \ln{(\frac{O_{i,j}}{E_{i,j}})}$$

For large sample sizes $L^2$ and $\chi^2$ converge.
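A minimal sketch of the likelihood ratio statistic, again using the drug/placebo counts from the Fisher's exact test section. The function name is made up for illustration, and cells with an observed count of 0 are assumed to contribute 0 to the sum (the usual convention).

```python
import math


def likelihood_ratio_statistic(table):
    """L^2 = 2 * sum of O * ln(O / E) over all cells of the table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    acc = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / total
            if o > 0:  # a zero observed count contributes nothing
                acc += o * math.log(o / e)
    return 2 * acc


# Rows: drug, placebo. Columns: improved, not improved.
l2 = likelihood_ratio_statistic([[5, 7], [9, 12]])
print(l2)
```

It is compared against the same χ-squared distribution, with the same degrees of freedom, as Pearson's statistic.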

### Small samples

The χ-squared approximation is usually considered acceptable when both of these hold:

All expected counts (E) are greater than 1.

No more than 20% of expected counts are less than 5.
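These two rules of thumb are easy to check mechanically. A small sketch, with a made-up function name and a hypothetical second table of expected counts:

```python
def chi_squared_approximation_ok(expected):
    """Check both rule-of-thumb conditions on a table of expected counts."""
    cells = [e for row in expected for e in row]
    all_above_one = all(e > 1 for e in cells)
    share_below_five = sum(e < 5 for e in cells) / len(cells)
    return all_above_one and share_below_five <= 0.20


# Expected counts for a 2 x 2 table with margins 12/21 and 14/19.
ok_table = [[12 * 14 / 33, 12 * 19 / 33], [21 * 14 / 33, 21 * 19 / 33]]
# Hypothetical expected counts that break both conditions.
bad_table = [[0.8, 3.2], [3.2, 12.8]]

print(chi_squared_approximation_ok(ok_table))
print(chi_squared_approximation_ok(bad_table))
```

When the check fails, an exact test is the safer choice.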

For a 2 x 2 table there is a small-sample ("N - 1") correction, which still needs all E > 1. N is the total number of observations:

$$\chi^2 = \frac{N-1}{N} \sum\limits_{i} \sum\limits_{j} \frac{(O_{i,j}-E_{i,j})^2}{E_{i,j}}$$

# Fisher's exact test

Used with small counts.

p-value = P(observed)/2 + sum of P(other scenarios that are equally or less likely than the observed one)

The /2 is the mid-p correction. Without it, P(observed) is counted in full.

Calculate the probabilities of all scenarios under H0 (e.g. the drug has no effect).

The scenarios are generated by fixing the subtotals (marginal totals) and varying a single count (e.g. drug takers who improved).

|         | improved | not improved | total |
|---------|----------|--------------|-------|
| drug    | 5        | 7            | 12    |
| placebo | 9        | 12           | 21    |
| total   | 14       | 19           | 33    |

The probability of the above scenario is:

$$\frac{{12 \choose 5}{21 \choose 9}}{{33 \choose 14}}$$
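The whole calculation can be sketched with `math.comb` from the Python standard library. The function names are made up for illustration, and the small relative tolerance used to detect ties is an assumption added to guard against floating-point error.

```python
from math import comb


def table_probability(x, r1, r2, c1):
    """Probability of x in the top-left cell when all margins are fixed."""
    return comb(r1, x) * comb(r2, c1 - x) / comb(r1 + r2, c1)


def fisher_mid_p(x_observed, r1, r2, c1):
    """Mid-p value: P(observed)/2 plus every other table no more likely."""
    lowest = max(0, c1 - r2)   # smallest count the margins allow
    highest = min(r1, c1)      # largest count the margins allow
    p_observed = table_probability(x_observed, r1, r2, c1)
    cutoff = p_observed * (1 + 1e-9)  # tolerance so ties are not missed
    others = 0.0
    for x in range(lowest, highest + 1):
        p = table_probability(x, r1, r2, c1)
        if x != x_observed and p <= cutoff:
            others += p
    return p_observed / 2 + others


# 5 of 12 drug takers improved; 21 on placebo; 14 improved overall.
p_observed = table_probability(5, 12, 21, 14)
mid_p = fisher_mid_p(5, 12, 21, 14)
print(p_observed, mid_p)
```

Here the observed table happens to be the most likely one under H0, so every other table counts towards the p-value and the result is large.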

## Odds Ratio

Odds ratio = odds of improvement with drug / odds of improvement without drug

Odds of improvement with drug = 5:7 (improved : not improved)

Odds of improvement without drug = 9:12
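As a quick check of the arithmetic, the odds ratio for these counts can be computed exactly with `fractions.Fraction` (variable names are made up for illustration):

```python
from fractions import Fraction

odds_drug = Fraction(5, 7)      # improved : not improved, with drug
odds_placebo = Fraction(9, 12)  # improved : not improved, with placebo

odds_ratio = odds_drug / odds_placebo
print(odds_ratio, float(odds_ratio))
```

An odds ratio just below 1 means the odds of improving were slightly lower with the drug than with the placebo.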

## Response Ratio

Response Ratio = % responded with drug / % responded with placebo

$$= \frac{5}{12} / \frac{9}{21}$$
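Working the arithmetic through for these counts:

$$\frac{5}{12} / \frac{9}{21} = \frac{5 \times 21}{12 \times 9} = \frac{105}{108} = \frac{35}{36} \approx 0.97$$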