category|numeric vs. category|numeric

Used when you have 2 data sets of numerical data.

Tests whether the 2 data sets come from the same population, by comparing the variances of the 2 sets. Under the null hypothesis the two variances are equal, so F should be close to 1.

$$ F = \frac{\sigma_1^2}{\sigma_2^2} $$

where σ_{1}^{2} is the variance of sample 1 and σ_{2}^{2} is the variance of sample 2.

Compare the F value to the F cumulative distribution with degrees of freedom m-1 and n-1, where m is the size of sample 1 and n is the size of sample 2.
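A minimal sketch of the variance-ratio calculation, assuming the unbiased (n-1) sample variance is intended. The function name `f_statistic` and the sample values are made up for illustration.

```python
from statistics import variance

def f_statistic(sample1, sample2):
    """Return (F, df1, df2) for the ratio of two sample variances."""
    # statistics.variance uses the n-1 denominator (sample variance)
    f = variance(sample1) / variance(sample2)
    return f, len(sample1) - 1, len(sample2) - 1

# Hypothetical data, for illustration only
f, df1, df2 = f_statistic([2.0, 4.0, 6.0, 8.0], [1.0, 2.0, 3.0])
print(f, df1, df2)
```

If SciPy is available, `scipy.stats.f.sf(f, df1, df2)` gives the upper-tail probability for this F value.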

Formula for the F distribution with (m,n) degrees of freedom:

$$ F_{m,n} = \frac{\sum_{i=1}^m Z_i^2 / m}{\sum_{j=1}^n W_j^2 / n} $$

where the Z_{i} and W_{j} are independent random values taken from N(0,1).
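The construction can be checked by simulation. This is a rough sketch: the numerator and denominator use separate, independent N(0,1) draws, and the seed, degrees of freedom, and observed value are arbitrary choices for illustration.

```python
import random

def draw_f(m, n, rng):
    """One draw from F(m, n), built from independent N(0,1) values."""
    num = sum(rng.gauss(0, 1) ** 2 for _ in range(m)) / m
    den = sum(rng.gauss(0, 1) ** 2 for _ in range(n)) / n
    return num / den

rng = random.Random(0)  # fixed seed so the sketch is repeatable
draws = [draw_f(3, 10, rng) for _ in range(10000)]

# Share of simulated values at or above an observed F:
# a rough Monte Carlo estimate of the upper-tail probability.
observed = 4.0  # hypothetical observed statistic
p_estimate = sum(d >= observed for d in draws) / len(draws)
print(p_estimate)
```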

Large values are more extreme.

Used when you have k sets of numerical data.

Tests whether the means of all k groups are equal. If you find they are different, you'll still need to test each pair to find out which ones differ.

Calculate variation between groups, SSB, and DFB

Calculate total mean.

For each group, calculate the mean of the group, take its difference from the total mean, square it, and multiply it by the number of values in the group. Sum the results over all groups.

$$ SSB = \sum_{i=1}^k n_i (\mu - \mu_i)^2 $$

where n_{i} is the count for group i, μ is the total mean and μ_{i} is the mean of group i. Multiplying by n_{i} weights each group in proportion to its size.

$$ DFB = k-1 $$

Calculate variation within groups, SSW, and DFW

For each of the k groups, for each value in that group, square the difference between the value and the group's mean. Sum all of these squares.

$$ SSW = \sum_{i=1}^k \sum_{j=1}^{n_i} (x_{i,j}-\mu_i) ^ 2 $$

where x_{i,j} is the *j*th value in group i

n_{i} is the total number of values in the *i*th group

$$ DFW = N - k $$

where N is the total number of values, k is the number of groups.

$$ F = \frac{SSB / DFB}{SSW / DFW} $$

Compare the F value to the F cumulative distribution with DFB and DFW degrees of freedom.
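The steps above can be put together in a short sketch. The function name `one_way_anova` and the example groups are illustrative assumptions, not part of the original notes.

```python
def one_way_anova(groups):
    """Return (F, DFB, DFW) for a list of groups of numbers."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    total_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-group variation: squared distance of each group mean
    # from the total mean, weighted by group size.
    ssb = sum(len(g) * (total_mean - m) ** 2 for g, m in zip(groups, means))
    # Within-group variation: squared distance of each value from its group mean.
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    dfb, dfw = k - 1, n_total - k
    return (ssb / dfb) / (ssw / dfw), dfb, dfw

# Hypothetical data, for illustration only
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f, dfb, dfw = one_way_anova(groups)
print(f, dfb, dfw)
```

For this data SSB = 26 and SSW = 6, so F = (26/2)/(6/6) = 13 with 2 and 6 degrees of freedom.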

The F test will tell you whether the means differ among the k groups, but not which groups differ.

Adjust the significance level of each retest to α_{FWE}/N (for example 0.05/N) so the overall type I error stays the same. N is the number of retests required. FWE stands for family-wise error. This adjustment is conservative when the retests are correlated.

$$ \alpha_{ij} = \frac{\alpha_{FWE}}{N} $$
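As a small worked example, assuming every pair of the k groups is retested, N is the number of pairs, k(k-1)/2. The function name `adjusted_alpha` is hypothetical.

```python
from math import comb

def adjusted_alpha(k, alpha_fwe=0.05):
    """Per-test significance level when all pairs of k groups are retested."""
    n_tests = comb(k, 2)  # assumption: one retest per pair of groups
    return alpha_fwe / n_tests, n_tests

alpha_ij, n_tests = adjusted_alpha(4)
print(n_tests, alpha_ij)  # 4 groups give 6 pairwise tests
```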

Ryan adjustment. Rank the group means (R_{i} is the rank of the mean of group i) and begin testing with the largest differences first.

$$ \alpha_{ij} = \frac{\alpha_{FWE}}{N / |R_i - R_j|} $$

Works on the same data as ANOVA (k groups of data).

Non-parametric. Ranks all data.

Tests whether a random sample from one group is more highly ranked than a random sample from another group 50% of the time.
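A sketch of the ranking step and a rank-based statistic. The notes do not give a formula, so this assumes the usual Kruskal-Wallis H statistic without tie correction, H = 12/(N(N+1)) Σ R_i²/n_i − 3(N+1), where R_i is the sum of ranks in group i. Function names are illustrative.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j to the end of a run of equal values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = avg
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """H statistic (no tie correction) from ranks of the pooled data."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)

# Hypothetical data, for illustration only
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(h)
```

H is usually compared against a chi-squared distribution with k-1 degrees of freedom.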

H_{0}: There is no correlation.

Test statistic is 0 for no correlation.

Assumes a bivariate normal distribution of the x,y values.

Affected by outliers.

Ranges from -1 to 1

Detects linear correlation (y=ax+b)

$$ r = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{\sqrt{\sum_{i=1}^{N}(x_i - \mu_x)^2 \sum_{i=1}^{N}(y_i - \mu_y)^2}} $$

For the significance level, compare the following t value against the t distribution with N-2 degrees of freedom.

$$ t = r \sqrt{\frac{N-2}{1-r^2}}$$
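A minimal sketch of the correlation coefficient and its t statistic. The denominator is the square root of the product of the two sums of squares. Function names and data are illustrative.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Linear correlation coefficient of paired values."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def t_from_r(r, n):
    """t statistic with n-2 degrees of freedom (undefined when |r| == 1)."""
    return r * sqrt((n - 2) / (1 - r ** 2))

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
r = pearson_r(xs, ys)
t = t_from_r(r, len(xs))
print(r, t)
```

Here the sum of cross products is 8 and both sums of squares are 10, so r = 0.8.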

Uses rank instead of actual x and y values.

$$ \rho = \frac{\sum_{i=1}^{N} (Rx_i - \mu_{Rx})(Ry_i - \mu_{Ry})}{\sqrt{\sum_{i=1}^{N}(Rx_i - \mu_{Rx})^2 \sum_{i=1}^{N}(Ry_i - \mu_{Ry})^2}} $$

$$ \mu_{Rx} = \mu_{Ry} = (N+1)/2 $$

$$ t = \rho \sqrt{\frac{N-2}{1-\rho^2}} $$

Again, compare with the t distribution with N-2 degrees of freedom.
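A sketch of the rank version: replace x and y by their ranks, then apply the same correlation formula with mean rank (N+1)/2. Giving tied values the average of their ranks is an assumption here; the notes do not say how ties are handled.

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = avg
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Correlation of the ranks of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mean_rank = (len(xs) + 1) / 2  # same for x and y
    sxy = sum((a - mean_rank) * (b - mean_rank) for a, b in zip(rx, ry))
    sxx = sum((a - mean_rank) ** 2 for a in rx)
    syy = sum((b - mean_rank) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: the outlier 100.0 only counts as "largest" once ranked
xs = [10, 20, 30, 40, 50]
ys = [1.5, 0.2, 9.0, 3.1, 100.0]
rho = spearman_rho(xs, ys)
t = rho * sqrt((len(xs) - 2) / (1 - rho ** 2))
print(rho, t)
```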

Matches all pairs of data points. There are N(N-1)/2 pairs.

Tallies each pair as concordant, discordant, extra x (tie in y only), extra y (tie in x only), or match (tie in both).

The resulting statistic is mapped onto a Z (standard normal) distribution.
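A sketch of the pair tally. The notes do not give the formulas, so this assumes the common tie-aware form τ = (concordant − discordant) / (√(conc + disc + extra y) · √(conc + disc + extra x)) and the large-sample variance (4N+10)/(9N(N−1)) for the Z value. Names are illustrative.

```python
from math import sqrt

def kendall_tau(xs, ys):
    """Return (tau, z) from a tally over all N(N-1)/2 pairs."""
    n = len(xs)
    conc = disc = extra_x = extra_y = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            dx = xs[i] - xs[j]
            dy = ys[i] - ys[j]
            if dx * dy > 0:
                conc += 1        # x and y move in the same direction
            elif dx * dy < 0:
                disc += 1        # x and y move in opposite directions
            elif dx != 0:
                extra_x += 1     # tie in y only
            elif dy != 0:
                extra_y += 1     # tie in x only
            # pairs tied in both x and y ("match") are not counted
    # Note: fails with a zero division if all x or all y values are tied.
    tau = (conc - disc) / (sqrt(conc + disc + extra_y) * sqrt(conc + disc + extra_x))
    z = tau / sqrt((4 * n + 10) / (9 * n * (n - 1)))
    return tau, z

# Hypothetical data, for illustration only
tau, z = kendall_tau([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(tau, z)
```

This data has 8 concordant and 2 discordant pairs out of 10, so τ = 0.6.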