A random variable assigns a number to each outcome of a random process. e.g. X = the value of a single die roll, Y = the sum of 7 dice rolls.

Now you can write P(X = 3) = 1/6, or P(Y <= 20)
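A minimal sketch of those two probabilities, assuming fair six-sided dice. The helper name `sum_distribution` is made up for illustration; it builds the exact distribution of Y by adding one die at a time.

```python
from fractions import Fraction

FACES = range(1, 7)  # assumption: a fair six-sided die

# P(X = 3): one favourable face out of six equally likely faces.
p_x_is_3 = Fraction(sum(1 for face in FACES if face == 3), 6)

def sum_distribution(num_dice):
    """Exact distribution of the sum of num_dice fair dice, as {total: probability}."""
    dist = {0: Fraction(1)}
    for _ in range(num_dice):
        new_dist = {}
        for total, prob in dist.items():
            for face in FACES:
                new_dist[total + face] = new_dist.get(total + face, 0) + prob / 6
        dist = new_dist
    return dist

y_dist = sum_distribution(7)
p_y_le_20 = sum(prob for total, prob in y_dist.items() if total <= 20)

print(p_x_is_3)          # 1/6
print(float(p_y_le_20))  # P(Y <= 20) as a decimal
```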

A sample is a subset of a complete population. It is often not feasible to measure the whole population.

An average is a way to calculate a central value. It could refer to mean, median or mode.

Arithmetic mean: sum all values and divide by n, where n is the number of values.

$$ \mu = \frac {1}{n} \sum_{i=1}^{n} x_i $$

The population mean (as opposed to the mean of your sample).

How often a value occurs.

Absolute frequency of a value is a count of the number of times it occurred.

Relative frequency of a value is expressed as a percentage or fraction of how often it occurred.
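A small sketch of absolute vs relative frequency using `collections.Counter`. The `rolls` data is made up.

```python
from collections import Counter

rolls = [3, 5, 3, 6, 1, 3, 5, 2]  # made-up observations

absolute = Counter(rolls)          # value -> number of times it occurred
total = sum(absolute.values())
relative = {value: count / total for value, count in absolute.items()}

print(absolute[3])   # 3
print(relative[3])   # 0.375
```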

The most frequently occurring value.

If there are 2 most frequently occurring values then it is considered bimodal

After ordering all the values, this is the value in the middle. If there are 2 values in the middle position, take the mean of them.
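The three kinds of average can be checked with Python's standard `statistics` module. The data below is made up and chosen so that it has two modes and an even number of values.

```python
import statistics

data = [1, 2, 2, 3, 4, 4]  # made-up values

print(statistics.mean(data))       # arithmetic mean, 16 / 6
print(statistics.median(data))     # 2.5: mean of the two middle values, 2 and 3
print(statistics.multimode(data))  # [2, 4]: two modes, so this data is bimodal
```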

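For a sample, first determine the mean.

Then for each value, find the difference between the mean and the value, square it, and add it to a running total.

Then divide by n for a population, or by n-1 for a sample so that we get an unbiased estimate.

Population variance:

$$ \sigma^2 = \frac {1}{n} \sum_{i=1}^{n} (x_i - \mu )^2$$

Sample variance (x-bar is the sample mean):

$$ s^2 = \frac {1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x} )^2$$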
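The running-total recipe can be sketched as below and compared against the `statistics` module. The function name `variance` and the data are illustrative only.

```python
import statistics

def variance(values, sample=True):
    """Mean squared deviation: divide by n-1 for a sample, by n for a population."""
    n = len(values)
    mean = sum(values) / n
    total = 0.0
    for x in values:
        total += (x - mean) ** 2  # running total of squared differences
    return total / (n - 1 if sample else n)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values, mean is 5

print(variance(data, sample=False))  # 4.0
print(variance(data, sample=True))   # 32 / 7
```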

Sum of variances of random variables

If X = Y - Z, and Y and Z are independent

then Var(X) = Var(Y) + Var(Z)
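A quick exact check of this rule, assuming Y and Z are two independent fair dice (the rule needs independence). Variances add even though the variables are subtracted.

```python
from fractions import Fraction
from itertools import product

def variance(dist):
    """Variance of a discrete distribution given as {value: probability}."""
    mean = sum(value * prob for value, prob in dist.items())
    return sum(prob * (value - mean) ** 2 for value, prob in dist.items())

die = {face: Fraction(1, 6) for face in range(1, 7)}

# Distribution of X = Y - Z for two independent dice.
diff = {}
for (y, py), (z, pz) in product(die.items(), repeat=2):
    diff[y - z] = diff.get(y - z, 0) + py * pz

print(variance(die))   # 35/12
print(variance(diff))  # 35/6
```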

range = max - min

Difference between the first and third quartile.
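Range and interquartile range in Python. Note that quartile conventions differ between tools; this sketch assumes the default "exclusive" method of `statistics.quantiles`, and the data is made up.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7]  # made-up values

value_range = max(data) - min(data)
q1, q2, q3 = statistics.quantiles(data, n=4)  # default method is "exclusive"
iqr = q3 - q1

print(value_range)  # 6
print(iqr)          # 4.0
```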

https://www.khanacademy.org/math/probability/random-variables-topic/random_variables_prob_dist/v/discrete-and-continuous-random-variables

Discrete is for distinct values (e.g. dice rolls), or integers. It is possible to have an infinite number of integer outcomes and still be discrete.

Continuous is for any values in an interval. e.g. exact speed between 0 and 50 m/s can have many many decimal points, or infinite possible values.

https://www.khanacademy.org/math/probability/random-variables-topic/random_variables_prob_dist/v/discrete-probability-distribution

plot Y axis with probability. Each probability is between 0 and 1, and all the probabilities sum to 1.

plot X axis with values (outcomes) or scores.

https://www.khanacademy.org/math/probability/random-variables-topic/random_variables_prob_dist/v/probability-density-functions

Used for continuous distributions. The probability of any single exact value is zero, so it only makes sense to calculate probabilities for ranges.

Area under the curve (e.g. X between 1.9 and 2.1) is the probability.

Full area under curve is 1.

Formula for the area under the graph of a probability density function from -infinity to x. For x, it gives the area under the graph to the left of x, i.e. P(X <= x).

You can evaluate this function at x2 and x1 and subtract, F(x2) - F(x1), to get the area under the curve between x1 and x2.

In a normal distribution, if you put x = mean, then you will get 50% or 0.5
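A sketch using `statistics.NormalDist`, whose `cdf` method gives the area to the left of x. The mean of 100 and std dev of 15 are hypothetical.

```python
from statistics import NormalDist

dist = NormalDist(mu=100, sigma=15)  # hypothetical mean and std dev

print(dist.cdf(100))  # 0.5: half the area lies to the left of the mean

# Area between x1 and x2 is the difference of two cumulative values.
x1, x2 = 85, 115
area = dist.cdf(x2) - dist.cdf(x1)
print(area)  # about 0.68 (within one std dev of the mean)
```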

Quiz at https://www.khanacademy.org/math/probability/random-variables-topic/expected-value/e/expected_value

ref wikipedia.

Also known as expectation, mathematical expectation, EV, mean, or first moment

notation example at lottery ticket.

X = net profit from playing lottery

E(X) = expected net profit from playing the lottery.

To calculate the expected value of a discrete random variable with a probability distribution:

Determine all events and probability and value for each.

Sum for all events, probability * value. This is a weighted sum of the values.
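The weighted sum can be sketched as below. The fair die is a standard case; the lottery numbers (a ticket costing 5 that wins 1000 with probability 1/500) are invented purely for illustration.

```python
from fractions import Fraction

def expected_value(dist):
    """Weighted sum of values, for a distribution given as {value: probability}."""
    assert sum(dist.values()) == 1, "probabilities must sum to 1"
    return sum(value * prob for value, prob in dist.items())

die = {face: Fraction(1, 6) for face in range(1, 7)}
print(expected_value(die))  # 7/2

# Hypothetical lottery: net profit is 995 on a win, -5 otherwise.
lottery = {1000 - 5: Fraction(1, 500), -5: Fraction(499, 500)}
print(expected_value(lottery))  # -3
```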

manual example at getting data from an expected value

Take a random variable on a population, with a known expected value. If you take a sample, you will get a sample mean. The larger your sample is, the closer the sample mean tends to be to the expected value.

As n approaches infinity, your sample mean will converge to E(X).
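A rough simulation of this convergence, assuming fair die rolls (E(X) = 3.5). The seed is fixed only so that runs are repeatable; exact printed values depend on the random generator.

```python
import random

def sample_mean_of_rolls(n, rng):
    """Mean of n simulated fair die rolls."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n

rng = random.Random(0)  # fixed seed so runs are repeatable
for n in (10, 1_000, 100_000):
    # The sample mean should drift towards 3.5 as n grows.
    print(n, sample_mean_of_rolls(n, rng))
```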

How many ways can you pick r objects from n objects. Order picked matters, objects are not replaced.

Formula: P(n,r) = nPr = n! / (n-r)!

e.g. out of 10 contestants, how many ways (or scenarios) can first, 2nd, 3rd and 4th prizes be awarded in a competition?

Permutations at Khan Academy

3 letter word example at Khan Academy.

26^3 3-letter words with no restrictions

26*25*24 = 26! / (26-3)! words with no repeated letters.
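These counts can be checked with `math.perm` (available from Python 3.8):

```python
import math

# 4 prizes awarded among 10 contestants, order matters.
print(math.perm(10, 4))  # 5040

# 3-letter words from 26 letters.
print(26 ** 3)           # 17576, letters may repeat
print(math.perm(26, 3))  # 15600, no repeated letters

# Same as the formula n! / (n-r)!
print(math.factorial(26) // math.factorial(26 - 3))  # 15600
```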

How many ways can you pick r objects from n objects? Order picked does not matter, objects are not replaced.

Formula: C(n, r) = nCr = n! / (r! (n-r)!)

3 people chosen from 6 example at Khan Academy.
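A sketch of the 3-from-6 case with `math.comb`, cross-checked by listing the groups with `itertools.combinations`. The people are just labelled A to F for illustration.

```python
import math
from itertools import combinations

people = ["A", "B", "C", "D", "E", "F"]  # placeholder labels

print(math.comb(6, 3))  # 20

groups = list(combinations(people, 3))  # each group listed once, order ignored
print(len(groups))      # 20

# Relation to permutations: divide out the r! orderings of each group.
print(math.perm(6, 3) // math.factorial(3))  # 20
```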

Making inferences based on a sample of data. e.g. you might only have a sample of 100 out of a population of 10000, and you want to generalise from your sample to the population.

Z-score of a value is how many standard deviations away a value is from the mean.

It can be applied to any distribution (not only normal distribution)

For a normal distribution, the 2 sided interval of 1.96 std. devs either side of the mean covers 95% of the values.
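A small sketch of a z-score and of the 1.96 figure. The data is made up, and the population std dev (`pstdev`) is assumed here; the 95% coverage check applies to a normal distribution only.

```python
import statistics
from statistics import NormalDist

def z_score(x, mean, std_dev):
    """How many standard deviations x lies from the mean."""
    return (x - mean) / std_dev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values
mean = statistics.mean(data)     # 5
std = statistics.pstdev(data)    # 2.0
print(z_score(9, mean, std))     # 2.0

# Coverage of +/- 1.96 std devs under a standard normal curve.
standard_normal = NormalDist()
coverage = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)
print(round(coverage, 3))        # 0.95
```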

not inferential, simply describing data.

Random samples are taken from a population, e.g. n samples, then a statistic (e.g. mean) is found for those samples.

If you keep repeating this, taking another n samples and generating the statistic each time, then the distribution of those statistics is the "sampling distribution".

In this example we are using the mean as the statistic, so the distribution is the sampling distribution of the sample mean.

http://psych.hanover.edu/javatest/NeuroAnim/stats/SampDist_instr.html

This is different to a sample distribution. http://forrest.psych.unc.edu/research/vista-frames/help/lecturenotes/lecture06/sampling.html

A description of asymmetry. Right skew or positively skewed

Left skewed or negatively skewed.

Normal distribution skew is 0.

Formula Skew

Formula standard error of Skew. sqrt(3!/n)

z-score = Skew / (standard error of Skew)

measure of tallness vs longer tails

tall = positive kurtosis, leptokurtic

normal = kurtosis of 3, so subtract 3 as an offset to get an excess kurtosis of 0

short = negative kurtosis, platykurtic

formula Kurtosis. ^4, remember -3

formula standard error of Kurtosis . sqrt(4!/n)

z-score = Kurtosis / (standard error of Kurtosis)
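The notes do not spell out the skew and kurtosis formulas, so this sketch assumes the common moment-based definitions (third and fourth central moments divided by powers of the second, with 3 subtracted for kurtosis). Other tools may apply small-sample corrections and give slightly different values.

```python
import math

def central_moment(values, k):
    """k-th central moment, dividing by n (an assumption of this sketch)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** k for x in values) / len(values)

def skew(values):
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    # The ^4 moment ratio, remembering the -3 so a normal distribution gives 0.
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

def z_scores(values):
    """z-scores of skew and kurtosis using the approximate standard errors."""
    n = len(values)
    se_skew = math.sqrt(6 / n)   # sqrt(3!/n)
    se_kurt = math.sqrt(24 / n)  # sqrt(4!/n)
    return skew(values) / se_skew, excess_kurtosis(values) / se_kurt

print(skew([1, 2, 3, 4, 5]))       # 0.0 for symmetric data
print(skew([1, 1, 1, 2, 10]) > 0)  # True: long right tail, positive skew
```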

Relative frequency of a result = frequency count / total count

e.g. quartiles, deciles, percentile. Divides data set into N groups.

middle box is lower quartile, median, upper quartile.

whiskers extend to the furthest data point within 1.5 * interquartile_range of the box:

lower whisker = min(data >= lower_quartile - 1.5 * interquartile_range)

upper whisker = max(data <= upper_quartile + 1.5 * interquartile_range)
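A sketch of the box and whisker values under the 1.5 * IQR convention. The function name `box_plot_stats` and the data are made up, and the quartiles use the default method of `statistics.quantiles`, which may differ from your plotting tool.

```python
import statistics

def box_plot_stats(data):
    """Box, whiskers and outliers using the 1.5 * IQR rule."""
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return {
        "box": (q1, median, q3),
        # Whiskers stop at the most extreme data points inside the fences.
        "lower_whisker": min(x for x in data if x >= low_fence),
        "upper_whisker": max(x for x in data if x <= high_fence),
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }

stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 30])  # made-up values
print(stats["lower_whisker"], stats["upper_whisker"])  # 1 7
print(stats["outliers"])                               # [30]
```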

You have a random variable or distribution (not necessarily normal).

From the distribution, take a sample of size n, and calculate the mean.

Take another sample of size n and again calculate the mean.

after repeating k trials, you'll have k means.

If you plot these, it will resemble a normal distribution.

As the sample size n increases, the distribution of the sample means gets closer to a normal distribution. Note the original distribution is not necessarily normal.

If n = 1, then it will look like the original distribution

e.g. if n=2, then some values might not be possible.

This is the sampling distribution of sample means.

Test it out at online stat book sampling distributions.

Mean of the sample means is the same as the original distribution mean

As n increases, variance or standard deviation decreases.

Variance of sample mean = var of original dist / n

std dev of sampling distribution of the sample mean is = original_std_dev / sqrt(n)
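A rough simulation of these properties, assuming the original distribution is a fair die (mean 3.5, variance 35/12, clearly not normal). The values of n, k and the seed are arbitrary choices.

```python
import math
import random
import statistics

def sample_means(n, k, rng):
    """Means of k samples, each made of n fair die rolls."""
    return [sum(rng.randint(1, 6) for _ in range(n)) / n for _ in range(k)]

rng = random.Random(42)  # fixed seed so runs are repeatable
n, k = 30, 5_000
means = sample_means(n, k, rng)

die_mean = 3.5
die_std = math.sqrt(35 / 12)

# Mean of the sample means should be close to the original mean.
print(statistics.mean(means), die_mean)
# Std dev of the sample means should be close to original_std_dev / sqrt(n).
print(statistics.stdev(means), die_std / math.sqrt(n))
```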

Assumes there is no effect or relationship between variables. The mean weight of group 1 is the same as the mean weight of group 2.

Can also have direction video.

The distribution of a test statistic (e.g. measurement such as mean) if the null hypothesis is true.

Appropriate when your sample size is >= 30, as you can then assume the test statistic is approximately normally distributed.

need to know population std dev instead of using the sample std dev.

a z-statistic is how many std devs we are above the mean, i.e. the z-score
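A sketch of a z-statistic for a sample mean, with a two-sided p-value from the standard normal. All the numbers are hypothetical, and the population std dev is assumed known.

```python
import math
from statistics import NormalDist

def z_statistic(sample_mean, pop_mean, pop_std, n):
    """Number of standard errors the sample mean is from the population mean."""
    return (sample_mean - pop_mean) / (pop_std / math.sqrt(n))

# Hypothetical numbers: n = 100, known population std dev of 10.
z = z_statistic(sample_mean=52, pop_mean=50, pop_std=10, n=100)
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

print(z)                      # 2.0
print(round(p_two_sided, 4))  # about 0.0455
```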

https://www.khanacademy.org/math/probability/statistics-inferential/confidence-intervals/v/small-sample-size-confidence-intervals

https://www.khanacademy.org/math/probability/statistics-inferential/hypothesis-testing/v/z-statistics-vs-t-statistics

https://www.khanacademy.org/math/probability/statistics-inferential/hypothesis-testing/v/small-sample-hypothesis-test

https://www.khanacademy.org/math/probability/statistics-inferential/hypothesis-testing/v/t-statistic-confidence-interval

For small sample sizes (< 30), use the t-test. The t-distribution has fatter tails.

Population standard deviation is unknown.

If the sample size or number of data points is 7, then the degrees of freedom is 6.

A t-table lets you look up the std deviation multiple (the critical t value) using two parameters:

1. degrees of freedom

2. your confidence level (e.g. 95%)

Work out your sample std dev (TODO greek)

then divide that by sqrt(n) to get the standard error of the mean.

Look up the result in the t-table; it is your t-result.

(sample_mean - pop_mean) / (sample_std_dev / sqrt(n))

sample size - 1
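A sketch of the t-statistic and degrees of freedom for a small sample. The 7 data points and the hypothesised mean of 5.0 are invented. Python's standard library has no t-distribution, so the critical value would still come from a t-table.

```python
import math
import statistics

def t_statistic(sample, pop_mean):
    """Return (t, degrees_of_freedom) for a one-sample t-test."""
    n = len(sample)
    s = statistics.stdev(sample)       # sample std dev, n-1 denominator
    standard_error = s / math.sqrt(n)
    t = (statistics.mean(sample) - pop_mean) / standard_error
    return t, n - 1

sample = [4.8, 5.1, 5.3, 4.9, 5.2, 5.0, 5.4]  # hypothetical, 7 data points
t, df = t_statistic(sample, pop_mean=5.0)

print(df)           # 6
print(round(t, 3))  # compare against the t-table value for df = 6
```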

denominator in z-score and t-score...

https://www.khanacademy.org/math/probability/statistics-inferential/hypothesis-testing-two-samples/v/difference-of-sample-means-distribution

https://www.khanacademy.org/math/probability/statistics-inferential/hypothesis-testing-two-samples/v/confidence-interval-of-difference-of-means

Make a distribution based on the two samples. Z = X - Y

The confidence interval of 95% would be on the distribution based on the two samples. If you are using a normal distribution then you can use a z-table and find the 95% interval. A 95% confidence interval is +- 1.96 std devs away from the mean.

To work out std dev remember that var(Z) = var(X) + var(Y) when X and Y are independent.

Since sample size is bigger than 30, you can approximate the population std using the sample.
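A sketch of the 95% confidence interval for a difference of means, assuming large independent samples so the normal approximation applies. The function name and all the numbers are hypothetical.

```python
import math

def diff_of_means_ci(mean_x, var_x, n_x, mean_y, var_y, n_y, z=1.96):
    """Confidence interval for mean_x - mean_y using the normal approximation."""
    diff = mean_x - mean_y
    # Variance of the difference of sample means is the sum of the two variances.
    std_err = math.sqrt(var_x / n_x + var_y / n_y)
    return diff - z * std_err, diff + z * std_err

# Hypothetical samples of 100 each, sample variances 16 and 9.
low, high = diff_of_means_ci(mean_x=10, var_x=16, n_x=100,
                             mean_y=8, var_y=9, n_y=100)
print(round(low, 2), round(high, 2))  # 1.02 2.98
```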

not 100% sure, but you can say you are 95% sure of something.

https://www.khanacademy.org/math/probability/statistics-inferential/hypothesis-testing-two-samples/v/clarification-of-confidence-interval-of-difference-of-means

It is an estimate for a population parameter

Parametric tests assume something about the population (e.g. that it is normally distributed).

A non-parametric test does not make an assumption.

A non-parametric test comparing the medians of two samples.

Tests two samples for whether they are drawn from the same distribution.

Compares 2 distributions examples.

patrons per day of week example at Khan Academy.

H_0 can be the expected proportions

Given observed counts

Calculate total observed counts

Calculate expected counts using the expected proportions.

chi-square statistic = X^2 = sum( (observed - expected)^2 / expected )

e.g. if you have Monday as 30 patrons expected, 45 observed, then for Monday the term will be (45-30)^2 / 30

= 15^2 / 30

= 225 / 30

= 7.5

Then do this for each day of the week, sum them all up, and that is the chi-square statistic.

Degrees of freedom is n-1, so if your restaurant was open 6 days of the week, df = 5.
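The steps can be sketched as below. Only the Monday figures (30 expected, 45 observed) come from the example; the other days and the expected proportions are invented so that the totals match.

```python
def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical 6-day week; Monday is the first entry.
expected_proportions = [0.15, 0.15, 0.15, 0.15, 0.20, 0.20]
observed = [45, 25, 30, 20, 40, 40]

total = sum(observed)                                  # 200
expected = [p * total for p in expected_proportions]   # Monday expects 30

print((45 - 30) ** 2 / 30)             # 7.5, the Monday term
print(chi_square(observed, expected))  # about 11.67
print(len(observed) - 1)               # 5 degrees of freedom
```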

Another example of contingency table from Khan Academy.

Contingency tables hold frequencies of occurrence of events in mutually exclusive categories from two or more samples

table with sub totals for each row and column, and total.

df (degrees of freedom) is usually (rows -1)*(cols - 1). This is because subtotals are fixed, and the last value can always be derived.

http://en.wikipedia.org/wiki/Fisher's_exact_test

http://my.ilstu.edu/~wjschne/138/Psychology138Lab12.html

https://www.khanacademy.org/math/probability/independent-dependent-probability/dependent_probability/v/introduction-to-dependent-probability

A = Event 1, B = Event 2

∩ = Intersection

| = given

P(A ∩ B) = P(A|B).P(B) = P(B|A).P(A)
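An exact check of this identity on a single fair die roll. The events chosen here (A = even, B = greater than 3) are just an illustration.

```python
from fractions import Fraction

outcomes = set(range(1, 7))              # one fair die roll
A = {x for x in outcomes if x % 2 == 0}  # event A: roll is even
B = {x for x in outcomes if x > 3}       # event B: roll is greater than 3

def prob(event):
    return Fraction(len(event), len(outcomes))

def cond(event, given):
    """P(event | given): restrict the outcomes to those where given happened."""
    return Fraction(len(event & given), len(given))

print(prob(A & B))           # 1/3
print(cond(A, B) * prob(B))  # 1/3
print(cond(B, A) * prob(A))  # 1/3
```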

In a model, independent variables are inputs, and dependent variables are expected to change based on the independent variables.