6 Brief Summary of elementary statistics
6.1 Notation
| sample, statistic | population, parameter |
| --- | --- |
| sample size \(n\) | population size \(N\) |
| mean \(\bar{x}\) | mean \(\mu\) |
| median \(\tilde{x}\) | median \(\tilde{\mu}\) |
| variance \(s^2\) | variance \(\sigma^2\) |
| standard deviation \(s\) | standard deviation \(\sigma\) |
| proportion \(\hat{p}\) | proportion \(p\) |
| parameter | discrete | continuous |
| --- | --- | --- |
| probability function | \(\small P(x) = P(X=x)\) | N/A |
| density function | N/A | \(f(x)\) |
| cumulative distribution function (cdf) | \(\small F(a)= P(X \le a) = \sum_{\text{all } x \le a} P(X=x)\) | \(\small F(x) = P(X \le x)=\int_{-\infty}^x f(t)\,dt\) |
| mean, sample | \(\small \bar x = \frac{1}{n} \sum_{\text{all } x} x\) | \(\small \bar x = \frac{1}{n} \sum_{\text{all } x} x\) |
| mean, population | \(\small \mu = \sum_{\text{all } x} xP(x)\) | \(\small \mu = \int_{\text{all } x}x f(x)\,dx\) |
| median | \(\small \tilde{x}\) | \(\small \tilde{\mu}\) |
| variance, sample | \(\small s^2 = \frac{1}{n-1}\sum_{\text{all } x} (x-\bar x)^2\) | \(\small s^2 = \frac{1}{n-1}\sum_{\text{all } x} (x-\bar x)^2\) |
| variance, population | \(\small \sigma^2 = \frac{1}{N}\sum_{\text{all } x} (x-\mu)^2\) | \(\small \sigma^2 = \int_{-\infty}^{\infty} (x-\mu)^2 f(x)\, dx\) |
6.2 Definitions
• The population is the complete collection of all subjects (such as scores, people, measurements, and so on) you want to study, while a sample is a subcollection selected from a population. A census is the collection of data from every member of the population. Keep in mind that it is almost impossible to collect information from everyone in a population; it is also expensive and time consuming. Thus, in reality, a census is rarely possible.
• A parameter is a number describing the population, while a statistic describes a sample.
• The set of all possible outcomes of an experiment is called the sample space \(S\). A subset of the sample space is called an event. A simple event is an event consisting of a single outcome; it cannot be expressed as a union of smaller events.

Formally, a random variable \(X\) is a function from the sample space \(S\) to the real numbers \(\small \mathbb R\). A continuous random variable has an associated density function \(f(x)\) such that, for any subset \(C\) of \(\small \mathbb R\), \(\small P(X \in C) = \int_C f(x)\,dx\).
Informally, you can think of a random variable as a variable whose value depends on the outcome of a random experiment. A random variable is discrete if you can list all possible values, and continuous if you can’t.
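A quick Python sketch of this distinction; the die and the uniform draw are illustrative choices, not from the text above:

```python
import random

# Discrete random variable: the result of a fair die roll.
# All possible values can be listed: 1, 2, 3, 4, 5, 6.
pmf = {x: 1 / 6 for x in range(1, 7)}   # probability function P(X = x)
assert abs(sum(pmf.values()) - 1) < 1e-12

# Continuous random variable: a uniform draw from [0, 1).
# Its values cannot be listed; any single point has probability 0,
# so probabilities come from integrating the density f(x) = 1 over [0, 1).
u = random.random()
assert 0 <= u < 1
```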
Probability function \(\small P(X=x)\):
o used for discrete random variable X
o \(\small P(X=x) \ge 0\)
o \(\sum_{x} P(x) = 1\)
Density function \(\small f(x)\):
o used for continuous random variable X
o \(\small f(x) \ge 0\)
o \(\small \int_{\text{all } x} f(x)\, dx = 1\)
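Both sets of properties can be checked numerically. A small sketch; the loaded die and the density \(f(x)=2x\) on \([0,1]\) are made-up examples:

```python
# A (hypothetical) loaded die: probabilities are nonnegative and sum to 1.
pmf = {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}
assert all(p >= 0 for p in pmf.values())      # P(X = x) >= 0
assert abs(sum(pmf.values()) - 1) < 1e-12     # sum over all x equals 1

def f(x):
    """Density f(x) = 2x on [0, 1], zero elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0

# Midpoint-rule approximation of the integral of f over all x.
n = 100_000
integral = sum(f((i + 0.5) / n) for i in range(n)) / n
assert abs(integral - 1) < 1e-9               # density integrates to 1
```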
• The cumulative distribution function \(F(x)\) is defined as \(\small F(x) = P(X \le x)\). Note that \(\small F(x)\) is nondecreasing and continuous from the right, and that \(\small \lim_{x \rightarrow -\infty}F(x)=0\) and \(\small \lim_{x \rightarrow \infty}F(x)=1\).
Furthermore, at any point where \(\small f(x)\) is continuous, we have \(\small F'(x) = f(x)\).
• Two random variables \(\small X,Y\) are independent if \(\small P(X \in A, Y \in B) = P(X \in A)\cdot P(Y \in B)\) for all subsets \(\small A,B\) of \(\small \mathbb R\).
• Two random events \(\small A,B\) are independent if \(\small P(A \cap B) = P(A)\cdot P(B)\).
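A small check of this definition, using two fair dice as a made-up sample space:

```python
from itertools import product
from fractions import Fraction

# Sample space for two fair dice; each of the 36 outcomes is equally likely.
S = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a predicate on outcomes."""
    return Fraction(sum(1 for s in S if event(s)), len(S))

def A(s): return s[0] % 2 == 0   # first die shows an even number
def B(s): return s[1] >= 5       # second die shows 5 or 6

# Independence: P(A and B) = P(A) * P(B) = 1/2 * 1/3.
p_AB = prob(lambda s: A(s) and B(s))
assert p_AB == prob(A) * prob(B) == Fraction(1, 6)
```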
6.2.1 Measures of center
• The (arithmetic) mean is the average of all the data, also known as the first moment. The formulas vary slightly for samples, populations, and discrete or continuous variables, and are given in the table above.
• The median splits the data set into two equal halves. Formally, the median is defined as a value \(c\) such that \(P(X \le c)=P(X \ge c)\). Changing an outlier does not change the median.
• The mode is the most common value(s).
• The midrange is the value midway between the minimum and maximum, midrange \(= \frac{\max + \min}{2}\). Note that the midrange uses only the largest and smallest values and ignores all the others.
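The four measures above, computed for a small made-up data set; Python's statistics module covers the first three directly:

```python
from statistics import mean, median, multimode

data = [2, 3, 3, 5, 7, 10]

print(mean(data))       # arithmetic mean -> 5
print(median(data))     # middle of 3 and 5 -> 4.0
print(multimode(data))  # most common value(s) -> [3]

# Midrange uses only the extremes of the data.
midrange = (max(data) + min(data)) / 2
print(midrange)         # (10 + 2) / 2 -> 6.0
```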
6.2.2 Expected value, moments
Provided the underlying sums and integrals exist, we have:
• The expected value \(\small \mu = E[X]\) is the (population) mean of a random variable \(\small X\). \[ \small \mu = \sum_{\text{all } x} x P(x)\text{, discrete case}\] \[ \small \mu = \int_{\text{all } x}x f(x)\,dx \text{, continuous case}\] The expected value has the following properties:
o \(\small E[aX+b]=aE[X]+b\)
o \(\small E[X+Y]=E[X]+E[Y]\)
o \(\small E[g(X)] = \sum_{\text{all } x}g(x)P(X=x) \text{, discrete case}\)
o \(\small E[g(X)] = \int_{\text{all } x}g(x) f(x)\,dx \text{, continuous case}\)
• The \(n^{th}\) moment of a random variable \(\small X\) is \(\small E[X^n]\); the \(n^{th}\) central moment is \(\small E[(X-\mu)^n]\).
Note that with this terminology, the expected value is the first moment, and the variance is the second central moment.
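A sketch computing the first two moments of a fair die directly from the definition (the die is an illustrative choice):

```python
# Probability function of a fair die.
pmf = {x: 1 / 6 for x in range(1, 7)}

mu = sum(x * p for x, p in pmf.items())                  # first moment E[X]
second_moment = sum(x ** 2 * p for x, p in pmf.items())  # E[X^2]

assert abs(mu - 3.5) < 1e-12
assert abs(second_moment - 91 / 6) < 1e-12

# Linearity: E[aX + b] = a E[X] + b, checked with a = 2, b = 1.
e_lin = sum((2 * x + 1) * p for x, p in pmf.items())
assert abs(e_lin - (2 * mu + 1)) < 1e-12
```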
6.2.3 Measures of spread
range = maximum − minimum
• The variance is a measure of variation of the data values about the mean. Assuming the underlying sums and integrals converge, it is computed as \(\small VAR[X]=E[(X-\mu)^2].\)
o An alternative formula is \(\small VAR[X] = E[X^2]-(E[X])^2=E[X^2]-\mu^2\)
o \(\small VAR[X]\ge 0\)
o \(\small VAR[aX + b]=a^2 VAR[X]\)
o A variance of 0 means that all values are the same.
• The standard deviation is the square root of the variance. The standard deviation is always \(\ge 0\).
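The sample/population distinction from the table above (divide by \(n-1\) versus \(N\)) is exactly the `variance` / `pvariance` split in Python's statistics module. A sketch with a made-up data set:

```python
from statistics import variance, pvariance, stdev, pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]   # mean is 5, squared deviations sum to 32

assert abs(variance(data) - 32 / 7) < 1e-12   # sample: divide by n - 1
assert abs(pvariance(data) - 4) < 1e-12       # population: divide by N

# The standard deviation is the square root of the corresponding variance.
assert abs(pstdev(data) - 2.0) < 1e-12
assert abs(stdev(data) - (32 / 7) ** 0.5) < 1e-12
```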
6.2.4 Measures of relative standing
• The \(n\%\) quantiles \(\small Q_{n\%}\), sometimes also written as \(\small Q_n\), are the values such that \(\small P(X\le Q_n) = n\%\).
• Sometimes, the notation \(\small Q_1, Q_2, Q_3\) is used to denote the first, second, and third quartiles. In that case, to avoid confusion, quantiles are called percentiles and written as \(\small P_n\). We have \(\small Q_2 = P_{50} = \text{median}\).
• The standardized score, standard score, or z-score \(z\) measures how many standard deviations a data point \(\small x\) sits below or above the mean: \[\small z = \frac{x - \text{mean}}{\text{standard deviation}}.\] Values with a z-score between \(-2\) and \(2\) are called usual, those with a z-score between \(-3\) and \(-2\) or between \(2\) and \(3\) are called outliers, and those with a z-score below \(-3\) or above \(3\) are called extreme outliers.
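The z-score and the usual/outlier classification above can be sketched as follows; the test-score numbers are made up, and the handling of boundary values is a choice:

```python
def z_score(x, mean, sd):
    """How many standard deviations x sits above (+) or below (-) the mean."""
    return (x - mean) / sd

def classify(z):
    """Rule-of-thumb labels; boundary values are assigned to the milder class."""
    if abs(z) <= 2:
        return "usual"
    if abs(z) <= 3:
        return "outlier"
    return "extreme outlier"

# Example: scores with mean 70 and standard deviation 10.
print(z_score(85, 70, 10))              # -> 1.5
print(classify(z_score(85, 70, 10)))    # -> usual
print(classify(z_score(45, 70, 10)))    # z = -2.5 -> outlier
print(classify(z_score(105, 70, 10)))   # z = 3.5 -> extreme outlier
```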
6.2.5 Measures of shape
• The five-number summary of a data set consists of the minimum, \(\small Q_1\), the median, \(\small Q_3\), and the maximum. Together with box plots (aka box-and-whisker plots) it provides an easy way to get an idea of the shape of a distribution. For more detail, see the chapter on plots.
• Pearson's coefficient of skewness, also known as the index of skewness, measures the skewness of a sample. It is a standardized distance between mean and median: \[\small S = 3 \cdot \frac{\text{mean} - \text{median}}{s}.\] As a rule of thumb, data where \(\small |S| < 1\) is not considered significantly skewed.
• The skewness is the third moment of the standardized score \[\small skew(X) = E\big[\big( \tfrac{X-\mu}{\sigma}\big)^3\big].\] Skewness is a measure of the asymmetry of the distribution (as represented by the graph of its density function) of a variable. The skewness of a symmetric distribution is zero, but be careful: a nonsymmetric distribution can have skewness 0 as well. A positive skew means that there is a tail on the right side of the distribution, a negative skew value that we have a tail on the left side.
• The kurtosis is the fourth moment of the standardized score \[\small kurt(X) = E\big[\big( \tfrac{X-\mu}{\sigma}\big)^4\big].\] It measures the “peakedness” of a distribution. Larger kurtosis means that the graph of the probability density function has thicker tails, compared with the probability density function of a distribution with smaller kurtosis.
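For a finite data set, both standardized moments can be computed directly from the definitions above; the two data sets are made up to show the sign behavior:

```python
from statistics import mean, pstdev

def standardized_moment(data, k):
    """k-th moment of the standardized scores (x - mu) / sigma."""
    mu, sigma = mean(data), pstdev(data)
    return sum(((x - mu) / sigma) ** k for x in data) / len(data)

symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 2, 10]   # long tail on the right

assert abs(standardized_moment(symmetric, 3)) < 1e-9   # skewness ~ 0
assert standardized_moment(right_tailed, 3) > 0        # positive skew
assert standardized_moment(symmetric, 4) > 0           # kurtosis is positive
```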
6.2.6 Assorted rules
The Empirical Rule, also known as the 68–95–99.7% rule, applies to approximately bell-shaped and symmetric distributions only. It states that about 68% of all values can be found within 1 standard deviation of the mean, about 95% within 2 standard deviations, and about 99.7% within 3 standard deviations.
Chebyshev’s Rule states that in any distribution with known mean \(\mu\) and standard deviation \(\sigma\), at least \(\small \frac{k^2-1}{k^2} = 1-\frac{1}{k^2}\) of all values are within \(\small k\) standard deviations of the mean (for \(k > 1\)).
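Both rules can be checked against the standard normal distribution, which satisfies the Empirical Rule and comfortably beats Chebyshev's bound:

```python
from statistics import NormalDist

nd = NormalDist()   # standard normal: mean 0, standard deviation 1

# Empirical rule: fraction of values within k standard deviations.
for k, target in [(1, 0.68), (2, 0.95), (3, 0.997)]:
    within = nd.cdf(k) - nd.cdf(-k)
    assert abs(within - target) < 0.005   # 0.683, 0.954, 0.997

# Chebyshev's bound 1 - 1/k^2 holds for any distribution.
for k in (2, 3):
    assert nd.cdf(k) - nd.cdf(-k) >= 1 - 1 / k ** 2
```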
The Range rule of thumb states that one can get a rough estimate of the standard deviation (on the order of magnitude) as \(\frac{\text{range}}{4}\).
The Law of total probability (one version): let \(B_i, i=1,\ldots,n\) be a partition of the sample space \(S\), i.e. \(\bigcup_{i=1}^n B_i = S\) and \(B_i \cap B_j = \emptyset\) for \(i \neq j\). Then \[P(A)=\sum_{i=1}^n P(A \mid B_i)P(B_i)\]
Bayes’ rule builds on the Law of total probability. \[P(B_i \mid A)=\frac{P(A \mid B_i)P(B_i)}{P(A)}=\frac{P(A \mid B_i)P(B_i)}{\sum_{j=1}^n P(A \mid B_j)P(B_j)}\]
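A worked example of both formulas; the diagnostic-test numbers (1% prevalence, 95% sensitivity, 90% specificity) are made up:

```python
# Partition of the sample space by disease status.
p_B = {"disease": 0.01, "healthy": 0.99}           # P(B_i)
p_A_given_B = {"disease": 0.95, "healthy": 0.10}   # P(A | B_i), A = "test positive"

# Law of total probability: P(A) = sum over i of P(A | B_i) P(B_i).
p_A = sum(p_A_given_B[b] * p_B[b] for b in p_B)

# Bayes' rule: P(disease | A).
posterior = p_A_given_B["disease"] * p_B["disease"] / p_A

print(round(p_A, 4))        # -> 0.1085
print(round(posterior, 4))  # -> 0.0876
```

Note that even with a positive test, the posterior probability of disease is under 9% here, because the disease is rare.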