# 6 Brief Summary of elementary statistics

## 6.1 Notation

| | sample (statistic) | population (parameter) |
|---|---|---|
| size | $$n$$ | $$N$$ |
| mean | $$\bar{x}$$ | $$\mu$$ |
| median | $$\tilde{x}$$ | $$\tilde{\mu}$$ |
| variance | $$s^2$$ | $$\sigma^2$$ |
| standard deviation | $$s$$ | $$\sigma$$ |
| proportion | $$\hat{p}$$ | $$p$$ |
| parameter | discrete | continuous |
|---|---|---|
| probability function | $$\small P(x) = P(X=x)$$ | N/A |
| density function | N/A | $$\small f(x)$$ |
| cumulative distribution function (cdf) | $$\small F(a)= P(X \le a) = \sum_{x \le a} P(X=x)$$ | $$\small F(x)= P(X \le x)=\int_{-\infty}^x f(t)\,dt$$ |
| mean, sample | $$\small \bar x = \frac{1}{n} \sum_{\text{all } x} x$$ | $$\small \bar x = \frac{1}{n} \sum_{\text{all } x} x$$ |
| mean, population | $$\small \mu = \sum_{\text{all } x} xP(x)$$ | $$\small \mu = \int_{-\infty}^{\infty} x f(x)\,dx$$ |
| median | $$\small \tilde{x}$$ | $$\small \tilde{\mu}$$ |
| variance, sample | $$\small s^2 = \frac{1}{n-1}\sum_{\text{all } x} (x-\bar x)^2$$ | $$\small s^2 = \frac{1}{n-1}\sum_{\text{all } x} (x-\bar x)^2$$ |
| variance, population | $$\small \sigma^2 = \sum_{\text{all } x} (x-\mu)^2 P(x)$$ | $$\small \sigma^2 = \int_{-\infty}^{\infty} (x-\mu)^2 f(x)\,dx$$ |
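
The formulas in the table can be illustrated with a short Python sketch (the data set is hypothetical; note how the sample variance divides by $$n-1$$ while the population variance divides by $$N$$):

```python
# Hypothetical data set, used to contrast the sample and population formulas.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

xbar = sum(data) / n                                 # sample mean
s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)    # sample variance: divide by n-1
sigma2 = sum((x - xbar) ** 2 for x in data) / n      # population variance: divide by N
sd = sigma2 ** 0.5                                   # population standard deviation

print(xbar, s2, sigma2, sd)
```

For this data set the population standard deviation comes out to exactly 2, while the sample version is slightly larger because of the $$n-1$$ denominator.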

## 6.2 Definitions

• The population is the complete collection of all subjects (like scores, people, measurements, and so on) you want to study, while a sample is a sub-collection selected from a population. A census is the collection from every member of the population. Keep in mind that it is almost impossible to collect information from everyone in the population; it is also expensive and time consuming. Thus, in reality, a census is rarely possible.
• A parameter is some number describing the population while a statistic describes a sample.
• The set of all possible outcomes of an experiment is called the sample space S. A subset of the sample space is called an event. A simple event is an event that cannot be expressed as a union of other events.
• Formally, a random variable X is a function from the sample space S to the real numbers $$\small \mathbb R$$. A continuous random variable has an associated density function f(x) such that, for any subset C of $$\small \mathbb R$$, $$\small P(X \in C) = \int_C f(x)dx$$.
Informally, you can think of a random variable as a variable whose value depends on the outcome of a random experiment. A random variable is discrete if you can list all possible values, and continuous if you can't.
Probability function $$\small P(X=x)$$:
o used for discrete random variable X
o $$\small P(X=x) \ge 0$$
o $$\sum_{x} P(x) = 1$$
Density function $$\small f(x)$$:
o used for continuous random variable X
o $$\small f(x) \ge 0$$
o $$\small \int_{-\infty}^{\infty} f(x)\,dx = 1$$
• The cumulative distribution function F(x) is defined as $$\small F(x) = P(X \le x)$$. Note that $$\small F(x)$$ is non-decreasing and continuous from the right, and that $$\small \lim_{x \rightarrow -\infty}F(x)=0$$ and $$\small \lim_{x \rightarrow \infty}F(x)=1$$.
Furthermore, at any point where $$\small f(x)$$ is continuous, we have $$\small F'(x) = f(x)$$.
• Two random variables $$\small X,Y$$ are independent if $$\small P(X \in A, Y \in B) = P(X \in A)\cdot P(Y \in B)$$ for all subsets $$\small A,B$$ of $$\small \mathbb R$$.
• Two random events $$\small A,B$$ are independent if $$\small P(A \cap B) = P(A)\cdot P(B)$$.
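
The defining properties of a probability function and its cdf can be checked numerically; below is a minimal Python sketch using a fair six-sided die as a (hypothetical) discrete random variable:

```python
# Probability function of a fair six-sided die: P(X = x) = 1/6 for x = 1..6.
P = {x: 1 / 6 for x in range(1, 7)}

# Properties: P(x) >= 0 for all x, and the probabilities sum to 1.
assert all(p >= 0 for p in P.values())
assert abs(sum(P.values()) - 1) < 1e-12

# Cumulative distribution function F(a) = P(X <= a) = sum of P(x) over x <= a.
def F(a):
    return sum(p for x, p in P.items() if x <= a)

# F is non-decreasing, 0 below the smallest value and 1 at the largest.
assert F(0) == 0
assert abs(F(3) - 0.5) < 1e-12
assert abs(F(6) - 1) < 1e-12
```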

### 6.2.1 Measures of center

• The (arithmetic) mean is the average of all the data, also known as the first moment. The formulas vary slightly for samples, populations, and discrete or continuous variables, and are given in the table above.
• The median splits the data set into two equal halves. Formally, the median is a value $$c$$ such that $$P(X \le c)=P(X \ge c)$$. The median is not affected by changes to outliers.
• The mode is the most common value(s).
• The mid-range is the value midway between the minimum and maximum, $$\text{mid-range}=\frac{\text{max}+\text{min}}{2}$$. Note that the mid-range uses only the largest and smallest values and ignores all the others.
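
Assuming Python, the four measures of center can be computed for a small hypothetical sample as follows:

```python
from statistics import mean, median, multimode

data = [1, 2, 2, 3, 4, 7, 9]         # hypothetical sample, already sorted

m = mean(data)                       # arithmetic mean
med = median(data)                   # middle value of the sorted data
modes = multimode(data)              # most common value(s); may be more than one
mid = (max(data) + min(data)) / 2    # mid-range: uses only min and max

print(m, med, modes, mid)
```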

### 6.2.2 Expected value, moments

Provided the underlying sums and integrals exist, we have:

• The expected value $$\small \mu = E[X]$$ is the (population) mean of a random variable $$\small X$$. $\small \mu = \sum_{\text{all } x} x P(x)\text{, discrete case}$ $\small \mu = \int_{\text{all } x} x f(x)\,dx \text{, continuous case}$ The expected value has the following properties:
o $$\small E[aX+b]=aE[X]+b$$
o $$\small E[X+Y]=E[X]+E[Y]$$
o $$\small E[g(X)] = \sum_{\text{all } x} g(x)P(X=x) \text{, discrete case}$$
o $$\small E[g(X)] = \int_{\text{all } x} g(x) f(x)\,dx \text{, continuous case}$$
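
These properties can be verified numerically; the Python sketch below (fair-die example, hypothetical) computes $$E[g(X)]$$ directly from the definition and checks the linearity property:

```python
# Fair six-sided die: P(X = x) = 1/6 for x = 1..6.
P = {x: 1 / 6 for x in range(1, 7)}

def E(g):
    """E[g(X)] = sum over all x of g(x) * P(X = x)."""
    return sum(g(x) * p for x, p in P.items())

mu = E(lambda x: x)           # E[X] = 3.5 for a fair die
lhs = E(lambda x: 2 * x + 1)  # E[2X + 1], computed from the definition
rhs = 2 * mu + 1              # a E[X] + b with a = 2, b = 1

print(mu, lhs, rhs)           # lhs and rhs agree, illustrating linearity
```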

• The $$n^{th}$$ moment of a random variable $$\small X$$ is $$\small E[X^n]$$; the $$n^{th}$$ central moment is $$\small E[(X-\mu)^n]$$.

Note that with this terminology, the expected value is the first moment, and the variance is the second central moment.

### 6.2.3 Measures of variation

• range = maximum - minimum

• The variance is a measure of variation of the data values about the mean. Assuming the underlying sums and integrals converge, it is computed as $$\small VAR[X]=E[(X-\mu)^2].$$
o An alternative formula is $$\small VAR[X] = E[X^2]-(E[X])^2=E[X^2]-\mu^2$$
o $$\small VAR[X]\ge 0$$
o $$\small VAR[aX + b]=a^2 VAR[X]$$
o A variance of 0 means that all values are the same.
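
A quick Python check of the two variance formulas and the scaling property, again on the hypothetical fair-die distribution:

```python
# Fair six-sided die distribution.
P = {x: 1 / 6 for x in range(1, 7)}
E = lambda g: sum(g(x) * p for x, p in P.items())

mu = E(lambda x: x)
var_def = E(lambda x: (x - mu) ** 2)     # VAR[X] = E[(X - mu)^2]
var_alt = E(lambda x: x ** 2) - mu ** 2  # VAR[X] = E[X^2] - mu^2

# VAR[aX + b] = a^2 VAR[X]: with a = 3, b = 5 the shift b drops out.
mu_t = E(lambda x: 3 * x + 5)
var_t = E(lambda x: (3 * x + 5 - mu_t) ** 2)

print(var_def, var_alt, var_t)           # var_t equals 9 * var_def
```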

• The standard deviation is the square root of the variance. The standard deviation is always ≥0.

### 6.2.4 Measures of relative standing

• The $$n\%$$ quantile $$\small Q_{n\%}$$, sometimes also written as $$\small Q_n$$, is the value such that $$\small P(X\le Q_{n\%}) = n\%$$.
• Sometimes, the notation $$\small Q_1, Q_2, Q_3$$ is used to denote the first, second, and third quartiles. In that case, to avoid confusion, quantiles are called percentiles and written as $$\small P_n$$. We have $$\small Q_2 = P_{50} = \text{median}$$.
• The standardized score, standard score, or z-score $$z$$ measures how many standard deviations a data point $$\small x$$ sits below or above the mean: $\small z = \frac{x - \text{mean}}{\text{standard deviation}}.$ Values with a z-score between -2 and 2 are called usual, those with a z-score between -3 and -2 or between 2 and 3 are called outliers, and those with a z-score below -3 or above 3 are called extreme outliers.
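
A short Python sketch of z-scores for a hypothetical sample (using the sample standard deviation):

```python
from statistics import mean, stdev

data = [10, 12, 14, 16, 18]     # hypothetical sample
m, s = mean(data), stdev(data)

# z-score: signed number of standard deviations away from the mean.
z = [(x - m) / s for x in data]
print(z)

# Values this section calls outliers (|z| > 2) or extreme outliers (|z| > 3).
outliers = [x for x, zx in zip(data, z) if abs(zx) > 2]
print(outliers)                 # empty here: all values are usual
```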

### 6.2.5 Measures of shape

• The five point summary of a data set consists of the minimum, $$\small Q_1$$, median, $$\small Q_3$$, and the maximum. Together with box plots (aka box-and-whisker plots) it provides an easy way to get an idea of the shape of a distribution. For more detail, see the chapter on plots.
• Pearson's coefficient of skewness, also known as the index of skewness, measures the skewness of a sample. It is a standardized distance between the mean and the median. As a rule of thumb, data with $$\small |S| < 1$$ is not considered significantly skewed. $\small S = 3 \cdot \frac{\text{mean} - \text{median}}{s}$
• The skewness is the third moment of the standardized score: $\small \text{skew}(X) = E\big[\big( \frac{X-\mu}{\sigma}\big)^3\big].$ Skewness is a measure of the asymmetry of the distribution (as represented by the graph of its density function) of a variable. The skewness of a symmetric distribution is zero, but be careful: a non-symmetric distribution can have skewness 0 as well. A positive skew means that there is a tail on the right side of the distribution; a negative skew means that there is a tail on the left side.
• The kurtosis is the fourth moment of the standardized score: $\small \text{kurt}(X) = E\big[\big( \frac{X-\mu}{\sigma}\big)^4\big].$ It measures the "peakedness" of a distribution. Larger kurtosis means that the graph of the probability density function has thicker tails than that of a distribution with smaller kurtosis.
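
Sample versions of skewness and kurtosis can be estimated as moments of the standardized scores; a minimal Python sketch on a hypothetical right-skewed sample:

```python
from statistics import mean, pstdev

data = [1, 1, 2, 2, 2, 3, 3, 4, 8, 14]   # hypothetical, long right tail
m, s = mean(data), pstdev(data)          # population-style sd matches the E[...] form

z = [(x - m) / s for x in data]
skew = mean(zi ** 3 for zi in z)         # third moment of z: positive here (right tail)
kurt = mean(zi ** 4 for zi in z)         # fourth moment of z: larger = thicker tails

print(skew, kurt)
```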

### 6.2.6 Assorted rules

• The Empirical Rule, also known as the 68-95-99.7% rule, applies to approximately bell-shaped and symmetric distributions only. It states that about 68% of all values can be found within 1 standard deviation of the mean, about 95% within 2 standard deviations, and about 99.7% within 3 standard deviations.
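
For a normal distribution these three percentages follow from the normal cdf; in Python they can be reproduced via the error function, since $$P(|X-\mu| \le k\sigma) = \operatorname{erf}(k/\sqrt 2)$$:

```python
from math import erf, sqrt

# Exact normal probabilities behind the 68-95-99.7% rule.
probs = {k: erf(k / sqrt(2)) for k in (1, 2, 3)}
for k, p in probs.items():
    print(k, round(p, 4))   # approximately 0.6827, 0.9545, 0.9973
```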

• Chebyshev's Rule states that in any distribution with known mean $$\mu$$ and standard deviation $$\sigma$$, at least $$\small \frac{k^2-1}{k^2} = 1-\frac{1}{k^2}$$ of all values are within $$\small k$$ standard deviations of the mean (the bound is informative only for $$\small k > 1$$).
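
Unlike the Empirical Rule, Chebyshev's bound holds for any distribution; a quick Python check on a hypothetical skewed data set:

```python
from statistics import mean, pstdev

data = [1, 1, 1, 2, 2, 3, 5, 8, 13, 30]   # hypothetical, far from bell-shaped
m, s = mean(data), pstdev(data)

for k in (2, 3):
    within = sum(1 for x in data if abs(x - m) <= k * s) / len(data)
    bound = 1 - 1 / k ** 2
    print(k, within, ">=", bound)
    assert within >= bound   # Chebyshev guarantees this for any distribution
```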

• The Range rule of thumb states that one can get a rough estimate of the standard deviation (on the order of magnitude) as $$\frac{\text{range}}{4}$$.

• The Law of total probability (version): Let $$B_i, i=1,...,n$$ be a partition of the sample space $$S$$, i.e. $$\bigcup_{i=1}^n B_i = S$$ and $$B_i \cap B_j = \emptyset$$ for $$i \ne j$$. Then $P(A)=\sum_{i=1}^n P(A|B_i)P(B_i)$

• Bayes' rule builds on the Law of total probability. $P(B_i|A)=\frac{P(A|B_i)P(B_i)}{P(A)}=\frac{P(A|B_i)P(B_i)}{\sum_{j=1}^n P(A|B_j)P(B_j)}$
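
A worked Bayes' rule example in Python with hypothetical numbers (prevalence 1%, sensitivity 95%, false-positive rate 5%); the denominator is the law of total probability:

```python
# Partition of S: B1 = has the condition, B2 = does not (hypothetical numbers).
p_b = [0.01, 0.99]           # P(B1), P(B2)
p_a_given_b = [0.95, 0.05]   # P(A|B1), P(A|B2), where A = "test is positive"

# Law of total probability: P(A) = sum of P(A|Bi) * P(Bi).
p_a = sum(pa * pb for pa, pb in zip(p_a_given_b, p_b))

# Bayes' rule: P(B1|A) = P(A|B1) * P(B1) / P(A).
p_b1_given_a = p_a_given_b[0] * p_b[0] / p_a

print(p_a, p_b1_given_a)     # the posterior is only about 16%
```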

## 6.3 Assignment

Hand in a rendered (knitted) RMD file with all the definitions and concepts that are unfamiliar to you and that you want to review.