6 Brief Summary of elementary statistics

6.1 Notation

| sample, statistic | population, parameter |
|---|---|
| sample size \(n\) | population size \(N\) |
| mean \(\bar{x}\) | mean \(\mu\) |
| median \(\tilde{x}\) | median \(\tilde{\mu}\) |
| variance \(s^2\) | variance \(\sigma^2\) |
| standard deviation \(s\) | standard deviation \(\sigma\) |
| proportion \(\hat{p}\) | proportion \(p\) |
| parameter | discrete | continuous |
|---|---|---|
| probability function | \(\small P(x) = P(X=x)\) | N/A |
| density function | N/A | \(\small f(x)\) |
| cumulative distribution function (cdf) | \(\small F(a) = P(X \le a) = \sum_{\text{all } x \le a} P(X=x)\) | \(\small F(x) = P(X \le x) = \int_{-\infty}^x f(t)\,dt\) |
| mean, sample | \(\small \bar x = \frac{1}{n} \sum_{\text{all } x} x\) | \(\small \bar x = \frac{1}{n} \sum_{\text{all } x} x\) |
| mean, population | \(\small \mu = \sum_{\text{all } x} x P(x)\) | \(\small \mu = \int_{\text{all } x} x f(x)\,dx\) |
| median | \(\small \tilde{x}\) | \(\small \tilde{\mu}\) |
| variance, sample | \(\small s^2 = \frac{1}{n-1} \sum_{\text{all } x} (x-\bar x)^2\) | \(\small s^2 = \frac{1}{n-1} \sum_{\text{all } x} (x-\bar x)^2\) |
| variance, population | \(\small \sigma^2 = \sum_{\text{all } x} (x-\mu)^2 P(x)\) | \(\small \sigma^2 = \int_{-\infty}^{\infty} (x-\mu)^2 f(x)\,dx\) |
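The sample formulas in the table can be checked numerically. A minimal sketch in plain Python, using made-up data:

```python
# Sketch of the sample vs. population formulas from the table (hypothetical data).
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)

mean = sum(data) / n                                        # x-bar
var_sample = sum((x - mean) ** 2 for x in data) / (n - 1)   # s^2: divide by n - 1
var_population = sum((x - mean) ** 2 for x in data) / n     # sigma^2, if data were the whole population
sd_sample = var_sample ** 0.5                               # s

print(mean, var_population, var_sample)
```

Note the only difference between \(s^2\) and \(\sigma^2\) here is the divisor \(n-1\) versus \(N\).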

6.2 Definitions

  • The population is the complete collection of all subjects (such as scores, people, or measurements) you want to study, while a sample is a sub-collection selected from the population. A census collects data from every member of the population. Keep in mind that collecting information from everyone in a population is expensive, time-consuming, and often outright impossible; in practice, a census is rarely feasible.
  • A parameter is some number describing the population while a statistic describes a sample.
  • The set of all possible outcomes of an experiment is called the sample space \(S\). A subset of the sample space is called an event. A simple event is an event that cannot be expressed as a union of other events.
  • Formally, a random variable X is a function from the sample space S to the real numbers \(\small \mathbb R\). A continuous random variable has an associated density function f(x) such that, for any subset \(C\) of \(\small \mathbb R\), \(\small P(X \in C) = \int_C f(x)\,dx\).
    Informally, you can think of a random variable as a variable whose value depends on the outcome of a random experiment. A random variable is discrete if you can list all possible values, and continuous if you can’t.
    Probability function \(\small P(X=x)\):
    o used for discrete random variable X
    o \(\small P(X=x) \ge 0\)
    o \(\small \sum_{x} P(x) = 1\)
    Density function \(\small f(x)\):
    o used for continuous random variable X
    o \(\small f(x) \ge 0\)
    o \(\small \int_{-\infty}^{\infty} f(x)\,dx = 1\)
  • The cumulative distribution function F(x) is defined as \(\small F(x) = P(X \le x)\). Note that \(\small F(x)\) is non-decreasing and continuous from the right, and that \(\small \lim_{x \rightarrow -\infty}F(x)=0\) and \(\small \lim_{x \rightarrow \infty}F(x)=1\).
    Furthermore, at any point where the density \(\small f(x)\) is continuous, we have \(\small F'(x) = f(x)\).
  • Two random variables \(\small X,Y\) are independent if \(\small P(X \in A, Y \in B) = P(X \in A)\cdot P(Y \in B)\) for all subsets \(\small A,B\) of \(\small \mathbb R\).
  • Two random events \(\small A,B\) are independent if \(\small P(A \cap B) = P(A)\cdot P(B)\).
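These definitions can be made concrete with a small example. A sketch using a fair six-sided die (a hypothetical choice; exact arithmetic via the standard library's `fractions`):

```python
from fractions import Fraction

# Probability function of a fair die: P(X = x) = 1/6 for x = 1..6.
P = {x: Fraction(1, 6) for x in range(1, 7)}
assert sum(P.values()) == 1            # probabilities sum to 1

def F(a):
    """Cumulative distribution function F(a) = P(X <= a)."""
    return sum(p for x, p in P.items() if x <= a)

print(F(3))                            # P(X <= 3) = 1/2

# Independence of events: A = "even", B = "at most 4" on one roll.
A = {2, 4, 6}
B = {1, 2, 3, 4}
P_A = sum(P[x] for x in A)             # 1/2
P_B = sum(P[x] for x in B)             # 2/3
P_AB = sum(P[x] for x in A & B)        # P(A and B), over {2, 4}
print(P_AB == P_A * P_B)               # True: A and B are independent
```

Here \(P(A \cap B) = 1/3 = \tfrac12 \cdot \tfrac23\), so the two events satisfy the independence definition even though they overlap.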

6.2.1 Measures of center

  • The (arithmetic) mean is the average of all the data, also known as the first moment. The formulas vary slightly for samples, populations, and discrete or continuous variables, and are given in the table above.
  • The median splits the data set into two equal halves. Formally, the median is defined as the value \(c\) such that \(P(X \le c) = P(X \ge c)\). The median is resistant to outliers: changing extreme values does not change it.
  • The mode is the most common value(s).
  • The mid-range is the value midway between the minimum and maximum, mid-range=\(\frac{\text{max + min}}{2}.\) Note that the mid-range only uses the largest and smallest value and ignores all the others.
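All four measures of center are easy to compute with the standard library. A sketch on hypothetical data:

```python
from statistics import mean, median, multimode

data = [1, 2, 2, 3, 4, 7, 9]          # hypothetical sample

print(mean(data))                     # arithmetic mean: 28/7 = 4
print(median(data))                   # middle value of the sorted data: 3
print(multimode(data))                # most common value(s): [2]
print((max(data) + min(data)) / 2)    # mid-range: (9 + 1) / 2 = 5.0
```

Note how the mid-range depends only on the two extreme values, so a single outlier moves it substantially, while the median is unaffected.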

6.2.2 Expected value, moments

Provided the underlying sums and integrals exist, we have:

  • The expected value \(\small \mu = E[X]\) is the (population) mean of a random variable \(\small X\). \[ \small \mu = \sum_{\text{all x}} x P(x)\text{, discrete case}\] \[ \small \mu = \int_{\text{all x}} x f(x)\,dx \text{, continuous case}\] The expected value has the following properties:
    o \(\small E[aX+b]=aE[X]+b\)
    o \(\small E[X+Y]=E[X]+E[Y]\)
    o \(\small E[g(X)] = \sum_{\text{all x}}g(x)P(X=x) \text{, discrete case}\)
    o \(\small E[g(X)] = \int_{\text{all x}}g(x) f(x)\,dx \text{, continuous case}\)

  • The \(n^{th}\) moment of a random variable \(\small X\) is \(\small E[X^n]\); the \(n^{th}\) central moment is \(\small E[(X-\mu)^n]\).

Note that with this terminology, the expected value is the first moment, and the variance is the second central moment.
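The discrete-case sums above can be carried out directly. A sketch for a fair die (a hypothetical example):

```python
# Moments of a fair six-sided die via the discrete-case formulas.
xs = range(1, 7)
p = 1 / 6                                   # P(X = x) for each face

mu = sum(x * p for x in xs)                 # first moment: E[X] = 3.5
m2 = sum(x**2 * p for x in xs)              # second moment: E[X^2] = 91/6
c2 = sum((x - mu)**2 * p for x in xs)       # second central moment: the variance

print(mu)
print(m2 - mu**2)                           # matches c2 (alternative variance formula)
print(sum((2*x + 1) * p for x in xs))       # linearity: E[2X + 1] = 2 E[X] + 1
```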

6.2.3 Measures of spread

  • range = maximum - minimum

  • The variance is a measure of the variation of the data values about the mean. Assuming the underlying sums and integrals converge, it is computed as \(\small VAR[X]=E[(X-\mu)^2].\)
    o An alternative formula is \(\small VAR[X] = E[X^2]-(E[X])^2=E[X^2]-\mu^2\)
    o \(\small VAR[X]\ge 0\)
    o \(\small VAR[aX + b]=a^2 VAR[X]\)
    o A variance of 0 means that all values are the same.

  • The standard deviation is the square root of the variance. The standard deviation is always ≥0.
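The two variance formulas and the scaling property \(\small VAR[aX+b]=a^2\,VAR[X]\) can be verified numerically. A sketch with hypothetical values:

```python
from statistics import pvariance, pstdev

data = [1, 3, 3, 5]                                  # hypothetical values
mu = sum(data) / len(data)                           # 3.0

var = sum((x - mu)**2 for x in data) / len(data)     # E[(X - mu)^2]
alt = sum(x**2 for x in data) / len(data) - mu**2    # E[X^2] - mu^2
print(var, alt)                                      # both give 2.0

shifted = [2*x + 7 for x in data]                    # aX + b with a = 2, b = 7
print(pvariance(shifted))                            # a^2 * VAR[X] = 4 * 2 = 8
print(pstdev(data))                                  # standard deviation: sqrt(2)
```

Note the shift \(b = 7\) drops out entirely, since adding a constant moves every value and the mean by the same amount.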

6.2.4 Measures of relative standing

  • The n% quantiles \(\small Q_{n\%}\), sometimes also written as \(\small Q_n\), are values such that \(\small P(X\le Q_n) = n\%\).
  • Sometimes, the notation \(\small Q_1, Q_2, Q_3\) is used to denote the first, second, and third quartiles. In that case, to avoid confusion, quantiles are called percentiles and written as \(\small P_n\). We have \(\small Q_2 = P_{50} = \text{median}\).
  • The standardized score, standard score, or z-score measures how many standard deviations a data point \(\small x\) sits below or above the mean: \[\small z = \frac{x - \text{mean}}{\text{standard deviation}}.\] Values with a z-score between -2 and 2 are called usual, those with a z-score between -3 and -2 or between 2 and 3 are called outliers, and those with a z-score below -3 or above 3 are called extreme outliers.
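A sketch of z-scores and quartiles on hypothetical exam scores, using the standard library (note that `statistics.quantiles` uses the "exclusive" interpolation method by default, so other software may give slightly different quartiles):

```python
from statistics import mean, stdev, quantiles

scores = [55, 60, 65, 70, 70, 75, 80, 85, 90, 100]   # hypothetical exam scores
m, s = mean(scores), stdev(scores)

z = [(x - m) / s for x in scores]        # standard scores: distance from mean in SDs
print([round(v, 2) for v in z])

q1, q2, q3 = quantiles(scores, n=4)      # quartiles Q1, Q2 (the median), Q3
print(q1, q2, q3)
```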

6.2.5 Measures of shape

  • The five point summary of a data set consists of the minimum, \(\small Q_1\), median, \(\small Q_3\), and the maximum. Together with box plots (aka box-and-whisker plots) it provides an easy way to get an idea of the shape of a distribution. For more detail, see the chapter on plots.
  • Pearson’s coefficient of skewness, also known as the index of skewness, measures the skewness of a sample. It is a standardized distance between mean and median: \[\small S = 3 \cdot \frac{\text{mean} - \text{median}}{s}.\] As a rule of thumb, data with \(\small |S| < 1\) is not considered significantly skewed.
  • The skewness is the third moment of the standardized score: \[\small skew(X) = E\big[\big( \frac{X-\mu}{\sigma}\big)^3\big].\] Skewness is a measure of the asymmetry of the distribution (as represented by the graph of its density function) of a variable. The skewness of a symmetric distribution is zero, but be careful: a non-symmetric distribution can have skewness 0 as well. A positive skew means that the distribution has a tail on the right side, a negative skew that it has a tail on the left side.
  • The kurtosis is the fourth moment of the standardized score: \[\small kurt(X) = E\big[\big( \frac{X-\mu}{\sigma}\big)^4\big].\] It measures the “peakedness” of a distribution: larger kurtosis means that the graph of the probability density function has thicker tails than that of a distribution with smaller kurtosis.
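Since the standard library has no skewness or kurtosis function, the moment definitions above can be coded directly. A sketch with hypothetical data (the function names `skewness` and `kurtosis` are our own):

```python
from statistics import mean, pstdev

def skewness(data):
    """Third moment of the standardized score."""
    m, s = mean(data), pstdev(data)
    return mean(((x - m) / s) ** 3 for x in data)

def kurtosis(data):
    """Fourth moment of the standardized score."""
    m, s = mean(data), pstdev(data)
    return mean(((x - m) / s) ** 4 for x in data)

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 9]

print(skewness(symmetric))       # 0 for symmetric data
print(skewness(right_skewed))    # positive: long tail on the right
print(kurtosis(symmetric))
```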

6.2.6 Assorted rules

  • The Empirical Rule, also known as the 68-95-99.7% rule, applies only to approximately bell-shaped, symmetric distributions. It states that about 68% of all values lie within 1 standard deviation of the mean, about 95% within 2 standard deviations, and about 99.7% within 3 standard deviations.

  • Chebyshev’s Rule states that in any distribution with known mean \(\mu\) and standard deviation \(\sigma\), at least \(\small \frac{k^2-1}{k^2} = 1-\frac{1}{k^2}\) of all values are within \(\small k\) standard deviations of the mean, for any \(\small k > 1\).

  • The Range rule of thumb states that one can get a rough estimate of the standard deviation (on the order of magnitude) as \(\frac{\text{range}}{4}\).
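The Empirical Rule and Chebyshev's bound can be checked on simulated bell-shaped data. A sketch using the standard library's `random.gauss` (the sample size and seed are arbitrary choices):

```python
import random
from statistics import mean, stdev

random.seed(1)                                      # reproducible simulation
data = [random.gauss(100, 15) for _ in range(10_000)]
m, s = mean(data), stdev(data)

def within(k):
    """Fraction of values within k standard deviations of the mean."""
    return sum(abs(x - m) <= k * s for x in data) / len(data)

print(within(1), within(2), within(3))   # roughly 0.68, 0.95, 0.997

# Chebyshev guarantees at least 1 - 1/k^2 for ANY distribution:
assert within(2) >= 1 - 1/4
```

Note how much weaker Chebyshev's guarantee (75% within 2 SDs) is than the Empirical Rule's 95%: it trades precision for applying to every distribution.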

  • The Law of total probability (one version): Let \(B_i, i=1,\ldots,n\) be a partition of the sample space \(S\), i.e. \(\bigcup_{i=1}^n B_i = S\) and \(B_i \cap B_j = \emptyset\) for \(i \ne j\). Then \[P(A)=\sum_{i=1}^n P(A|B_i)P(B_i)\]

  • Bayes’ rule builds on the Law of total probability. \[P(B_i|A)=\frac{P(A|B_i)P(B_i)}{P(A)}=\frac{P(A|B_i)P(B_i)}{\sum_{j=1}^n P(A|B_j)P(B_j)}\]
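Both rules can be worked through on the classic diagnostic-test example (the prevalence, sensitivity, and specificity figures below are hypothetical), with exact arithmetic via `fractions`:

```python
from fractions import Fraction

# Hypothetical test: 1% prevalence, 95% sensitivity, 90% specificity.
P_D = Fraction(1, 100)          # P(disease)
P_pos_D = Fraction(95, 100)     # P(positive | disease)
P_pos_noD = Fraction(10, 100)   # P(positive | no disease) = 1 - specificity

# Law of total probability over the partition {disease, no disease}:
P_pos = P_pos_D * P_D + P_pos_noD * (1 - P_D)

# Bayes' rule: probability of disease given a positive test.
P_D_pos = P_pos_D * P_D / P_pos

print(P_pos)                    # 217/2000
print(P_D_pos)                  # 19/217, about 0.0876
```

Even with a fairly accurate test, a positive result here implies only about a 9% chance of disease, because the disease is rare: the denominator is dominated by the false positives from the large healthy group.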

6.3 Assignment

Hand in a rendered (knitted) RMD file with all the definitions and concepts that are unfamiliar to you and that you want to review.