Review

This chapter provides a review of the tools needed for regression analysis.

Concepts in matrix theory (Stat 135) are presented first, followed by important results in the theory of parametric statistical inference (Stat 131).

R code is also included to illustrate and visualize some of the results.

This is an optional chapter that will not be included in the exam.

0.1 Review of Matrix Theory

WHY DO WE NEED MATRIX THEORY?

We are dealing with multiple variables. Matrices give us a compact representation of systems of equations and simplify calculations that would otherwise require lengthy summation notation and other element-by-element operations.


Basic Concepts

  • A matrix is an array of numbers (constants or variables) containing \(r\) rows and \(c\) columns \[ \underset{3\times4} {\textbf A }= \begin{bmatrix} 0 & 9 & 2 &3 \\ 7 & 6 & 4 &5 \\ 11 & 2 & 1 & 8 \\ \end{bmatrix} \]

  • The dimension or order is the size of the matrix, i.e. the number of rows and columns

  • A vector is an array of numbers (constants or variables) arranged in a single row or column \[ \underset{3\times1} {\textbf a }= \begin{bmatrix} 2 \\ 7 \\ 8 \end{bmatrix} \quad \underset{1\times3} {\textbf a' }= \begin{bmatrix} 2 & 7 & 8 \end{bmatrix} \]

  • A square matrix is a matrix that has equal number of rows and columns \[ \underset{n\times n}{\textbf{A}} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} \]

  • The diagonal elements are the elements found in the diagonal of a square matrix while those elements other than the diagonal elements are the off-diagonal or nondiagonal elements.

  • A diagonal matrix is a square matrix that has zero for all of its off-diagonal elements. \[ \begin{bmatrix} a_{11} & 0 & \cdots & 0 \\ 0 & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{bmatrix} \]

  • A triangular matrix is a square matrix with all elements above (or below) the diagonal being zero. \[ \begin{bmatrix} a_{11} & 0 & \cdots & 0 \\ a_{21} & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} \quad \text{or} \quad \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ 0 & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{bmatrix} \]


Matrix Operations

  • Transpose of a matrix: \(\textbf{A}'\)
  • Trace of a matrix: \(tr(\textbf{A})=\sum_{i=1}^n a_{ii}\)
  • Addition of conformable matrices: \(\underset{p \times q}{\textbf{A}}+\underset{p \times q}{\textbf{B}}\)
  • Scalar Multiplication: \(c\textbf{A}=\{ca_{ij}\}\)
  • Multiplication of conformable matrices: \(\underset{p \times q}{\textbf{A}}\times \underset{q \times r}{\textbf{C}}=\underset{p \times r}{\textbf{D}}\)
  • Determinant of a matrix: \(det(\textbf{A})=|\textbf{A}|\)
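A minimal R sketch of the operations above, reusing the matrix \(\textbf{A}\) that is defined in the R outputs section at the end of this review (base R has no built-in trace function, so the trace is computed as the sum of the diagonal elements):

A <- matrix(c(1, 2, 2, 3, 5, 8, 1, 3, 6), ncol = 3)

t(A)           # transpose A'
sum(diag(A))   # trace of A
A + A          # addition of conformable matrices
3 * A          # scalar multiplication
A %*% A        # multiplication of conformable matrices (3x3 times 3x3)
det(A)         # determinant of A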

Special Matrices

  • Symmetric Matrix: \(\textbf{A}=\textbf{A}'\)

  • Idempotent Matrix: \(\textbf{A}^2=\textbf{A}\)

  • Null Matrix: \(\textbf{0}=\begin{bmatrix} 0 & 0 & \cdots & 0 \\0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix}\)

  • Identity matrix: \(\textbf{I}=\begin{bmatrix} 1 & 0 & \cdots & 0 \\0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}\)

  • Summing vector: \(\textbf{1}=\begin{bmatrix}1&1&\cdots&1\end{bmatrix}'\)

  • Matrix of ones (J Matrix): \[ \textbf{J}_2=\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix},\quad \textbf{J}_{2\times5}=\begin{bmatrix} 1 & 1 &1 &1 &1 \\ 1 & 1 & 1 & 1 & 1\end{bmatrix}, \quad \textbf{J}=\textbf1\textbf{1}' \]

  • Let \(\textbf{J}\) be a square matrix of ones of order \(n\). Then \[ \bar{\textbf{J}}=\frac{1}{n}\textbf{J}= \begin{bmatrix} 1/n & 1/n & \cdots & 1/n \\ 1/n & 1/n & \cdots & 1/n \\ \vdots & \vdots & \ddots & \vdots \\ 1/n & 1/n & \cdots & 1/n \end{bmatrix} \]

  • Centering matrix: \(\textbf{C} = \textbf{I}-\bar{\textbf{J}}\)

  • Positive semi-definite: A matrix \(\textbf{M}\) such that \(\textbf{x}'\textbf{M}\textbf{x}\geq0 \quad \forall\,\textbf{x}\in\mathbb{R}^n\)
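The special matrices above can be constructed directly in base R; a minimal sketch for an arbitrary order n = 4, including a check that the centering matrix is idempotent:

n <- 4
I <- diag(n)                 # identity matrix
one <- matrix(1, nrow = n)   # summing vector 1
J <- one %*% t(one)          # J = 11', the n x n matrix of ones
Jbar <- J / n                # J-bar
C <- I - Jbar                # centering matrix
all.equal(C %*% C, C)        # TRUE: the centering matrix is idempotent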


Invertibility and Singularity

  • An \(n \times n\) square matrix \(\textbf{A}\) is invertible if and only if \(|\textbf{A}|\neq 0\)

  • An \(n \times n\) square matrix \(\textbf{A}\) is singular if \(|\textbf{A}|= 0\) and nonsingular if \(|\textbf{A}|\neq 0\)

Results on Inverses

  1. If a matrix has an inverse, then the inverse is unique

  2. \((\textbf{A}^{-1})^{-1}=\textbf{A}\)

  3. \(\textbf{A}^{-1}\textbf{A}=\textbf{I}\)

  4. If \(\textbf{A}\) and \(\textbf{B}\) are nonsingular matrices and \(\textbf{AB}\) is defined, then \((\textbf{AB})^{-1}=\textbf{B}^{-1}\textbf{A}^{-1}\)

  5. The transpose of an invertible matrix is also invertible.

  6. A square matrix \(\textbf{A}\) is orthogonal if \(\textbf{A}'=\textbf{A}^{-1}\), or equivalently \(\textbf{A}\textbf{A}'=\textbf{A}'\textbf{A}=\textbf{I}\)
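A quick numerical sketch of results 3, 4, and 6, using the matrices A and B defined in the R outputs section below together with a 2 x 2 rotation matrix (which is orthogonal); all.equal() is used to allow for floating-point error:

A <- matrix(c(1, 2, 2, 3, 5, 8, 1, 3, 6), ncol = 3)
B <- matrix(c(1, 3, 9, 5, 7, 4, 8, 0, 2), ncol = 3)
all.equal(solve(A) %*% A, diag(3))                # result 3: A^{-1} A = I
all.equal(solve(A %*% B), solve(B) %*% solve(A))  # result 4: (AB)^{-1} = B^{-1} A^{-1}

Q <- matrix(c(cos(1), sin(1), -sin(1), cos(1)), ncol = 2)  # rotation matrix, orthogonal
all.equal(t(Q), solve(Q))                         # result 6: Q' = Q^{-1}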


Linear Dependence and Ranks

  • Let \(\textbf{M}=[\textbf{m}_1, \textbf{m}_2, \ldots,\textbf{m}_n]\), where the \(\textbf{m}_i\) are vectors of dimension \(n\times 1\). The vectors \(\textbf{m}_1,\ldots,\textbf{m}_n\) are linearly dependent if there exist constants \(c_1, c_2,\ldots,c_n\), not all zero, such that \(c_1\textbf{m}_1 + c_2\textbf{m}_2 +\cdots +c_n\textbf{m}_n = \textbf{0}\); otherwise, they are linearly independent.

  • The rank of matrix \(\textbf{M}\) is defined to be the largest number of linearly independent rows (columns) of \(\textbf{M}\).

Some results on ranks

  1. \(rk(\textbf{A})=rk(\textbf{A}')\)

  2. If \(\textbf{A}\) is idempotent, \(rk(\textbf{I}-\textbf{A})=rk(\textbf{I})-rk(\textbf{A})=tr(\textbf{I}-\textbf{A})\)

  3. If two square matrices \(\textbf{A}\) and \(\textbf{B}\) , each of order \(n\), are nonsingular, then for any matrix \(\textbf{C}\) where multiplication with \(\textbf{A}\) and \(\textbf{B}\) are defined, the matrices \(\textbf{C}\), \(\textbf{AC}\), \(\textbf{CB}\) , and \(\textbf{ACB}\) all have the same rank.

  4. The rank of the product of two matrices \(\textbf{A}\) and \(\textbf{B}\) is at most equal to the smaller of the ranks of \(\textbf{A}\) and \(\textbf{B}\). \[ rk(\textbf{AB})\leq \min\{rk(\textbf{A}),rk(\textbf{B})\} \]

  5. Let \(\textbf{A}\) be a square matrix of order \(n\) . \(|\textbf{A}| = 0\) if and only if \(rk(\textbf{A}) < n\).

  6. A square matrix \(\underset{n\times n}{\textbf{A}}\) is invertible or nonsingular (\(|\textbf{A}|\neq0\)) if and only if it is of full rank (\(rk(\textbf{A})=n\)).

  7. Let \(\textbf{A}\) and \(\textbf{B}\) be both \(m \times n\) matrices with ranks \(r_1\) and \(r_2\) respectively. Then \(rk(\textbf{A} + \textbf{B}) \leq r_1 + r_2\).
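In R, the rank of a matrix can be obtained from its QR decomposition via qr()$rank; a sketch checking results 1, 3, and 4 (the matrix M below is constructed to have linearly dependent columns):

A <- matrix(c(1, 2, 2, 3, 5, 8, 1, 3, 6), ncol = 3)   # nonsingular, so full rank
qr(A)$rank                           # 3
qr(t(A))$rank                        # result 1: rk(A') = rk(A) = 3
M <- cbind(c(1, 2, 3), c(2, 4, 6))   # second column is twice the first
qr(M)$rank                           # 1, because the columns are linearly dependent
qr(A %*% M)$rank                     # result 3: multiplying by nonsingular A keeps the rank at 1
                                     # result 4: rk(AM) <= min{rk(A), rk(M)} = 1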


Eigenvalues and Eigenvectors

Let \(\textbf{A}\) be an \(n \times n\) matrix.

  • A scalar \(\lambda\) is an eigenvalue of \(\textbf{A}\) if there \(\exists\) a nonzero vector \(\textbf{x}\in \mathbb{R}^n\) such that \(\textbf{Ax}=\lambda\textbf{x}\).
  • Any \(\textbf{x}\neq\textbf{0}\) satisfying the above equation is called an eigenvector of \(\textbf{A}\) corresponding to eigenvalue \(\lambda\)

Remarks on Eigenvalues and Eigenvectors

  1. The eigenvalues \(\lambda_1, \lambda_2, \ldots, \lambda_n\) of \(\textbf{A}\) are the roots of the characteristic equation \(|\textbf{A}-\lambda \textbf{I}|=0\), where \(|\textbf{A}-\lambda \textbf{I}|\) is a polynomial in \(\lambda\) of degree \(n\); the roots are real whenever \(\textbf{A}\) is symmetric. The roots are sometimes called latent, proper, or characteristic roots.

  2. The characteristic polynomials of \(\textbf{A}\) and \(\textbf{A}'\) are identical, so \(\textbf{A}\) and \(\textbf{A}'\) have the same eigenvalues. However, their eigenvectors need not be the same.

  3. If \(\textbf{A}\) has eigenvalues \(\lambda_1, \lambda_2,...,\lambda_n\), then

    • The trace of \(\textbf{A}\) is the sum of its eigenvalues: \(tr(\textbf{A})=\sum_{i=1}^n\lambda_i\)

    • The determinant of \(\textbf{A}\) is the product of its eigenvalues: \(|\textbf{A}| = \prod_{i=1}^n\lambda_i\)

    • The rank of \(\textbf{A}\) is the number of non-zero eigenvalues: \(rk(\textbf{A})=\sum_{i=1}^n I(\lambda_i\neq0)\)

  4. If \(\textbf{A}\) is idempotent, then all eigenvalues of \(\textbf{A}\) are either 0 or 1.

  5. \(\textbf{A}\) is singular if and only if 0 is an eigenvalue of \(\textbf{A}\)

  6. If \(\textbf{A}\) is idempotent, then \(rk(\textbf{A})=tr(\textbf{A})\)
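A sketch checking remarks 3 and 4 numerically with eigen(); the symmetric matrix S is an arbitrary illustration, and the idempotent matrix used is J-bar of order 3:

S <- matrix(c(4, 1, 1, 3), ncol = 2)   # symmetric, so its eigenvalues are real
ev <- eigen(S)$values
sum(ev)                                # equals tr(S) = 7
prod(ev)                               # equals |S| = 11
sum(ev != 0)                           # number of nonzero eigenvalues = rk(S) = 2

Jbar <- matrix(1/3, 3, 3)              # idempotent matrix
round(eigen(Jbar)$values, 10)          # eigenvalues are 1, 0, 0 (remark 4)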

Decomposition of Matrices

  1. Spectral Decomposition

    Let \(\textbf{A}\) be an \(n \times n\) symmetric matrix. The matrix \(\textbf{A}\) can be decomposed as \(\textbf{P}\textbf{D}\textbf{P}'\), where \(\textbf{D}\) is a diagonal matrix with the eigenvalues of \(\textbf{A}\) as its diagonal elements and \(\textbf{P}\) is an \(n \times n\) orthogonal matrix whose \(i^{th}\) column \(\textbf{p}_i\) is the (normalized) \(i^{th}\) eigenvector of \(\textbf{A}\). That is, \[ \textbf{A}=\sum_{i=1}^n\lambda_i\textbf{p}_i\textbf{p}_i'=\textbf{P}\textbf{D}\textbf{P}' \]

  2. Singular Value Decomposition

    For any \(n \times p\) matrix \(\textbf{X}\), it can be decomposed as \[ \textbf{X} = \textbf{U}\textbf{D}\textbf{V}' \]

    where

    • \(\textbf{U}\) is a (column) orthogonal \(n \times p\) matrix.

    • \(\textbf{D}\) is a diagonal matrix containing the singular values \(D_{ii}\) on the diagonal in decreasing order.

    • \(\textbf{V}\) is an orthogonal \(p \times p\) matrix.

    • \(\textbf{U}'\textbf{U}=\textbf{V}'\textbf{V}=\textbf{I}_p\)
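Both decompositions above are available in base R: eigen() gives the spectral decomposition of a symmetric matrix and svd() the singular value decomposition. A sketch with arbitrary matrices, verifying the reconstructions:

S <- matrix(c(4, 1, 1, 3), ncol = 2)           # symmetric matrix
es <- eigen(S)
P <- es$vectors                                # columns are the eigenvectors
D <- diag(es$values)                           # diagonal matrix of eigenvalues
all.equal(P %*% D %*% t(P), S)                 # spectral decomposition: S = P D P'

set.seed(136)
X <- matrix(rnorm(12), nrow = 4, ncol = 3)     # an arbitrary 4 x 3 matrix
sv <- svd(X)
all.equal(sv$u %*% diag(sv$d) %*% t(sv$v), X)  # X = U D V'
all.equal(t(sv$u) %*% sv$u, diag(3))           # U'U = I_p
all.equal(t(sv$v) %*% sv$v, diag(3))           # V'V = I_p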


Matrix Calculus

Let \(f(\textbf{x})\) be a continuous function of the elements of the vector \(\textbf{x}' = \begin{bmatrix} x_1 & x_2 & \cdots & x_p \end{bmatrix}\) whose first and second partial derivatives \(\frac{\partial f(\textbf{x})}{\partial x_i}\) and \(\frac{\partial^2 f(\textbf{x})}{\partial x_i \partial x_j}\) exist for all points \(\textbf{x}\) in some region of \(p\)-dimensional Euclidean space.

  • Derivative of \(f(\textbf{x})\) with respect to \(\textbf{x}\): \[\nabla f(\textbf{x})=\frac{\partial f(\textbf{x})}{\partial{\textbf{x}}} = \left[\frac{\partial f(\textbf{x})}{\partial{x_i}} \right], \quad i=1,2,...,p\]

  • Hessian matrix of \(f(\textbf{x})\): \[ H_f = \frac{\partial^2 f(\textbf{x})}{\partial{\textbf{x}}\partial{\textbf{x}'}} = \left[\frac{\partial^2 f(\textbf{x})}{\partial{x_i}\partial{x_j}} \right], \quad i,j=1,2,...,p \]

Some Results in Matrix Calculus

  1. If \(f(\textbf{x})=c\), then \(\frac{\partial f(\textbf{x})}{\partial \textbf{x}}=\textbf{0}\)

  2. If \(f(\textbf{x}) = \textbf{a}'\textbf{x}\), where \(\textbf{a}\) is a \(p \times 1\) vector of constants, then \(\frac{\partial f(\textbf{x})}{\partial \textbf{x}} = \textbf{a}\).

  3. If \(f(\textbf{x}) = \textbf{x}′\textbf{Ax}\), where \(\textbf{A}\) is symmetric, then \(\frac{\partial f (\textbf{x})}{\partial \textbf{x}} = 2\textbf{Ax}\).

  4. For the general quadratic form \(f (\textbf{x}) = (\textbf{a} ± \textbf{Bx})'\textbf{A}(\textbf{a} ± \textbf{Bx})\), where \(\textbf{a}\) is an \(m \times 1\) vector of constants, \(\textbf{B}_{m \times p}\) is a matrix of constants, \(\textbf{x}_{p \times 1}\) is a vector of variables, and \(\textbf{A}_{m \times m}\) is a symmetric matrix of constants, \[ \frac{\partial f (\textbf{x})}{\partial \textbf{x}} = ±2\textbf{B}'\textbf{A}(\textbf{a} ± \textbf{Bx}). \]
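A sketch checking result 3 numerically: the analytic gradient 2Ax of the quadratic form x'Ax is compared with a central finite-difference approximation (the symmetric matrix A and the point x are arbitrary choices):

A <- matrix(c(4, 1, 1, 3), ncol = 2)             # symmetric matrix of constants
x <- c(1, 2)
f <- function(x) as.numeric(t(x) %*% A %*% x)    # f(x) = x'Ax

2 * A %*% x                                      # analytic gradient from result 3
sapply(1:2, function(i) {                        # finite-difference approximation
  h <- 1e-6
  e <- replace(numeric(2), i, h)
  (f(x + e) - f(x - e)) / (2 * h)
})                                               # the two results should agree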


R outputs

Defining Matrices \(\textbf{A}\) and \(\textbf{B}\)

A <- matrix(c(1,2,2,3,5,8,1,3,6), ncol = 3)
B <- matrix(c(1,3,9,5,7,4,8,0,2), ncol = 3)
A ; B
##      [,1] [,2] [,3]
## [1,]    1    3    1
## [2,]    2    5    3
## [3,]    2    8    6
##      [,1] [,2] [,3]
## [1,]    1    5    8
## [2,]    3    7    0
## [3,]    9    4    2

Sum of two conformable matrices

A+B
##      [,1] [,2] [,3]
## [1,]    2    8    9
## [2,]    5   12    3
## [3,]   11   12    8

Difference of two conformable matrices

A-B
##      [,1] [,2] [,3]
## [1,]    0   -2   -7
## [2,]   -1   -2    3
## [3,]   -7    4    4

Matrix multiplication

A%*%B
##      [,1] [,2] [,3]
## [1,]   19   30   10
## [2,]   44   57   22
## [3,]   80   90   28

Transpose of a matrix: \(\textbf{A}'\)

t(A)
##      [,1] [,2] [,3]
## [1,]    1    2    2
## [2,]    3    5    8
## [3,]    1    3    6

Inner product of two matrices with same dimensions: \(\textbf{A}'\textbf{B}\)

t(A)%*%B
##      [,1] [,2] [,3]
## [1,]   25   27   12
## [2,]   90   82   40
## [3,]   64   50   20

Inverse of a matrix: \(\textbf{A}^{-1}\)

solve(A)
##      [,1]       [,2]       [,3]
## [1,]   -1  1.6666667 -0.6666667
## [2,]    1 -0.6666667  0.1666667
## [3,]   -1  0.3333333  0.1666667

Determinant of a matrix

det(A)
## [1] -6

0.2 Review of Statistical Inference

WHAT IS STATISTICAL INFERENCE?

Statistical inference is the area of statistics that deals with the methods used to make generalizations or inferences about some characteristics of the population based on the information contained in a sample.

Approaches to inference

  • Estimation: estimate the value of the parameter of interest.
    • Point Estimation: calculate a single number as our guess to the unknown parameter.
    • Confidence Interval Estimation: create an interval which we hope contains the unknown parameter with a specific level of confidence.
  • Hypothesis Testing: make decisions on whether or not the sample agrees with the researcher’s assertion regarding some characteristic of the population
    • Parametric test: test hypotheses concerning the specific distributional characteristics (parameter) of the population.
    • Non-parametric test: make inferences about population without assuming a specific distribution

Example:

Objective: “How effective is Minoxidil in treating male pattern baldness?”

Specific Objectives:

  1. (point estimation) to estimate the population proportion of patients who will show new hair growth after being treated with Minoxidil
  2. (hypothesis testing) to determine whether treatment using Minoxidil is better than the existing treatment that is known to stimulate hair growth among 40% of patients with male pattern baldness

Basic Definitions

  • Let the random variable \(Y\) have probability density function \(f(y)\) (or probability mass function \(p(y)\) if \(Y\) is discrete). The expected value or mean of \(Y\), denoted by \(\mu_Y\) or \(E(Y)\), is defined as

\[ E(Y)=\int_{-\infty} ^\infty y f(y)\,dy \quad \text{(continuous case)} \qquad E(Y) = \sum_{\forall y} y\, p(y) \quad \text{(discrete case)} \]

  • The variance of a random variable \(Y\), denoted by \(\sigma^2_Y\) or \(Var(Y)\), is defined as \[ Var(Y) = E[(Y-\mu_Y)^2]=E(Y^2)-\mu_Y^2 \]

  • The covariance of Y and Z, denoted by \(Cov(Y,Z)\), or \(\sigma_{Y,Z}\), is defined by \[ Cov(Y,Z) = E[(Y-\mu_Y)(Z-\mu_Z)]=E(YZ)-\mu_Y\mu_Z \]

  • Independence of two random variables. Let \(Y\sim f_Y\) and \(Z \sim f_Z\). The random variables \(Y\) and \(Z\) are said to be independent if and only if \[ f_{Y,Z}(y,z) = f_Y(y) f_Z(z) \quad \forall (y,z) \] where \(f_{Y,Z}(y,z)\) is the joint probability function of \(Y\) and \(Z\)
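A small simulation sketch of the variance and covariance identities above, using sample moments in place of expectations (the simulated model, with Z built from Y plus independent noise, is an arbitrary illustration):

set.seed(136)
y <- rnorm(1e5, mean = 2, sd = 3)
z <- 0.5 * y + rnorm(1e5)              # Z depends on Y, so Cov(Y, Z) is nonzero

mean(y^2) - mean(y)^2                  # approximates Var(Y) = E(Y^2) - mu^2, about 9
mean(y * z) - mean(y) * mean(z)        # approximates Cov(Y, Z) = E(YZ) - mu_Y mu_Z, about 4.5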


The Normal Distribution

We say that a random variable \(Y\) follows the normal distribution denoted by \(Y\sim Normal(\mu,\sigma^2)\) if and only if the pdf of \(Y\) is given by:

\[ f_Y(y)=\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left\{-\frac{1}{2}\frac{(y-\mu)^2}{\sigma^2}\right\} \]

Remarks:

  • If \(Y\sim Normal(\mu,\sigma^2)\) then, \(E(Y)=\mu\), \(Var(Y)=\sigma^2\), \(m_Y(t)=\exp\{\mu t+\frac{1}{2}\sigma^2 t^2\}\)
  • The Normal Distribution provides a reasonably good description of the graph of the relative frequency distribution of several random variables.
  • A lot of procedures in inferential statistics assume that the population is normally distributed.
  • In Stat 136, one of the most common assumptions is that the errors are normally distributed with mean 0.
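For visualization, the normal pdf can be drawn in R with dnorm(); a minimal sketch for the standard normal, Y ~ Normal(0, 1):

curve(dnorm(x, mean = 0, sd = 1), from = -4, to = 4,
      xlab = "y", ylab = "f(y)", main = "Standard normal pdf")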


Results in Sampling from the Normal Distribution

  1. (Sample Mean) Let \(X_1,X_2,...,X_n \overset{iid}{\sim} Normal(\mu,\sigma^2)\) \[ \bar{X}=\frac{1}{n}\sum_{i=1}^n X_i\sim Normal\left(\mu,\frac{\sigma^2}{n}\right) \]

  2. (Sum of Squares of Standard Normal) Let \(Y_1,...,Y_n \overset{iid}{\sim} N(0,1)\). Then \[ \sum_{i=1}^n Y_i^2 \sim \chi^2_{(\nu=n)} \]

  3. (Sample Variance). Let \(S^2\) be the sample variance of \(X_1,X_2,...,X_n\overset{iid}{\sim}N(\mu,\sigma^2)\). Then \[ \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{(\nu = n-1)} \]

  4. (T Statistic). Let \(Y\sim N(0,1), Z\sim \chi^2_\nu\), \(Y\) and \(Z\) are independent. Then \[ T = \frac{Y}{\sqrt{Z/\nu}} \sim t_{(\nu)} \] Remark: This will be useful in showing that \(\frac{\bar{X}-\mu}{S/\sqrt{n}}\sim t_{(n-1)}\) and \(\frac{\hat{\beta}_j-\beta_j}{\widehat{s.e.(\hat{\beta}_j)}} \sim t_{(n-p)}\)

  5. (F-Statistic). Let \(U\sim\chi^2_{(\nu_1)}, V\sim\chi^2_{(\nu_2)}\), \(U\) and \(V\) are independent. Then \[ F = \frac{U/\nu_1}{V/\nu_2}\sim F(\nu_1,\nu_2) \] Remark: This will be useful in ANOVA outputs in regression.
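A simulation sketch of result 3: for repeated samples from a normal population, (n - 1)S^2 / sigma^2 behaves like a chi-square random variable with n - 1 degrees of freedom (the sample size, number of replications, and parameter values are arbitrary):

set.seed(136)
n <- 10; mu <- 5; sigma2 <- 4
stat <- replicate(5000, (n - 1) * var(rnorm(n, mean = mu, sd = sqrt(sigma2))) / sigma2)
mean(stat)    # close to the chi-square mean, n - 1 = 9
var(stat)     # close to the chi-square variance, 2(n - 1) = 18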


Point Estimation

Point Estimation uses information in a sample to arrive at a single number that will serve as an estimate of the value of the target parameter. The following are important concepts in estimation:

  • A point estimator \(\hat{\theta}\) is unbiased if \(E (\hat{\theta}) = \theta\).
  • Let \(T_1,...,T_n\) be a sequence of estimators of \(\theta\), where \(T_n\) is the same estimator based on a random sample of size \(n\). The sequence \(\{T_n\}\) is said to be:
    • MSE-consistent iff \[ MSE_\theta(T_n)\rightarrow 0 \quad \text{as} \quad n \rightarrow \infty,\quad \forall \theta \in \Omega \]
    • weakly consistent iff: \[ P(|T_n-\theta|<\varepsilon) \rightarrow 1 \quad \text{as} \quad n \rightarrow \infty, \quad \forall \theta \in \Omega \]
  • A statistic \(T\) is a sufficient statistic if the conditional probability function of the sample observations, given \(T\) , does not depend on the parameter \(\theta\). (Also check: Factorization Criterion for Sufficiency)
  • A statistic \(T\) is said to be complete if and only if \(E(g(T)) = 0\) for all \(\theta\) implies that \(g(T) = 0\) almost surely.
  • It is easy to find complete sufficient statistics when the distribution is a member of the exponential family of distributions.
  • An estimator \(\hat{\theta}\) is said to be the uniformly minimum variance unbiased estimator (UMVUE) for \(\theta\) if it is unbiased and, for any other unbiased estimator \(\hat{\theta}'\), \(Var(\hat{\theta}) \leq Var(\hat{\theta}')\).
  • One of the most popular ways of finding the UMVUE is through the Lehmann-Scheffé Theorem, which states that any unbiased estimator of \(\theta\) that is a function of a complete sufficient statistic is the UMVUE for \(\theta\).
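A simulation sketch of unbiasedness and MSE-consistency for the sample mean as an estimator of mu, using a Normal(5, 1) population as an arbitrary illustration:

set.seed(136)
mu <- 5
mean(replicate(5000, mean(rnorm(20, mean = mu))))   # unbiasedness: the average estimate is about 5
sapply(c(10, 100, 1000),                            # MSE-consistency: MSE shrinks toward 0 as n grows
       function(n) mean(replicate(2000, (mean(rnorm(n, mean = mu)) - mu)^2)))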

Interval Estimation

Interval Estimation uses sample data to calculate the lower and upper bound of an interval such that the researcher can be highly confident that this interval contains the value of the target parameter.

  • We usually construct a (1 − α)100% confidence interval for the unknown parameter.
  • The confidence coefficient gives the coverage probability, i.e., the probability that the CI, before sampling, will enclose the true parameter value. Note, however, that once a sample has been observed, a CI ceases to be random and has probability of either 0 or 1 of trapping the true parameter value.
    • \((1-\alpha)\) is the probability that you will obtain a sample such that if you compute a \((1-\alpha)\) CI, it will capture the parameter, NOT the probability that the parameter is within a specified interval.
    • The interval is random, the parameter is not.
  • The most popular way of constructing CIs is the pivotal quantity method (PQM). In the PQM, you manipulate a pivot, a random quantity that involves the unknown parameter being estimated but whose distribution does not depend on any unknown parameter.
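A simulation sketch of the coverage interpretation above: drawing many samples and computing a 95% t-interval for each, roughly 95% of the intervals capture the true mean (the Normal(5, 1) population and n = 30 are arbitrary choices):

set.seed(136)
mu <- 5
covers <- replicate(5000, {
  ci <- t.test(rnorm(30, mean = mu))$conf.int   # default 95% confidence interval
  ci[1] <= mu & mu <= ci[2]
})
mean(covers)    # proportion of intervals that trapped mu, close to 0.95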

Hypothesis Testing

Hypothesis testing uses sample data to evaluate the validity of a conjecture regarding unknown parameters.

  • The null hypothesis is the statement being tested; it is the conjecture the experimenter doubts to be true.

  • The alternative hypothesis is the operational statement of the theory that the experimenter believes to be true and wishes to prove.

    Note: The null hypothesis and alternative hypothesis must be non-overlapping statements about the population.

  • The test statistic is a statistic computed from the sample data that is especially sensitive to the differences between the null and alternative hypotheses.

    Note: The test statistic should tend to take on certain values when Ho is true and different values when Ha is true. The decision to reject Ho depends on the value of the test statistic.

  • The region of rejection can be thought of as the set of values of the test statistic that will lead to the rejection of the null hypothesis.

  • Errors in Hypothesis Testing:

    • Type I error: incorrectly rejecting the null when it is true.

    • Type II error: incorrectly accepting the null when it is false.

    • Since the Type I error is usually the more drastic of the two errors in hypothesis testing, it is a common approach to set an upper bound to the probability of committing a Type I error (\(\alpha\)), then find the test with the lowest probability of committing a Type II error (\(\beta\)).

    • Both error probabilities in hypothesis testing can be reduced by increasing the sample size.

  • The level of significance (\(\alpha\)) is the maximum probability of committing a Type I error that the researcher is willing to tolerate.

  • The power of the test (\(1-\beta\)) is the probability of correctly rejecting the null hypothesis

  • The power function \(K_\phi(\theta)\) gives the probability of rejecting the null hypothesis as a function of the parameter value \(\theta\).

  • Likelihood Ratio Test

    • One of the most popular ways of constructing a test is the likelihood ratio test, which is based on the test statistic \(\lambda\) given by \[ \lambda = \frac{\underset{\Omega_0}{\sup}\mathcal{L}(\theta,\textbf{X})}{\underset{\Omega}{\sup}\mathcal{L}(\theta,\textbf{X})} \]
    • The asymptotic distribution under \(Ho\) of the test statistic \(-2\ln(\lambda)\) is \(\chi^2_{(\nu)}\), where \(\nu\) is the number of unknown parameters in the parameter space \(\Omega\) minus the number of unknown parameters under the null hypothesis.
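A simulation sketch of the asymptotic result for a simple case: for a random sample from Normal(mu, 1) and Ho: mu = 0, the likelihood ratio simplifies so that -2 ln(lambda) = n * xbar^2, which under Ho should follow a chi-square distribution with 1 degree of freedom (one free parameter in the parameter space, none under Ho):

set.seed(136)
n <- 50
lrt <- replicate(5000, {
  x <- rnorm(n, mean = 0, sd = 1)    # data generated under Ho
  n * mean(x)^2                      # -2 ln(lambda) for this model
})
mean(lrt > qchisq(0.95, df = 1))     # rejection rate at alpha = 0.05, close to 0.05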

0.3 Measures of Correlation

What is a Correlation Coefficient?

  • Measures the degree of association between two variables.

  • The value is usually between -1 and +1.

  • Any value close to +1 implies strong direct relationship while a value close to -1 implies strong inverse relationship. A value close to 0 implies weak or no relationship.

  • The coefficient does not imply structural relationship and does not indicate causality between the variables.

Although distinct from regression, correlation analysis is often done as a preliminary step to explore the data before performing regression analysis. There are many types of correlation coefficients; a few are summarized here.

Pearson’s \(r\)

The Pearson product-moment correlation coefficient measures the correlation between two continuous variables. This coefficient may underestimate the degree of association if the relationship is nonlinear.

\[ r = \frac{\sum_{i=1}^n(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{(\sum_{i=1}^n(x_i-\bar{x})^2)(\sum_{i=1}^n(y_i-\bar{y})^2)}} \]

Example

A sample of 30 towns was drawn, and the mortality rate and calcium concentration in drinking water were determined for each town.

  • \(Y\) = 7-year mortality rate (per 100,000)
  • \(X\) = average calcium ion concentration in drinking water (ppm)
calcium <- read.csv("calcium.csv")
mortality calcium
1247 105
1800 14
1807 15
1359 84
1307 78
1555 39
1260 21
1742 8
1569 91
1772 15
1668 17
1609 18
1299 78
1392 73
1254 96
1428 39
1723 44
1547 9
1591 16
1828 8
1466 5
1558 10
1637 10
1755 12
1491 20
1318 122
1379 94
1096 138
1402 37
1704 26

cor.test(calcium$mortality, calcium$calcium, method = "pearson")
## 
##  Pearson's product-moment correlation
## 
## data:  calcium$mortality and calcium$calcium
## t = -6.0649, df = 28, p-value = 1.537e-06
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.8759845 -0.5397840
## sample estimates:
##        cor 
## -0.7535183

Conclusion: Reject the hypothesis that there is no correlation.
The data indicates that there is a strong inverse relationship between mortality and calcium level in drinking water.


Spearman’s \(\rho\)

The Spearman rank correlation is a measure of correlation between rankings. Both variables must be measured on at least an ordinal scale.
\[ \rho = 1 - \frac{6 \sum d_i^2}{n (n^2-1)} \] where \(d_i\) is the difference between the ranks of the two variables for observation \(i\)

Example

Ten materials for artificial reef were evaluated.

  • \(Y\) = ranking according to number of invertebrates attracted after 1 month.
  • \(X\) = ranking according to cost and availability of materials.
library(readr)
reef <- read_csv("reef.csv")
Material X Y
1 1 3
2 4 2
3 2 4
4 3 5
5 5 1
6 6 7
7 7 8
8 8 6
9 9 9
10 10 10
cor.test(reef$X, reef$Y, method = "spearman")
## 
##  Spearman's rank correlation rho
## 
## data:  reef$X and reef$Y
## S = 38, p-value = 0.01367
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##      rho 
## 0.769697
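As a check of the formula, the S = 38 in the output is the sum of squared rank differences, and plugging it into the formula reproduces the reported rho (this sketch assumes the reef data frame loaded above):

d <- reef$X - reef$Y                     # the data are already ranks
sum(d^2)                                 # 38, the S reported by cor.test()
1 - 6 * sum(d^2) / (10 * (10^2 - 1))     # 0.7697, matching the sample estimate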

Kendall’s \(\tau\)

The Kendall rank correlation coefficient is also used to measure the ordinal association between two measured quantities.

\[\begin{align} \tau &= \frac{(\text{number of concordant pairs}-\text{number of discordant pairs})}{\text{number of pairs}} \\ &= 1-\frac{2(\text{number of discordant pairs})}{\binom{n}{2}} \end{align}\]

Example

We use the same example as in the Spearman correlation:

  • \(Y\) = ranking according to number of invertebrates attracted after 1 month.
  • \(X\) = ranking according to cost and availability of materials.

With respect to Material 6, there is only one material that is discordant to it, while the other eight are concordant to it.

cor.test(reef$X, reef$Y, method = "kendall")
## 
##  Kendall's rank correlation tau
## 
## data:  reef$X and reef$Y
## T = 36, p-value = 0.01667
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
## tau 
## 0.6

The three correlation coefficients above are the most commonly used, especially Pearson's r for continuous variables. Kendall's tau and Spearman's rho cannot be applied directly to continuous measurements unless the values are first converted to ranks.

The following coefficients measure association of categorical variables.

\(\phi\) Coefficient

The Phi Coefficient is used if both variables are dichotomous.

        Y = 0   Y = 1
X = 0     a       b
X = 1     c       d

\[ \phi=\frac{|ad-bc|}{\sqrt{(a+b)(c+d)(a+c)(b+d)}} \]
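A sketch computing the phi coefficient from a 2 x 2 table of counts; the counts a = 10, b = 20, c = 30, d = 40 are made up for illustration, and the last line uses the identity phi = sqrt(chi-square / n) as a cross-check:

counts <- matrix(c(10, 30, 20, 40), nrow = 2,
                 dimnames = list(X = c("0", "1"), Y = c("0", "1")))
a <- counts["0", "0"]; b <- counts["0", "1"]
cc <- counts["1", "0"]; d <- counts["1", "1"]
abs(a * d - b * cc) / sqrt((a + b) * (cc + d) * (a + cc) * (b + d))   # phi
sqrt(chisq.test(counts, correct = FALSE)$statistic / sum(counts))     # same value via chi-square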


Contingency Coefficient

The contingency coefficient is used if both variables are categorical.

               Y category 1   Y category 2   Y category 3
X category 1        a               d               g
X category 2        b               e               h
X category 3        c               f               i

\[ C = \sqrt{\frac{\chi^2}{n+\chi^2}} \] where \(\chi^2\) is the chi-squared test statistic which can be computed using the contingency table.
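A sketch computing the contingency coefficient C for a 3 x 3 table of made-up counts; chisq.test() supplies the chi-squared statistic in the formula:

counts <- matrix(c(20, 15, 10,
                   12, 25, 13,
                    8, 10, 27), nrow = 3, byrow = TRUE)   # hypothetical 3 x 3 table of counts
chi2 <- chisq.test(counts)$statistic
n <- sum(counts)
sqrt(chi2 / (n + chi2))                                   # contingency coefficient C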


Other measures of association if at least one is categorical

  • Biserial - one variable is continuous vs. another continuous variable which has been artificially dichotomized.
  • Point-biserial - one continuous vs. another which is a true dichotomy; conservative, and more applicable and safe to use when in doubt.
  • Tetrachoric - both variables are quantitative and both have been artificially dichotomized.
  • Eta-coefficient - one variable is interval and one is nominal.
