CHAPTER 1 Preliminaries

This chapter provides a review of the tools needed for regression analysis.

Concepts in matrix theory (Stat 135) are presented first, followed by important results in the theory of parametric statistical inference (Stat 131).

R code is also included to illustrate some of the results.

1.1 Review of Matrix Theory

WHY DO WE NEED MATRIX THEORY?

We are dealing with multiple variables. Matrices give a compact representation of systems of equations and simplify calculations that would otherwise be written out with summations and other elementwise operations.


Basic Concepts

  • A matrix is an array of numbers (constants or variables) containing \(r\) rows and \(c\) columns \[ \underset{3\times4} {\textbf A }= \begin{bmatrix} 0 & 9 & 2 &3 \\ 7 & 6 & 4 &5 \\ 11 & 2 & 1 & 8 \\ \end{bmatrix} \]

  • The dimension or order is the size of the matrix, i.e. the number of rows and columns

  • A vector is an array of numbers (constants or variables) arranged in rows or columns \[ \underset{3\times1} {\textbf a }= \begin{bmatrix} 2 \\ 7 \\ 8 \end{bmatrix} \quad \underset{1\times3} {\textbf a' }= \begin{bmatrix} 2 & 7 & 8 \end{bmatrix} \]

  • A square matrix is a matrix that has an equal number of rows and columns \[ \underset{n\times n}{\textbf{A}} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} \]

  • The diagonal elements are the elements found in the diagonal of a square matrix while those elements other than the diagonal elements are the off-diagonal or nondiagonal elements.

  • A diagonal matrix is a square matrix that has zero for all of its off-diagonal elements. \[ \begin{bmatrix} a_{11} & 0 & \cdots & 0 \\ 0 & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{bmatrix} \]

  • A triangular matrix is a square matrix with all elements above (or below) the diagonal being zero. \[ \begin{bmatrix} a_{11} & 0 & \cdots & 0 \\ a_{21} & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} \quad \text{or} \quad \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ 0 & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{bmatrix} \]


Matrix Operations

  • Transpose of a matrix: \(\textbf{A}'\)
  • Trace of a matrix: \(tr(\textbf{A})=\sum_{i=1}^n a_{ii}\)
  • Addition of conformable matrices: \(\underset{p \times q}{\textbf{A}}+\underset{p \times q}{\textbf{B}}\)
  • Scalar Multiplication: \(c\textbf{A}=\{ca_{ij}\}\)
  • Multiplication of conformable matrices: \(\underset{p \times q}{\textbf{A}}\times \underset{q \times r}{\textbf{C}}=\underset{p \times r}{\textbf{D}}\)
  • Determinant of a matrix \(det(\textbf{A})=|\textbf{A}|\)

Special Matrices

  • Symmetric Matrix: \(\textbf{A}\) is symmetric if and only if \(\textbf{A}=\textbf{A}'\)

  • Idempotent Matrix: \(\textbf{A}^2=\textbf{A}\)

  • Null Matrix: \(\textbf{0}=\begin{bmatrix} 0 & 0 & \cdots & 0 \\0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix}\)

  • Identity matrix: \(\textbf{I}=\begin{bmatrix} 1 & 0 & \cdots & 0 \\0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}\)

  • J matrix (matrix of ones): \[ \textbf{J}_2=\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix},\quad \textbf{J}_{2\times 5}=\begin{bmatrix} 1 & 1 &1 &1 &1 \\ 1 & 1 & 1 & 1 & 1\end{bmatrix}, \quad \textbf{J}=\textbf{1}\textbf{1}' \]

  • Let \(\textbf{J}\) be a square matrix of ones of order \(n\). Then \[ \bar{\textbf{J}}=\frac{1}{n}\textbf{J}= \begin{bmatrix} 1/n & 1/n & \cdots & 1/n \\ 1/n & 1/n & \cdots & 1/n \\ \vdots & \vdots & \ddots & \vdots \\ 1/n & 1/n & \cdots & 1/n \end{bmatrix} \]

  • Positive semi-definite: a symmetric matrix \(\textbf{M}\) of order \(n\) such that \(\textbf{x}'\textbf{M}\textbf{x}\geq0 \quad \forall\,\textbf{x}\in\mathbb{R}^n\)
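
Several of these special matrices are easy to construct in base R. The snippet below is a small sketch (the order \(n\) is an arbitrary illustrative choice) that builds \(\textbf{I}\), \(\textbf{J}\), and \(\bar{\textbf{J}}\), and checks that \(\bar{\textbf{J}}\) and \(\textbf{I}-\bar{\textbf{J}}\) are idempotent.

n <- 4                            # illustrative order
I_n   <- diag(n)                  # identity matrix
J_n   <- matrix(1, n, n)          # J matrix of ones, equivalently 1 %*% t(1)
J_bar <- J_n / n                  # (1/n) J
all.equal(J_bar %*% J_bar, J_bar)                          # TRUE: J-bar is idempotent
all.equal((I_n - J_bar) %*% (I_n - J_bar), I_n - J_bar)    # TRUE: I - J-bar is idempotent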


Invertibility and Singularity

  • An \(n \times n\) square matrix \(\textbf{A}\) is invertible if and only if \(|\textbf{A}|\neq 0\)

  • An \(n \times n\) square matrix \(\textbf{A}\) is singular if \(|\textbf{A}|= 0\) and nonsingular if \(|\textbf{A}|\neq 0\)

Results on Inverses

  1. If a matrix has an inverse, then the inverse is unique

  2. \((\textbf{A}^{-1})^{-1}=\textbf{A}\)

  3. \(\textbf{A}^{-1}\textbf{A}=\textbf{A}\textbf{A}^{-1}=\textbf{I}\)

  4. If \(\textbf{A}\) and \(\textbf{B}\) are nonsingular matrices and \(\textbf{AB}\) is defined, then \((\textbf{AB})^{-1}=\textbf{B}^{-1}\textbf{A}^{-1}\)

  5. The transpose of an invertible matrix is also invertible, with \((\textbf{A}')^{-1}=(\textbf{A}^{-1})'\).

  6. A square matrix \(\textbf{A}\) is orthogonal if \(\textbf{A}'=\textbf{A}^{-1}\); equivalently, \(\textbf{A}\textbf{A}'=\textbf{A}'\textbf{A}=\textbf{I}\)


Linear Dependence and Ranks

  • If \(\textbf{M}=[\textbf{m}_1, \textbf{m}_2, \dots,\textbf{m}_n]\), where the \(\textbf{m}_i\) are vectors of dimension \(n\times 1\), the \(\textbf{m}_i\) are linearly dependent if there exist constants \(c_1, c_2,\dots,c_n\), not all zero, such that \(c_1\textbf{m}_1 + c_2\textbf{m}_2 +\cdots +c_n\textbf{m}_n = \textbf{0}\); otherwise, they are linearly independent.

  • The rank of matrix \(\textbf{M}\) is defined to be the largest number of linearly independent rows (columns) of \(\textbf{M}\).

Some results on ranks

  1. \(rk(\textbf{A})=rk(\textbf{A}')\)

  2. If \(\textbf{A}\) is idempotent, \(rk(\textbf{I}-\textbf{A})=rk(\textbf{I})-rk(\textbf{A})=tr(\textbf{I}-\textbf{A})\)

  3. If two square matrices \(\textbf{A}\) and \(\textbf{B}\) , each of order \(n\), are nonsingular, then for any matrix \(\textbf{C}\) where multiplication with \(\textbf{A}\) and \(\textbf{B}\) are defined, the matrices \(\textbf{C}\), \(\textbf{AC}\), \(\textbf{CB}\) , and \(\textbf{ACB}\) all have the same rank.

  4. The rank of the product of two matrices \(\textbf{A}\) and \(\textbf{B}\) is at most equal to the smaller of the ranks of \(\textbf{A}\) and \(\textbf{B}\). \[ rk(\textbf{AB})\leq \min\{rk(\textbf{A}),rk(\textbf{B})\} \]

  5. Let \(\textbf{A}\) be a square matrix of order \(n\) . \(|\textbf{A}| = 0\) if and only if \(rk(\textbf{A}) < n\).

  6. Let \(\textbf{A}\) and \(\textbf{B}\) be both \(m \times n\) matrices with ranks \(r_1\) and \(r_2\) respectively. Then \(rk(\textbf{A} + \textbf{B}) ≤ r_1 + r_2\).


Eigenvalues and Eigenvectors

Let \(\textbf{A}\) be an \(n \times n\) matrix.

  • A scalar \(\lambda\) is an eigenvalue of \(\textbf{A}\) if there \(\exists\) a nonzero vector \(\textbf{x}\in \mathbb{R}^n\) such that \(\textbf{Ax}=\lambda\textbf{x}\).
  • Any \(\textbf{x}\neq\textbf{0}\) satisfying the above equation is called an eigenvector of \(\textbf{A}\) corresponding to eigenvalue \(\lambda\)

Remarks on Eigenvalues and Eigenvectors

  1. The eigenvalues \(\lambda_1, \lambda_2, \dots, \lambda_n\) of \(\textbf{A}\) are the roots of the characteristic polynomial (of degree \(n\)) \(|\textbf{A}-\lambda \textbf{I}|=0\); these roots are guaranteed to be real when \(\textbf{A}\) is symmetric. The roots are sometimes called latent, proper, or characteristic roots.

  2. \(\textbf{A}\) is singular if and only if 0 is an eigenvalue of \(\textbf{A}\)

  3. The characteristic polynomials of \(\textbf{A}\) and \(\textbf{A}'\) are identical, so \(\textbf{A}\) and \(\textbf{A}'\) have the same eigenvalues. However, their eigenvectors are not identical.

  4. If \(\textbf{A}\) has eigenvalues \(\lambda_1, \lambda_2,...,\lambda_n\), then

    \[ tr(\textbf{A})=\sum_{i=1}^n\lambda_i \quad \text{and} \quad |\textbf{A}| = \prod_{i=1}^n\lambda_i \]
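
Remark 4 is easy to verify numerically with eigen(). The following is a small sketch using an arbitrary illustrative symmetric matrix:

S <- matrix(c(4, 1, 2,
              1, 3, 0,
              2, 0, 5), ncol = 3, byrow = TRUE)   # illustrative symmetric matrix
ev <- eigen(S)$values
sum(ev)        # equals tr(S) = sum(diag(S)) = 12
prod(ev)       # equals det(S) = 43 (up to floating-point error)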

Decomposition of Matrices

  1. Spectral Decomposition Let \(\textbf{A}\) be an \(n \times n\) symmetric matrix. The matrix \(\textbf{A}\) can be decomposed as \(\textbf{P}\textbf{D}\textbf{P}'\), where \(\textbf{D}\) is a diagonal matrix with the eigenvalues of \(\textbf{A}\) as its diagonal elements and \(\textbf{P}\) is an \(n \times n\) orthogonal matrix whose \(i^{th}\) column \(\textbf{p}_i\) is the normalized eigenvector corresponding to \(\lambda_i\). That is, \[ \textbf{A}=\sum_{i=1}^n\lambda_i\textbf{p}_i\textbf{p}_i'=\textbf{P}\textbf{D}\textbf{P}' \]

  2. Singular Value Decomposition

    For any \(n \times p\) matrix \(\textbf{X}\), it can be decomposed as \[ \textbf{X} = \textbf{U}\textbf{D}\textbf{V}' \]

    where

    • \(\textbf{U}\) is a (column) orthogonal \(n \times p\) matrix.

    • \(\textbf{D}\) is a diagonal matrix containing the singular values \(D_{ii}\) on the diagonal in decreasing order.

    • \(\textbf{V}\) is an orthogonal \(p \times p\) matrix.

    • \(\textbf{U}'\textbf{U}=\textbf{V}'\textbf{V}=\textbf{I}_p\)
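
Both decompositions can be obtained directly in R with eigen() and svd(). The sketch below uses arbitrary illustrative matrices:

S <- matrix(c(4, 1, 2,
              1, 3, 0,
              2, 0, 5), ncol = 3, byrow = TRUE)    # illustrative symmetric matrix
es <- eigen(S)
P <- es$vectors                       # columns are orthonormal eigenvectors
D <- diag(es$values)                  # eigenvalues on the diagonal
all.equal(P %*% D %*% t(P), S)        # TRUE: A = P D P'

set.seed(1)
X <- matrix(rnorm(20), nrow = 5)      # arbitrary 5 x 4 matrix
sv <- svd(X)                          # returns u, d, v
all.equal(sv$u %*% diag(sv$d) %*% t(sv$v), X)    # TRUE: X = U D V'
all.equal(t(sv$u) %*% sv$u, diag(4))             # TRUE: U'U = I_p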


Matrix Calculus

Let \(f(\textbf{x})\) be a continuous function of the elements of the vector \(\textbf{x}' = \begin{bmatrix} x_1 & x_2 & \cdots & x_p \end{bmatrix}\) whose first and second partial derivatives \(\frac{\partial f(\textbf{x})}{\partial x_i}\), \(\frac{\partial^2 f(\textbf{x})}{\partial x_i \partial x_j}\) exist for all points \(\textbf{x}\) in some region of \(p\)-dimensional Euclidean space.

  • Derivative of \(f(\textbf{x})\) with respect to \(\textbf{x}\): \[\nabla f(\textbf{x})=\frac{\partial f(\textbf{x})}{\partial{\textbf{x}}} = \left[\frac{\partial f(\textbf{x})}{\partial{x_i}} \right], \quad i=1,2,...,p\]

  • Hessian matrix of \(f(\textbf{x})\): \[ H_f = \frac{\partial^2 f(\textbf{x})}{\partial{\textbf{x}}\partial{\textbf{x}'}} = \left[\frac{\partial^2 f(\textbf{x})}{\partial{x_i}\partial{x_j}} \right], \quad i,j=1,2,...,p \]

Some Results in Matrix Calculus

  1. If \(f(\textbf{x})=c\), then \(\frac{\partial f(\textbf{x})}{\partial \textbf{x}}=\textbf{0}\)

  2. If \(f(\textbf{x}) = \textbf{a}′\textbf{x}\), where a is a \(p \times 1\) vector of constants, then \(\frac{\partial f(\textbf{x})}{\partial \textbf{x}} = \textbf{a}\).

  3. If \(f(\textbf{x}) = \textbf{x}′\textbf{Ax}\), where \(\textbf{A}\) is symmetric, then \(\frac{\partial f (\textbf{x})}{\partial \textbf{x}} = 2\textbf{Ax}\).

  4. For the general quadratic form \(f (\textbf{x}) = (\textbf{a} ± \textbf{Bx})'\textbf{A}(\textbf{a} ± \textbf{Bx})\), where \(\textbf{a}\) is an \(m \times 1\) vector of constants, \(\textbf{B}_{m \times p}\) is a matrix of constants, \(\textbf{x}_{p \times 1}\) is a vector of variables, and \(\textbf{A}_{m \times m}\) is a symmetric matrix of constants, \[ \frac{\partial f (\textbf{x})}{\partial \textbf{x}} = ±2\textbf{B}'\textbf{A}(\textbf{a} ± \textbf{Bx}). \]
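
Result 3 can be checked numerically with central finite differences. A minimal sketch follows; the matrix \(\textbf{A}\) and the point \(\textbf{x}_0\) are arbitrary illustrative choices.

A <- matrix(c(2, 1,
              1, 3), ncol = 2, byrow = TRUE)       # illustrative symmetric matrix
f <- function(x) as.numeric(t(x) %*% A %*% x)      # f(x) = x'Ax

x0 <- c(1, -2)
h  <- 1e-6
num_grad <- sapply(1:2, function(i) {
  e <- rep(0, 2); e[i] <- h
  (f(x0 + e) - f(x0 - e)) / (2 * h)                # central finite difference
})
num_grad                         # approximately c(0, -10)
as.numeric(2 * A %*% x0)         # analytic gradient 2Ax = c(0, -10)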


R outputs

Defining Matrices \(\textbf{A}\) and \(\textbf{B}\)

A <- matrix(c(1,2,2,3,5,8,1,3,6), ncol = 3)
B <- matrix(c(1,3,9,5,7,4,8,0,2), ncol = 3)
A ; B
##      [,1] [,2] [,3]
## [1,]    1    3    1
## [2,]    2    5    3
## [3,]    2    8    6
##      [,1] [,2] [,3]
## [1,]    1    5    8
## [2,]    3    7    0
## [3,]    9    4    2

Sum of two conformable matrices

A+B
##      [,1] [,2] [,3]
## [1,]    2    8    9
## [2,]    5   12    3
## [3,]   11   12    8

Difference of two conformable matrices

A-B
##      [,1] [,2] [,3]
## [1,]    0   -2   -7
## [2,]   -1   -2    3
## [3,]   -7    4    4

Matrix multiplication

A%*%B
##      [,1] [,2] [,3]
## [1,]   19   30   10
## [2,]   44   57   22
## [3,]   80   90   28

Transpose of a matrix: \(\textbf{A}'\)

t(A)
##      [,1] [,2] [,3]
## [1,]    1    2    2
## [2,]    3    5    8
## [3,]    1    3    6

Inner product of two matrices with same dimensions: \(\textbf{A}'\textbf{B}\)

t(A)%*%B
##      [,1] [,2] [,3]
## [1,]   25   27   12
## [2,]   90   82   40
## [3,]   64   50   20

Inverse of a matrix: \(A^{-1}\)

solve(A)
##      [,1]       [,2]       [,3]
## [1,]   -1  1.6666667 -0.6666667
## [2,]    1 -0.6666667  0.1666667
## [3,]   -1  0.3333333  0.1666667

Determinant of a matrix

det(A)
## [1] -6
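
Trace and rank of a matrix

The trace and rank defined earlier can also be computed directly from the same matrix \(\textbf{A}\): sum(diag(A)) gives \(tr(\textbf{A})\) and qr(A)$rank gives \(rk(\textbf{A})\).

sum(diag(A)) ; qr(A)$rank
## [1] 12
## [1] 3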

1.2 Review of Statistical Inference

WHAT IS STATISTICAL INFERENCE?

Statistical Inference is an area in statistics that deals with the methods used to make generalizations or inference about some characteristics of the population based on information contained in the sample.

Approaches to inference

  • Estimation: estimate the value of the parameter of interest.
    • Point Estimation: calculate a single number as our guess to the unknown parameter.
    • Confidence Interval Estimation: create an interval which we hope contains the unknown parameter with a specific level of confidence.
  • Hypothesis Testing: make decisions on whether or not the sample agrees with the researcher’s assertion regarding some characteristic of the population
    • Parametric test: test hypotheses concerning the specific distributional characteristics (parameter) of the population.
    • Non-parametric test: make inferences about population without assuming a specific distribution

Example:

Objective: “How effective is Minoxidil in treating male pattern baldness?”

Specific Objectives:

  1. (point estimation) to estimate the population proportion of patients who will show new hair growth after being treated with Minoxidil
  2. (hypothesis testing) to determine whether treatment using Minoxidil is better than the existing treatment that is known to stimulate hair growth among 40% of patients with male pattern baldness

Basic Definitions

  • Let the random variable \(Y\) have a probability density function \(f(y)\) (or probability mass function \(p(y)\) for discrete). The expected value or mean of Y, denoted by \(\mu_Y\) or \(E(Y)\) is defined as

\[ E(Y)=\int_{-\infty}^\infty y f(y)\,dy \quad \text{or} \quad E(Y) = \sum_{\forall y} y\, p(y) \] for the continuous and discrete cases, respectively.

  • The variance of a random variable \(Y\), denoted by \(\sigma^2_Y\) or \(Var(Y)\), is defined as \[ Var(Y) = E[(Y-\mu_Y)^2]=E(Y^2)-\mu_Y^2 \]

  • The covariance of Y and Z, denoted by \(Cov(Y,Z)\), or \(\sigma_{Y,Z}\), is defined by \[ Cov(Y,Z) = E[(Y-\mu_Y)(Z-\mu_Z)]=E(YZ)-\mu_Y\mu_Z \]

  • Independence of two random variables. Let \(Y\sim f_Y\) and \(Z \sim f_Z\). The random variables \(Y\) and \(Z\) are said to be independent if and only if \[ f_{Y,Z}(y,z) = f_Y(y) f_Z(z) \] where \(f_{Y,Z}(y,z)\) is the joint probability function of \(Y\) and \(Z\)

Random Vectors and Matrices

  • Expectation of a random vector. If \(\underset{n \times 1}{\textbf{y}}\) is a vector of random variables, then \[ E(\textbf{y})=E \begin{bmatrix} Y_1\\Y_2 \\ \vdots \\Y_n \end{bmatrix} = \begin{bmatrix} E(Y_1)\\ E(Y_2) \\ \vdots \\ E(Y_n) \end{bmatrix} \]

  • Expectation of a random matrix. Let \(\textbf{X}=\{X_{ij}\}\) be a matrix of random variables. Then \[ E(\textbf{X})=\{\mu_{ij}\} \]

    • Remarks: \(E(\textbf{AXB}\pm\textbf{F})=\textbf{A}E(\textbf{X})\textbf{B}\pm\textbf{F}\) where \(\textbf{A}\) , \(\textbf{B}\), and \(\textbf{F}\) are constant matrices.
  • Variance-Covariance Matrix. Let \(\textbf{y}_{n\times 1}\) be a vector of random variables, \(\textbf{y}'=\begin{bmatrix} Y_1 & Y_2 & \cdots &Y_n \end{bmatrix}\). The variance-covariance matrix of \(\textbf{y}\) is defined as \[\begin{align} Var(\textbf{y})&= E \{ (\textbf{y} - E(\textbf{y}))(\textbf{y} - E(\textbf{y}))' \}\\ &= E \left\{ \begin{bmatrix} Y_1-E(Y_1) \\ Y_2-E(Y_2) \\ \vdots \\ Y_n-E(Y_n) \end{bmatrix} \begin{bmatrix} Y_1-E(Y_1) & Y_2-E(Y_2) & \cdots & Y_n-E(Y_n) \end{bmatrix} \right\}\\ &= \{\sigma_{ij}\}=\boldsymbol{\Sigma} \end{align}\]

    where

    • the diagonal elements are the variances of \(Y_i\): \(\sigma_{ii}=\sigma_i^2=Var(Y_i)\)

    • the off-diagonal elements are the covariances of \(Y_i\) and \(Y_j\): \(\sigma_{ij}=Cov(Y_i,Y_j)\)

  • Correlation Matrix. Let \(\textbf{y}_{n\times 1}\) be a vector of random variables, \(\textbf{y}'=\begin{bmatrix} Y_1 & Y_2 & \cdots &Y_n \end{bmatrix}\), and let \(\sigma_i\) be the standard deviation of \(Y_i\). The correlation matrix of \(\textbf{y}\) denoted as \(\rho(\textbf{y})\) is defined as:

    \[\begin{align}\rho(\textbf{y})&= diag^{-1}\{\sigma_1 \quad \sigma_2 \quad \cdots\sigma_n\}\boldsymbol{\Sigma}diag^{-1}\{\sigma_1 \quad \sigma_2 \quad \cdots\sigma_n\}\\&= \begin{bmatrix} \frac{1}{\sigma_1} & 0 & \cdots & 0 \\ 0 & \frac{1}{\sigma_2} & \cdots &0 \\ \vdots & \vdots & \ddots &\vdots\\ 0 & 0 & \cdots & \frac{1}{\sigma_n} \end{bmatrix} \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2n} \\ \vdots & \vdots & \ddots &\vdots\\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma_n^2 \end{bmatrix} \begin{bmatrix} \frac{1}{\sigma_1} & 0 & \cdots & 0 \\ 0 & \frac{1}{\sigma_2} & \cdots &0 \\ \vdots & \vdots & \ddots &\vdots\\ 0 & 0 & \cdots & \frac{1}{\sigma_n} \end{bmatrix}\\ &= \left\{\frac{\sigma_{ij}}{\sigma_i \sigma_j}\right\}, \quad i,j = 1,...,n\end{align}\]

    Remarks on variance and correlation:

    • The Variance-Covariance matrix and the Correlation Matrix are always symmetric.

    • The diagonal elements of the correlation matrix are always equal to 1.

    • Let \(\textbf{C}\) be a matrix of constants and \(\textbf{y}\) be a random vector where \(Var(\textbf{y})=\boldsymbol{\Sigma}\). Then \(Var(\textbf{Cy})=\textbf{C}\boldsymbol{\Sigma}\textbf{C}'\)

  • Quadratic Form. Let \(\textbf{y}\) be a vector of \(n\) random variables with mean \(\boldsymbol{\mu}\) and \(Var(\textbf{y})=\boldsymbol{\Sigma}\) and let \(\textbf{A}\) be an \(n\)-dimensional symmetric matrix. The scalar quantity \(\textbf{y}'\textbf{A}\textbf{y}\) is known as a quadratic form in \(\textbf{y}\).

    Remarks:

    • \(E(\textbf{y}'\textbf{A}\textbf{y}) = tr(\textbf{A}\boldsymbol{\Sigma})+\boldsymbol{\mu}'\textbf{A}\boldsymbol{\mu}\)

    • Under Multivariate Normality, \(Var(\textbf{y}'\textbf{A}\textbf{y})=2\,tr[(\textbf{A}\boldsymbol{\Sigma})^2]+4\boldsymbol{\mu}'\textbf{A}\boldsymbol{\Sigma}\textbf{A}\boldsymbol{\mu}\)
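
These results are easy to check by simulation. The sketch below uses an arbitrary illustrative \(\boldsymbol{\Sigma}\), a constant matrix \(\textbf{C}\), and 10,000 simulated realizations of \(\textbf{y}\) to compare the empirical variance-covariance matrices with \(\boldsymbol{\Sigma}\) and \(\textbf{C}\boldsymbol{\Sigma}\textbf{C}'\).

set.seed(136)
Sigma <- matrix(c(2, 1,
                  1, 4), ncol = 2, byrow = TRUE)    # illustrative variance-covariance matrix
C <- matrix(c(1,  1,
              1, -1), ncol = 2, byrow = TRUE)       # illustrative matrix of constants

# draw y ~ N(0, Sigma) using the Cholesky factor of Sigma
R <- chol(Sigma)                                    # Sigma = R'R
y <- matrix(rnorm(10000 * 2), ncol = 2) %*% R       # each row is one realization of y'

cov(y)                     # approximately Sigma
cov(y %*% t(C))            # approximately C Sigma C' = matrix(c(8, -2, -2, 4), 2)
cov2cor(Sigma)             # correlation matrix implied by Sigma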


The Normal Distribution

We say that a random variable \(Y\) follows the normal distribution denoted by \(Y\sim Normal(\mu,\sigma^2)\) if and only if the pdf of \(Y\) is given by:

\[ f_Y(y)=\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left\{-\frac{1}{2}\frac{(y-\mu)^2}{\sigma^2}\right\} \]

Remarks:

  • If \(Y\sim Normal(\mu,\sigma^2)\) then, \(E(Y)=\mu\), \(Var(Y)=\sigma^2\), \(m_Y(t)=\exp\{\mu t+\frac{1}{2}\sigma^2 t^2\}\)
  • The Normal Distribution provides a reasonably good description of the graph of the relative frequency distribution of several random variables.
  • A lot of procedures in inferential statistics assume that the population is normally distributed.
  • In Stat 136, one of the most common assumptions is that the errors are normally distributed with mean 0.

 

Results in Sampling from the Normal Distribution

  1. (Sample Mean) Let \(X_1,X_2,...,X_n \overset{iid}{\sim} Normal(\mu,\sigma^2)\) \[ \bar{X}=\frac{1}{n}\sum_{i=1}^n X_i\sim Normal\left(\mu,\frac{\sigma^2}{n}\right) \]

  2. (Sum of Squares of Standard Normal) Let \(Y_1,...,Y_n \overset{iid}{\sim} N(0,1)\). Then \[ \sum_{i=1}^n Y_i^2 \sim \chi^2_{(\nu=n)} \]

  3. (Sample Variance). Let \(S^2\) be the sample variance of \(X_1,X_2,...,X_n\overset{iid}{\sim}N(\mu,\sigma^2)\). Then \[ \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{(\nu = n-1)} \]

  4. (T Statistic). Let \(Y\sim N(0,1), Z\sim \chi^2_\nu\), \(Y\) and \(Z\) are independent. Then \[ T = \frac{Y}{\sqrt{Z/\nu}} \sim t_{(\nu)} \]

Exercise: Let \(X_1,X_2,...,X_n \overset{iid}{\sim} N(\mu,\sigma^2)\). Show that \(\frac{\bar{X}-\mu}{S/\sqrt{n}} \sim t_{(n-1)}\)

Solution

Since \(\bar{X} \sim N(\mu,\sigma^2/n)\), then \(Y=\frac{\bar{X}-\mu}{\sqrt{\sigma^2/n}}\sim N(0,1)\)

Let \(Z= \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{(\nu = n-1)}\)

Now, let \(T=\frac{Y}{\sqrt{Z/(n-1)}}\) . From Result 4, \(T\) follows the T distribution with \(\nu=n-1\) .

Simplifying \(T\), we get

\[\begin{align} T&=\frac{Y}{\sqrt{Z/(n-1)}}\\ &= \frac{Y}{\sqrt{\frac{(n-1)S^2}{\sigma^2}/(n-1)}}\\ &=\frac{(\bar{X}-\mu)/\sqrt{\sigma^2/n}}{\sqrt{\frac{(n-1)S^2}{\sigma^2}/(n-1)}}\\ &=\frac{(\bar{X}-\mu)}{\sqrt{\sigma^2/n}\sqrt{\frac{S^2}{\sigma^2}}}\\ &=\frac{(\bar{X}-\mu)}{S/\sqrt{n}}\sim t_{(n-1)} \quad \blacksquare \end{align}\]

  5. (F-Statistic). Let \(U\sim\chi^2_{(\nu_1)}, V\sim\chi^2_{(\nu_2)}\), \(U\) and \(V\) are independent. Then \[ F = \frac{U/\nu_1}{V/\nu_2}\sim F(\nu_1,\nu_2) \] Remark: This will be useful in ANOVA outputs in regression.
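
These sampling results can also be checked by simulation. The sketch below (all constants are arbitrary illustrative choices) compares the simulated quantiles of the statistic from the exercise with the theoretical \(t_{(n-1)}\) quantiles.

set.seed(131)
mu <- 10; sigma <- 3; n <- 8; nrep <- 10000        # illustrative values

t_stat <- replicate(nrep, {
  x <- rnorm(n, mean = mu, sd = sigma)
  (mean(x) - mu) / (sd(x) / sqrt(n))               # the statistic from the exercise
})

quantile(t_stat, c(0.025, 0.5, 0.975))             # simulated quantiles
qt(c(0.025, 0.5, 0.975), df = n - 1)               # theoretical t quantiles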

Approaches to Inference

Point Estimation uses information in a sample to arrive at a single number that will serve as an estimate of the value of the target parameter. The following are important concepts in estimation:

  • A point estimator \(\hat{\theta}\) is unbiased if \(E (\hat{\theta}) = \theta\).
  • Let \(T_1,...,T_n\) be a sequence of estimators of \(\theta\), where \(T_n\) is the same estimator based on a random sample of size \(n\). The sequence \(\{T_n\}\) is said to be:
    • MSE-consistent iff \[ MSE_\theta(T_n)\rightarrow 0 \quad \text{as} \quad n \rightarrow \infty,\quad \forall \theta \in \Omega \]
    • weakly consistent iff: \[ P(|T_n-\theta|<\varepsilon) \rightarrow 1 \quad \text{as} \quad n \rightarrow \infty, \quad \forall \theta \in \Omega \]
  • A statistic \(T\) is a sufficient statistic if the conditional probability function of the sample observations, given \(T\) , does not depend on the parameter \(\theta\). (Also check: Factorization Criterion for Sufficiency)
  • A statistic \(T\) is said to be complete if and only if \(E(g(T)) = 0\) for all \(\theta\) implies that \(g(T)\) is almost surely equal to 0.
  • It is easy to find complete sufficient statistics when the distribution is a member of the exponential family of distributions.
  • An estimator \(\hat{θ}\) is said to be uniformly minimum variance unbiased estimator (UMVUE) for \(\theta\) if for any other unbiased estimator \(\hat{\theta}'\), \(Var(\hat{\theta}) ≤ Var(\hat{\theta}'), \quad \forall \hat{\theta}'\).
  • One of the most popular ways of finding the UMVUE is the Lehmann–Scheffé Theorem: any unbiased estimator of \(\theta\) that is a function of a complete sufficient statistic is the UMVUE for \(\theta\).

Interval Estimation uses sample data to calculate the lower and upper bound of an interval such that the researcher can be highly confident that this interval contains the value of the target parameter.

  • We usually construct a (1 − α)100% confidence interval for the unknown parameter.
  • The confidence coefficient gives the coverage probability, i.e., the probability that the CI, before sampling, will enclose the true parameter value. Note, however, that once a sample has been observed, a CI ceases to be random and has probability of either 0 or 1 of trapping the true parameter value.
    • \((1-\alpha)\) is the probability that you will obtain a sample such that if you compute a \((1-\alpha)\) CI, it will capture the parameter, NOT the probability that the parameter is within a specified interval.
    • The interval is random, the parameter is not.
  • The most popular way of constructing CIs is the pivotal quantity method (PQM). In PQM, you manipulate a pivot, a quantity that is a function of the sample and of the unknown parameter being estimated, but whose distribution does not depend on any unknown parameter.

Hypothesis testing uses sample data to evaluate the validity of a conjecture regarding unknown parameters.

  • The null hypothesis is the statement being tested; the conjecture the experimenter doubts to be true.

  • The alternative hypothesis is the operational statement of the theory that the experimenter believes to be true and wishes to prove.

    Note: The null hypothesis and alternative hypothesis must be non-overlapping statements about the population.

  • The test statistic is a statistic computed from the sample data that is especially sensitive to the differences between the null and alternative hypotheses.

    Note: The test statistic should tend to take on certain values when Ho is true and tend to different values when Ha is true. The decision to reject Ho depends on the value of the test statistic

  • The region of rejection can be thought of as the set of values of the test statistic that will lead to the rejection of the null hypothesis.

  • Errors in Hypothesis Testing:

    • Type I error: incorrectly rejecting the null when it is true.

    • Type II error: incorrectly accepting the null when it is false.

    • Since the Type I error is usually the more drastic of the two errors in hypothesis testing, it is a common approach to set an upper bound to the probability of committing a Type I error (\(\alpha\)), then find the test with the lowest probability of committing a Type II error (\(\beta\)).

    • Both error probabilities in hypothesis testing can be reduced by increasing the sample size.

  • The level of significance (\(\alpha\)) is the maximum probability of committing a Type I error that the researcher is willing to tolerate.

  • The power of the test (\(1-\beta\)) is the probability of correctly rejecting the null hypothesis

  • The power function \(K_\phi(\theta)\) gives the probability of rejecting the null hypothesis as a function of the parameter value \(\theta\).

  • Likelihood Ratio Test

    • One of the most popular ways of constructing a test is the likelihood ratio test, which is based on the test statistic \(\lambda\) given by \[ \lambda = \frac{\underset{\Omega_0}{\sup}\mathcal{L}(\theta,\textbf{X})}{\underset{\Omega}{\sup}\mathcal{L}(\theta,\textbf{X})} \]
    • The asymptotic distribution under \(H_0\) of the test statistic \(-2\ln(\lambda)\) is \(\chi^2_{(\nu)}\), where \(\nu\) is the number of unknown parameters in the parameter space \(\Omega\) minus the number of unknown parameters under the null hypothesis.
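
As an illustration, the sketch below carries out a likelihood ratio test of \(H_0: \lambda = \lambda_0\) for a Poisson mean in R. The data, the null value \(\lambda_0\), and the sample size are made up for illustration only.

set.seed(1)
x <- rpois(50, lambda = 2.5)                       # illustrative data
lambda0 <- 2                                       # hypothesized value under H0

loglik <- function(lambda) sum(dpois(x, lambda, log = TRUE))
lambda_hat <- mean(x)                              # unrestricted MLE of lambda

lr_stat <- -2 * (loglik(lambda0) - loglik(lambda_hat))    # -2 ln(lambda)
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)    # nu = 1 - 0 = 1
c(lr_stat = lr_stat, p_value = p_value)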

1.3 The Model Building Process

WHAT ARE MODELS?

A model is a set of assumptions that summarizes the structure of a system.

Types of Models

  • Deterministic Models: models that produce the same exact result for a particular set of inputs.

    Example: income for a day as a function of items sold.

  • Stochastic Models: models that describe the unpredictable variation of the outcomes of a random experiment.

    Example: Grade of Stat 136 students using their Stat 131 grades. Take note that Stat 136 grades may still vary due to other random factors.

In Statistics, we are focused on Stochastic Models.

General Classification of Stochastic Models

  • structural models - explain the variability of the variable of interest by using the variability of other variables.
  • nonstructural models - explain the variability of a variable using past values or observations.
  • “black-box” models - models whose main concern is simply to predict values of the dependent variable using a set of independent variables. Their main characteristic is that the model itself has little to no interpretation.

Purpose of Modelling

  • to understand the mechanism that generates the data

  • to predict the values of the dependent variable given the independent variables

  • to optimize the response indexed by the dependent variable

Types of Variables in a Regression Problem

  • dependent (regressand, endogenous, target, output, response variable) - the variable whose variability is being studied or explained within the system.

  • independent (regressor, exogenous, feature, input, explanatory variable) - used to explain the behavior of the dependent variable. The variability of this variable is explained outside of the system.

    Examples

    1. Can we predict a selling price of a house from certain characteristics of the house? (Sen and Srivastava, Regression Analysis)

      dependent variable - price of the house
      independent variables - number of bedrooms, floor space, garage size, etc.

       

    2. Is a person’s brain size and body size predictive of his or her intelligence? (Willerman et al., 1991)

      dependent variable - IQ level
      independent variables - brain size based on MRI scans, height, and weight of a person.

       

    3. What are the variables that affect the total expenditure of Filipino households based on the Family Income and Expenditure Survey (PSA, 2012)?

      dependent variable - total annual expenditure of the households
      independent variables - total household income, whether the household is agricultural, total number of household members

       

    4. What are the determinants of a movie’s box-office performance? (Scott, 2019)

      dependent variable - box office figure
      independent variables - production budget, marketing budget, critical reception, genre of the movie

Types of Data

  • Time-series data - a set of observations on the values that a variable takes at different times (example: daily, weekly, monthly, quarterly, annually, etc.)

    Date Passengers
    1949-01-01 112
    1949-01-31 118
    1949-03-02 132
    1949-04-02 129
    1949-05-02 121
    1949-06-02 135
    1949-07-02 148
    1949-08-01 148
    1949-09-01 136
    1949-10-01 119
    1949-11-01 104
    1949-12-01 118
    1950-01-01 115
    1950-01-31 126
    1950-03-02 141
    1950-04-02 135
    1950-05-02 125
    1950-06-02 149
    1950-07-02 170
    1950-08-01 170
    1950-09-01 158
    1950-10-01 133
    1950-11-01 114
    1950-12-01 140
    1951-01-01 145
    1951-01-31 150
    1951-03-02 178
    1951-04-02 163
    1951-05-02 172
    1951-06-02 178
    1951-07-02 199
    1951-08-01 199
    1951-09-01 184
    1951-10-01 162
    1951-11-01 146
    1951-12-01 166
    1952-01-01 171
    1952-01-31 180
    1952-03-02 193
    1952-04-01 181
    1952-05-02 183
    1952-06-01 218
    1952-07-02 230
    1952-08-01 242
    1952-09-01 209
    1952-10-01 191
    1952-11-01 172
    1952-12-01 194
    1953-01-01 196
    1953-01-31 196
    1953-03-02 236
    1953-04-02 235
    1953-05-02 229
    1953-06-02 243
    1953-07-02 264
    1953-08-01 272
    1953-09-01 237
    1953-10-01 211
    1953-11-01 180
    1953-12-01 201
    1954-01-01 204
    1954-01-31 188
    1954-03-02 235
    1954-04-02 227
    1954-05-02 234
    1954-06-02 264
    1954-07-02 302
    1954-08-01 293
    1954-09-01 259
    1954-10-01 229
    1954-11-01 203
    1954-12-01 229
    1955-01-01 242
    1955-01-31 233
    1955-03-02 267
    1955-04-02 269
    1955-05-02 270
    1955-06-02 315
    1955-07-02 364
    1955-08-01 347
    1955-09-01 312
    1955-10-01 274
    1955-11-01 237
    1955-12-01 278
    1956-01-01 284
    1956-01-31 277
    1956-03-02 317
    1956-04-01 313
    1956-05-02 318
    1956-06-01 374
    1956-07-02 413
    1956-08-01 405
    1956-09-01 355
    1956-10-01 306
    1956-11-01 271
    1956-12-01 306
    1957-01-01 315
    1957-01-31 301
    1957-03-02 356
    1957-04-02 348
    1957-05-02 355
    1957-06-02 422
    1957-07-02 465
    1957-08-01 467
    1957-09-01 404
    1957-10-01 347
    1957-11-01 305
    1957-12-01 336
    1958-01-01 340
    1958-01-31 318
    1958-03-02 362
    1958-04-02 348
    1958-05-02 363
    1958-06-02 435
    1958-07-02 491
    1958-08-01 505
    1958-09-01 404
    1958-10-01 359
    1958-11-01 310
    1958-12-01 337
    1959-01-01 360
    1959-01-31 342
    1959-03-02 406
    1959-04-02 396
    1959-05-02 420
    1959-06-02 472
    1959-07-02 548
    1959-08-01 559
    1959-09-01 463
    1959-10-01 407
    1959-11-01 362
    1959-12-01 405
    1960-01-01 417
    1960-01-31 391
    1960-03-02 419
    1960-04-01 461
    1960-05-02 472
    1960-06-01 535
    1960-07-02 622
    1960-08-01 606
    1960-09-01 508
    1960-10-01 461
    1960-11-01 390
    1960-12-01 432

     

  • Cross-section data - data on one or more variables collected at the same period in time

    mpg cyl disp hp drat wt qsec vs am gear carb
    Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
    Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
    Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
    Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
    Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
    Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
    Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
    Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
    Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
    Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
    Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
    Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
    Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
    Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
    Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
    Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
    Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
    Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
    Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
    Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
    Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
    Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
    AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
    Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
    Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
    Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
    Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
    Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
    Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
    Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
    Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
    Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2

 

  • Panel data - data on one or more variables collected at several time points and from several observations (or panel member)

    Remark: Time series and cross-sectional data can be thought of as special cases of panel data that are in one dimension only (one panel member or individual for the former, one time point for the latter).

In this course, we will focus on cross-sectional data. As early as now, try to find a cross-sectional dataset for your research project.

That is, find (or gather) a dataset with \(n\) observations and \(p\) variables.

Steps in the Model-Building Process

  1. Planning
    - define the problem
    - identify the dependent/independent variables
    - establish goals

  2. Development of the model
    - collect data
    - preliminary description/exploration of the data
    - specify the model
    - fit the model
    - validate assumptions
    - remedy to regression problems
    - obtain the best model

  3. Verification and Maintenance
    - check model adequacy
    - check sign of coefficient
    - check stability of parameters
    - check forecasting ability
    - update parameters

Here in Stat 136, we are focused on the theory behind step (2) of the Model-Building Process. You will learn the other steps naturally along the way in your BS Stat journey.


1.4 Measures of Correlation

CORRELATIONAL ANALYSIS vs REGRESSION ANALYSIS

Both correlational analysis and regression analysis are oftentimes used to describe the relationship of several variables. However, both methodologies are different.

  • In correlational analysis, your main goal is to simply describe the relationship of (usually) two variables.

  • Regression analysis, on the other hand, gives more information than the relationship of the variables: you will be able to create an equation that lets you examine the structure that governs the random phenomenon being studied, and predict values of one variable using other variables.

Although different, correlational analysis is oftentimes done as a preliminary step to explore data before doing regression analysis.

Correlation Coefficient

  • Measures the degree of association of 2 or more variables.
  • The coefficient does not imply structural relationship and does not indicate causality between the variables.
  • The value is usually between -1 and +1.
  • Any value close to +1 implies strong direct relationship while a value close to -1 implies strong inverse relationship. A value close to 0 implies weak or no relationship.
  • There are many types of correlation coefficients; several of them are summarized here.

Pearson’s \(r\)

The Pearson product-moment correlation coefficient measures the linear association between two continuous variables. This coefficient may underestimate the degree of association if the relationship is nonlinear.

\[ r = \frac{\sum_{i=1}^n(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{(\sum_{i=1}^n(x_i-\bar{x})^2)(\sum_{i=1}^n(y_i-\bar{y})^2)}} \]

Example

A sample of 30 towns was drawn, and the mortality rate and calcium concentration in drinking water were determined for each.

  • \(Y\) = 7-year mortality rate (per 100,000)
  • \(X\) = average calcium ion concentration in drinking water (ppm)
calcium <- read.csv("calcium.csv")
mortality calcium
1247 105
1800 14
1807 15
1359 84
1307 78
1555 39
1260 21
1742 8
1569 91
1772 15
1668 17
1609 18
1299 78
1392 73
1254 96
1428 39
1723 44
1547 9
1591 16
1828 8
1466 5
1558 10
1637 10
1755 12
1491 20
1318 122
1379 94
1096 138
1402 37
1704 26

cor.test(calcium$mortality, calcium$calcium, method = "pearson")
## 
##  Pearson's product-moment correlation
## 
## data:  calcium$mortality and calcium$calcium
## t = -6.0649, df = 28, p-value = 1.537e-06
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.8759845 -0.5397840
## sample estimates:
##        cor 
## -0.7535183

Conclusion: Reject the hypothesis that there is no correlation.
The data indicate a strong inverse relationship between mortality and calcium level in drinking water.


Spearman’s \(\rho\)

The Spearman rank correlation is a measure of correlation between rankings. Both variables must be measured on at least an ordinal scale.
\[ \rho = 1 - \frac{6 \sum_{i=1}^n d_i^2}{n (n^2-1)} \] where \(d_i\) is the difference between the ranks of the two variables for observation \(i\)

Example

Ten materials for artificial reef were evaluated.

  • \(Y\) = ranking according to number of invertebrates attracted after 1 month.
  • \(X\) = ranking according to cost and availability of materials.
library(readr)
reef <- read_csv("reef.csv")
Material X Y
1 1 3
2 4 2
3 2 4
4 3 5
5 5 1
6 6 7
7 7 8
8 8 6
9 9 9
10 10 10
cor.test(reef$X, reef$Y, method = "spearman")
## 
##  Spearman's rank correlation rho
## 
## data:  reef$X and reef$Y
## S = 38, p-value = 0.01367
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##      rho 
## 0.769697

Kendall’s \(\tau\)

The Kendall rank correlation coefficient is also used to measure the ordinal association between two measured quantities.

\[\begin{align} \tau &= \frac{\text{number of concordant pairs}-\text{number of discordant pairs}}{\binom{n}{2}} \\ &= 1-\frac{2(\text{number of discordant pairs})}{\binom{n}{2}} \end{align}\] where \(\binom{n}{2}\) is the total number of pairs.

Example

We use the same example as in the Spearman correlation:

  • \(Y\) = ranking according to number of invertebrates attracted after 1 month.
  • \(X\) = ranking according to cost and availability of materials.

With respect to Material 6, there is only one material that is discordant to it, while the other eight are concordant to it.

cor.test(reef$X, reef$Y, method = "kendall")
## 
##  Kendall's rank correlation tau
## 
## data:  reef$X and reef$Y
## T = 36, p-value = 0.01667
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
## tau 
## 0.6

The three correlation coefficients above are the most common, especially Pearson's r for numeric and continuous variables. Kendall's tau and Spearman's rho are computed on ranks, so continuous variables must first be converted to ranks before they can be used.

The following coefficients measure association of categorical variables.

\(\phi\) Coefficient

The Phi Coefficient is used if both variables are dichotomous.

          Y = 0   Y = 1
  X = 0     a       b
  X = 1     c       d

\[ \phi=\frac{|ad-bc|}{\sqrt{(a+b)(c+d)(a+c)(b+d)}} \]


Contingency Coefficient

Used if both variables are categorical.

                   Y category 1   Y category 2   Y category 3
  X category 1          a               d              g
  X category 2          b               e              h
  X category 3          c               f              i

\[ C = \sqrt{\frac{\chi^2}{n+\chi^2}} \] where \(\chi^2\) is the chi-squared test statistic which can be computed using the contingency table.
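
Both coefficients are easy to compute in R from a table of counts. The sketch below uses made-up counts for a \(2\times 2\) table; for larger tables, only the contingency coefficient part applies.

tab <- matrix(c(20, 10,
                 5, 15), ncol = 2, byrow = TRUE)    # made-up 2 x 2 table of counts
phi <- abs(tab[1, 1] * tab[2, 2] - tab[1, 2] * tab[2, 1]) /
  sqrt(prod(rowSums(tab)) * prod(colSums(tab)))
phi                                                 # for a 2 x 2 table, equals sqrt(chi2 / n)

chi2 <- chisq.test(tab, correct = FALSE)$statistic  # chi-squared test statistic
n    <- sum(tab)
sqrt(chi2 / (n + chi2))                             # contingency coefficient C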


Other measures of association if at least one is categorical

  • Biserial - one variable is continuous vs. another continuous variable which has been artificially dichotomized.
  • Point-biserial - one continuous vs. another which is a true dichotomy; conservative, and more applicable and safe to use when in doubt.
  • Tetrachoric - both variables are quantitative and both have been artificially dichotomized.
  • Eta-coefficient - one variable is interval and one is nominal.

1.5 Overview of Regression

Regression analysis is a statistical tool which utilizes the relation between two or more quantitative variables so that one variable can be predicted from the other(s).

The main objective of the analysis is to extract structural relationships among variables within a system. It is of interest to examine the effects that some variables exert (or appear to exert) on other variable(s).

Linear regression is used for a special class of relationships, namely, those that can be described by straight lines. The term simple linear regression refers to the case wherein only two variables are involved; otherwise, it is known as multiple linear regression.

A useful way of beginning regression analysis is by drawing a graph of one variable against the other variable. This graph, called the scatter diagram, can serve both to suggest a relationship, and to demonstrate possible inadequacies of it. The graph indicates the general tendency by which one variable varies with changes in another variable. The scatter diagram is useful in the simple linear regression case.

house <- read_csv("house.csv")
## Rows: 25 Columns: 2
## ── Column specification ──────────
## Delimiter: ","
## dbl (2): PRICE, TAX
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
PRICE TAX
53 652
55 1000
56 897
58 964
64 1099
44 960
49 678
72 800
82 1038
85 1200
45 860
47 600
49 676
56 1287
60 834
62 734
64 551
66 1355
35 561
38 489
43 752
46 774
46 440
50 549
65 900
lm(PRICE~TAX, house) |> summary()
## 
## Call:
## lm(formula = PRICE ~ TAX, data = house)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -15.527  -7.723  -1.681   4.165  20.187 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 31.39144    7.43543   4.222 0.000324 ***
## TAX          0.02931    0.00864   3.392 0.002505 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.44 on 23 degrees of freedom
## Multiple R-squared:  0.3335, Adjusted R-squared:  0.3045 
## F-statistic: 11.51 on 1 and 23 DF,  p-value: 0.002505
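
The scatter diagram described above can be drawn in base R; the sketch below also overlays the line fitted by the model just estimated.

plot(PRICE ~ TAX, data = house,
     main = "Scatter diagram of PRICE vs TAX")      # scatter diagram
abline(lm(PRICE ~ TAX, data = house))               # overlay the fitted regression line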


© 2024 UP School of Statistics. All rights reserved.