Chapter 4 Miscellaneous Issues for Statistical Inference

This chapter provides information and explanations for various issues I have encountered while TAing and teaching statistical inference. If you see errors or have suggestions for new topics, please do let me know.

4.1 Variance and Covariance

There are two equivalent formulas that you will encounter (or have encountered) for calculating the covariance and the variance-covariance matrix. For a slightly different treatment from a probability perspective, PennState has a good introduction.

\[\begin{equation} cov(\mathbf{X},\mathbf{Y})=\mathbb{E} \left( \left(\mathbf{X}-\mathbb{E}(\mathbf{X})\right) \left(\mathbf{Y}-\mathbb{E}(\mathbf{Y})\right)^{T} \right) \tag{4.1} \end{equation}\]

\[\begin{equation} cov(\mathbf{X},\mathbf{Y})= \mathbb{E}(\mathbf{XY}^{T}) - \mathbb{E}(\mathbf{X})\mathbb{E}(\mathbf{Y})^{T} \tag{4.2} \end{equation}\]

Before we dig further, a couple of clarifications. First, covariance is a concept defined for random vectors: in the equations above, \(\mathbf{X}\) and \(\mathbf{Y}\) are vectors. Second, in R we can calculate covariances for matrices using the cov function. Note that if we use cov(x, y), we are calculating the covariances between the columns of x and the columns of y.

Now, let us first focus on (4.1). For sources and information, see here. Before we begin, note that the cov function in R uses a denominator of n-1 to give an unbiased estimator. Accordingly, we should adjust the biased (denominator-n) results from the formulas above by multiplying by n/(n-1).

# use the built-in R data frame mtcars
x <- mtcars$mpg
y <- mtcars$cyl
cov(x,y)
## [1] -9.172379

Using (4.1), we have

mean((x - mean(x)) * (y - mean(y))) * (length(x) / (length(x) - 1))
## [1] -9.172379

Using (4.2), we have

(mean(x * y) - mean(x) * mean(y)) * (length(x) / (length(x) - 1))
## [1] -9.172379

Now, let us turn to matrices. Applying the cov function to a matrix gives us a variance-covariance matrix. That is, R gives us the covariance of each column with every other column (and with itself).

# use the built-in R data frame
X <- as.matrix(mtcars[, 1:3])
cov(X)
##              mpg        cyl       disp
## mpg    36.324103  -9.172379  -633.0972
## cyl    -9.172379   3.189516   199.6603
## disp -633.097208 199.660282 15360.7998

Let us calculate the variance-covariance matrix by hand. Recalling (4.1), we need the expected value of the matrix, that is, a matrix of column means (using the colMeans function). We need to make sure this matrix has the same dimensions as our original matrix. Note that we divide by n-1 to obtain the unbiased estimate.

n <- nrow(X)
# matrix of column means with the same dimensions as X
Xbar <- matrix(colMeans(X), nrow = nrow(X), ncol = ncol(X), byrow = TRUE)
Xc <- X - Xbar                # centered (demeaned) data
t(Xc) %*% Xc / (n - 1)        # unbiased variance-covariance matrix
##              mpg        cyl       disp
## mpg    36.324103  -9.172379  -633.0972
## cyl    -9.172379   3.189516   199.6603
## disp -633.097208 199.660282 15360.7998

This can be simplified with some matrix algebra, using the centering matrix \(\mathbf{C} = \mathbf{I}_n - \frac{1}{n}\mathbf{J}_n\) (where \(\mathbf{J}_n\) is the \(n \times n\) matrix of ones); premultiplying by \(\mathbf{C}\) subtracts the column means.

n <- nrow(X)
C <- diag(n) - matrix(1/n, n, n)   # centering matrix
Xc <- C %*% X                      # centered data
t(Xc) %*% Xc / (n - 1)
##              mpg        cyl       disp
## mpg    36.324103  -9.172379  -633.0972
## cyl    -9.172379   3.189516   199.6603
## disp -633.097208 199.660282 15360.7998

Alternatively, you can use the scale function.

Xc <- scale(X, center=TRUE, scale=FALSE)
t(Xc) %*% Xc / (n-1)
##              mpg        cyl       disp
## mpg    36.324103  -9.172379  -633.0972
## cyl    -9.172379   3.189516   199.6603
## disp -633.097208 199.660282 15360.7998

Let us now use (4.2). Since we are computing the variance-covariance matrix of \(\mathbf{X}\) itself, we set \(\mathbf{Y}=\mathbf{X}\): we need the expected value of \(\mathbf{X}\mathbf{X}^{T}\) and the expected value of \(\mathbf{X}\) times the expected value of \(\mathbf{X}\) transposed.

n <- nrow(X)
EXY <- t(X) %*% X / n                 # sample analogue of E(XY^T), with Y = X
EXEY <- colMeans(X) %o% colMeans(X)   # outer product of the column means
(EXY - EXEY) * (n / (n - 1))          # rescale to the unbiased estimate
##              mpg        cyl       disp
## mpg    36.324103  -9.172379  -633.0972
## cyl    -9.172379   3.189516   199.6603
## disp -633.097208 199.660282 15360.7998

4.2 Degrees of Freedom

A nice blog post by Jim Frost explains degrees of freedom. The main takeaway: DF is “the number of values that are free to vary as you estimate parameters.”
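
As a quick illustration (my own example, not from the blog post), the residual degrees of freedom in a regression are the number of observations minus the number of estimated coefficients:

# Residual DF = n observations minus k estimated coefficients (intercept included)
fit <- lm(mpg ~ cyl + disp, data = mtcars)
df.residual(fit)                  # 32 - 3 = 29
nrow(mtcars) - length(coef(fit))  # same number, computed by hand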

4.3 False Positive/Negative

Some may be confused by type I and type II errors, so here are a number of clarifications. First, the null is considered the negative. So a false positive means we conclude that the result is positive (the null is wrong), but our conclusion is mistaken (the null is in fact right). Similarly, a false negative means we falsely accept a null that is in fact false.

Second, a false positive is also called a type I error. This is typically the main error we are concerned about. A classic example comes from the criminal justice system. The null is “the man is innocent.” A type I error means we convict an innocent person; a type II error means we let a guilty person go free. Typically, we are more concerned about making a type I error because it may cost an innocent person their freedom or even their life.

Third, the two errors are interrelated. If we reduce the rate of type I errors, we become more likely to make type II errors (see here for a graphic example).

Finally, it is not always the case that a type I error is the worse one. Think of a medical screening for a disease. A false positive means we tell the patient they have a disease that they do not really have; typically, this leads to further examination, which will most likely reveal our error. A false negative, however, wrongly assures the patient that they do not have the disease. In most cases, doctors prefer false positives. Another example is the null hypothesis that “there is no tiger behind that bush”: a false negative can cost you your life (see here for an example of misunderstanding the errors). Also, check here for related topics such as sensitivity and specificity.
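
To make the false positive rate concrete, here is a small simulation sketch of my own (not from the linked sources): when the null is true, the share of tests that reject at the 0.05 level should be close to 0.05.

# Run many t-tests on data generated under a true null (mean zero);
# the fraction of p-values below 0.05 approximates the type I error rate
set.seed(1)
pvals <- replicate(5000, t.test(rnorm(30))$p.value)
mean(pvals < 0.05)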

4.4 Interaction vs. Subgroup

Interaction terms can be viewed as a way of running separate regressions, much like subsetting the data by a grouping variable.

mod1a <- lm(mpg~cyl+disp+hp, data=subset(mtcars,am==0))
mod1b <- lm(mpg~cyl+disp+hp, data=subset(mtcars,am==1))
coef(mod1a)
##  (Intercept)          cyl         disp           hp 
## 28.470571501 -0.556797933 -0.008137074 -0.031773289
coef(mod1b)
##  (Intercept)          cyl         disp           hp 
## 36.219391942 -1.258284249 -0.029303090 -0.009720254
mod2 <- lm(mpg~cyl*am+disp*am+hp*am, data=mtcars)
coef(mod2)
##  (Intercept)          cyl           am         disp           hp 
## 28.470571501 -0.556797933  7.748820441 -0.008137074 -0.031773289 
##       cyl:am      am:disp        am:hp 
## -0.701486315 -0.021166016  0.022053035

Note that the coefficient estimates are equivalent. That is, setting the interacted variable am to 0 (or 1) in mod2 yields the same estimates as mod1a (or mod1b). Two additional notes: first, the interaction model can yield more efficient estimates because it incorporates the information from the full sample in one model; second, the variance of a marginal effect is also conditional on the values of the other variables, so rescaling (e.g., recentering) a variable can change whether a particular effect appears significant.
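
For example, adding each main effect in mod2 to its am interaction reproduces the corresponding mod1b (am = 1) estimate:

# am = 1: main effect plus interaction from mod2 equals the subgroup estimate
coef(mod2)["cyl"]  + coef(mod2)["cyl:am"]    # matches coef(mod1b)["cyl"]
coef(mod2)["disp"] + coef(mod2)["am:disp"]   # matches coef(mod1b)["disp"]
coef(mod2)["hp"]   + coef(mod2)["am:hp"]     # matches coef(mod1b)["hp"]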

For those who are interested, Ferland offers this explanation of alternative specifications of interaction models.

4.5 How to Understand QQ Plots

A key to understanding a QQ plot is knowing where the theoretical quantiles come from. A good explanation can be found here.
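
As a rough sketch of the idea (my own illustration, not from the linked explanation), a normal QQ plot essentially plots the sorted sample against standard normal quantiles evaluated at evenly spaced probabilities:

# A by-hand version of what qqnorm() does
set.seed(1)
z <- rnorm(100)
p <- ppoints(length(z))   # plotting probabilities, roughly (i - 0.5)/n here
theo <- qnorm(p)          # theoretical (standard normal) quantiles
plot(theo, sort(z))       # compare with qqnorm(z)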

4.6 How to Interpret log transformation

Here is a simple and illustrative explanation. Here are some explanations from various perspectives.
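
As a quick illustration of the usual rules of thumb (my own example, not from the linked explanations): in a log-level model, 100 times the coefficient is roughly the percent change in y for a one-unit change in x; in a log-log model, the coefficient is roughly an elasticity.

# Log-level: 100 * coefficient ~ percent change in mpg per unit of hp
coef(lm(log(mpg) ~ hp, data = mtcars))
# Log-log: coefficient ~ percent change in mpg per one percent change in hp
coef(lm(log(mpg) ~ log(hp), data = mtcars))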

4.7 When to standardize variables?

Here is a good explanation of when and why we want to standardize our variables. Note that Gelman recommends standardizing by two standard deviations. For a link to the paper and R functions, see here.
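
A minimal base-R sketch of the two-standard-deviation rescaling, using hp from mtcars as an illustration (not the exact function referenced above):

# Center hp and divide by two standard deviations so its coefficient is
# roughly comparable to that of a binary predictor such as am
dat <- mtcars
dat$hp2sd <- (dat$hp - mean(dat$hp)) / (2 * sd(dat$hp))
coef(lm(mpg ~ hp2sd + am, data = dat))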