## 2.4 Inference for model parameters

The assumptions introduced in the previous section allow us to specify the distribution of the *random vector* \(\hat{\boldsymbol{\beta}}.\) The distribution is derived conditionally on the predictors’ sample \(\mathbf{X}_1,\ldots,\mathbf{X}_n.\) In other words, we assume that the randomness of \(\mathbf{Y}=\mathbf{X}\boldsymbol{\beta}+\boldsymbol{\varepsilon}\) comes only from the error terms and not from the predictors^{24}. To denote this, we employ lowercase for the predictors’ sample \(\mathbf{x}_1,\ldots,\mathbf{x}_n.\)

### 2.4.1 Distributions of the fitted coefficients

The distribution of \(\hat{\boldsymbol{\beta}}\) is:

\[\begin{align} \hat{\boldsymbol{\beta}}\sim\mathcal{N}_{p+1}\left(\boldsymbol{\beta},\sigma^2(\mathbf{X}'\mathbf{X})^{-1}\right). \tag{2.11} \end{align}\]

This result can be obtained from the form of \(\hat{\boldsymbol{\beta}}\) given in (2.7), the sample version of the model assumptions given in (2.10), and the linear transformation property of a normal given in (1.4). Equation (2.11) implies that the marginal distribution of \(\hat\beta_j\) is

\[\begin{align} \hat{\beta}_j\sim\mathcal{N}\left(\beta_j,\mathrm{SE}(\hat\beta_j)^2\right),\tag{2.12} \end{align}\]

where \(\mathrm{SE}(\hat\beta_j)\) is the *standard error*, \(\mathrm{SE}(\hat\beta_j)^2:=\sigma^2v_j,\) and

\[\begin{align*} v_j\text{ is the }j\text{-th element of the diagonal of }(\mathbf{X}'\mathbf{X})^{-1}. \end{align*}\]

Recall that an equivalent form for (2.12) is (why?)

\[\begin{align*} \frac{\hat\beta_j-\beta_j}{\mathrm{SE}(\hat\beta_j)}\sim\mathcal{N}(0,1). \end{align*}\]

The interpretation of (2.12) is simpler in the case with \(p=1,\) where

\[\begin{align} \hat\beta_0\sim\mathcal{N}\left(\beta_0,\mathrm{SE}(\hat\beta_0)^2\right),\quad\hat\beta_1\sim\mathcal{N}\left(\beta_1,\mathrm{SE}(\hat\beta_1)^2\right),\tag{2.13} \end{align}\]

with

\[\begin{align} \mathrm{SE}(\hat\beta_0)^2=\frac{\sigma^2}{n}\left[1+\frac{\bar X^2}{s_x^2}\right],\quad \mathrm{SE}(\hat\beta_1)^2=\frac{\sigma^2}{ns_x^2}.\tag{2.14} \end{align}\]

Some insights on (2.13) and (2.14), illustrated interactively in Figure 2.13, are the following:

**Bias**. Both estimates are unbiased: their expectations are the true coefficients for any sample size \(n.\)

**Variance**. The variances \(\mathrm{SE}(\hat\beta_0)^2\) and \(\mathrm{SE}(\hat\beta_1)^2\) have interesting interpretations in terms of their components:

- *Sample size \(n\)*. As the sample size grows, the precision of the estimators increases, since both variances decrease.
- *Error variance \(\sigma^2\)*. The more disperse the error is, the less precise the estimates are, since more vertical variability is present.
- *Predictor variance \(s_x^2\)*. If the predictor is spread out (large \(s_x^2\)), then it is easier to fit a regression line: we have information about the data trend over a long interval. If \(s_x^2\) is small, then all the data is concentrated in a narrow vertical band, so we have a much more limited view of the trend.
- *Mean \(\bar X\)*. It influences only the precision of \(\hat\beta_0.\) The larger \(\bar X\) is, the less precise \(\hat\beta_0\) is.
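These variance formulas can be verified empirically. The following sketch (all parameter values are arbitrary, illustrative choices) simulates many samples from a simple linear model with the predictors held fixed, and compares the empirical variances of \(\hat\beta_0\) and \(\hat\beta_1\) with the theoretical ones in (2.14):

```
# Monte Carlo check of the standard error formulas in (2.14)
set.seed(42)
n <- 50; beta0 <- 1; beta1 <- 2; sigma <- 0.5
x <- rnorm(n, mean = 3, sd = 1.5) # predictors kept fixed across samples
M <- 5000
betaHats <- t(replicate(M, {
  y <- beta0 + beta1 * x + rnorm(n, sd = sigma)
  coef(lm(y ~ x))
}))
# Theoretical variances; s_x^2 uses the 1 / n normalization of the text
sx2 <- mean((x - mean(x))^2)
varBeta0 <- sigma^2 / n * (1 + mean(x)^2 / sx2)
varBeta1 <- sigma^2 / (n * sx2)
# Empirical vs. theoretical variances -- they should be close
c(empirical = var(betaHats[, 1]), theoretical = varBeta0)
c(empirical = var(betaHats[, 2]), theoretical = varBeta1)
```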

The insights about (2.11) are more convoluted. The following broad remarks, extensions of what happened when \(p=1,\) apply:

**Bias**. All the estimates are unbiased for any sample size \(n.\)

**Variance**. It depends on:

- *Sample size \(n\)*. Hidden inside \(\mathbf{X}'\mathbf{X}.\) As \(n\) grows, the precision of the estimators increases.
- *Error variance \(\sigma^2\)*. The larger \(\sigma^2\) is, the less precise \(\hat{\boldsymbol{\beta}}\) is.
- *Predictor sparsity \((\mathbf{X}'\mathbf{X})^{-1}\)*. The more “disperse”^{25} the predictors are, the more precise \(\hat{\boldsymbol{\beta}}\) is.

The problem with the result in (2.11) is that *\(\sigma^2\) is unknown* in practice. Therefore, we need to estimate \(\sigma^2\) in order to use a result similar to (2.11). We do so by computing a rescaled sample variance of the residuals \(\hat\varepsilon_1,\ldots,\hat\varepsilon_n\):

\[\begin{align} \hat\sigma^2:=\frac{1}{n-p-1}\sum_{i=1}^n\hat\varepsilon_i^2.\tag{2.15} \end{align}\]

Note the \(n-p-1\) in the denominator. The factor \(n-p-1\) represents the *degrees of freedom*: the number of data points minus the number of parameters *already*^{26} fitted with the data (\(p\) slopes plus \(1\) intercept). For the interpretation of \(\hat\sigma^2,\) it is key to realize that *the mean of the residuals \(\hat\varepsilon_1,\ldots,\hat\varepsilon_n\) is zero*, that is, \(\bar{\hat\varepsilon}=0.\) Therefore, \(\hat\sigma^2\) is indeed a rescaled sample variance of the residuals that estimates the variance of \(\varepsilon\;\)^{27}. It can be seen that \(\hat\sigma^2\) is unbiased as an estimator of \(\sigma^2.\)
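As a quick sanity check of (2.15) (a sketch on simulated data; the model and constants are illustrative), \(\hat\sigma\) coincides with the “Residual standard error” reported by `summary`:

```
# Check (2.15) against the "Residual standard error" of summary()
set.seed(1)
n <- 30; p <- 2
x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + 0.5 * x1 - x2 + rnorm(n, sd = 0.25)
fit <- lm(y ~ x1 + x2)
sigma2Hat <- sum(residuals(fit)^2) / (n - p - 1) # (2.15)
c(manual = sqrt(sigma2Hat), summary = summary(fit)$sigma)
mean(residuals(fit)) # the residuals have zero mean (up to rounding)
```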

If we use the estimate \(\hat\sigma^2\) instead of \(\sigma^2,\) we get more useful^{28} distributions than (2.12):

\[\begin{align} \frac{\hat\beta_j-\beta_j}{\hat{\mathrm{SE}}(\hat\beta_j)}\sim t_{n-p-1},\quad\hat{\mathrm{SE}}(\hat\beta_j)^2:=\hat\sigma^2v_j,\tag{2.16} \end{align}\]

where \(t_{n-p-1}\) represents the *Student’s \(t\) distribution* with \(n-p-1\) degrees of freedom.

The LHS of (2.16) is the *\(t\)-statistic* for \(\beta_j,\) \(j=0,\ldots,p.\) We will employ them for building confidence intervals and hypothesis tests in what follows.

### 2.4.2 Confidence intervals for the coefficients

Thanks to (2.16), we can obtain the \(100(1-\alpha)\%\) Confidence Interval (CI) for the coefficient \(\beta_j,\) \(j=0,\ldots,p\):

\[\begin{align} \left(\hat\beta_j\pm\hat{\mathrm{SE}}(\hat\beta_j)t_{n-p-1;\alpha/2}\right)\tag{2.17} \end{align}\]

where \(t_{n-p-1;\alpha/2}\) is the *\(\alpha/2\)-upper quantile of the \(t_{n-p-1}\)*. Usually, \(\alpha=0.10,0.05,0.01\) are considered.

This *random* CI *contains the unknown coefficient \(\beta_j\) “with a probability of \(1-\alpha\)”*. The previous quoted statement has to be understood as follows. Suppose you have 100 samples generated according to a linear model. If you compute the CI for a coefficient, then in approximately \(100(1-\alpha)\) of the samples the true coefficient would be actually inside the random CI. Note also that the CI is symmetric around \(\hat\beta_j.\) This is illustrated in Figure 2.14.
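The “with a probability of \(1-\alpha\)” statement can be checked by simulation. A minimal sketch (the model and sample size are arbitrary choices) that counts how often the 95% CI for the slope captures the true \(\beta_1\):

```
# Empirical coverage of the 95% CI for beta_1 over many simulated samples
set.seed(123)
n <- 25; beta1 <- 2; M <- 1000
covers <- replicate(M, {
  x <- rnorm(n)
  y <- 1 + beta1 * x + rnorm(n)
  ci <- confint(lm(y ~ x), level = 0.95)["x", ]
  (ci[1] < beta1) && (beta1 < ci[2])
})
mean(covers) # proportion of CIs containing beta1; close to 0.95
```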

### 2.4.3 Testing on the coefficients

The distributions in (2.16) also allow us to conduct a formal *hypothesis test* on the coefficients \(\beta_j,\) \(j=0,\ldots,p.\) For example, the test for *significance*^{29} is especially important, that is, the test of the hypotheses

\[\begin{align*} H_0:\beta_j=0 \end{align*}\]

for \(j=0,\ldots,p.\) The test of \(H_0:\beta_j=0\) with \(1\leq j\leq p\) is especially interesting, since it allows us to answer whether *the variable \(X_j\) has a significant linear effect on \(Y\)*. The statistic used for testing for significance is the \(t\)-statistic

\[\begin{align*} \frac{\hat\beta_j-0}{\hat{\mathrm{SE}}(\hat\beta_j)}, \end{align*}\]

which is distributed as a \(t_{n-p-1}\) *under (the veracity of) the null hypothesis*^{30}.

The null hypothesis \(H_0\) is tested *against* the *alternative hypothesis*, \(H_1.\) If \(H_0\) is rejected, it is *rejected in favor* of \(H_1.\) The alternative hypothesis can be *two-sided* (we will focus mostly on these alternatives), such as

\[\begin{align*} H_0:\beta_j= 0\quad\text{vs.}\quad H_1:\beta_j\neq 0 \end{align*}\]

or *one-sided*, such as

\[\begin{align*} H_0:\beta_j=0 \quad\text{vs.}\quad H_1:\beta_j<(>)0. \end{align*}\]

The test based on the \(t\)-statistic is referred to as the *\(t\)-test*. It rejects \(H_0:\beta_j=0\) (against \(H_1:\beta_j\neq 0\)) at significance level \(\alpha\) for large *absolute* values of the \(t\)-statistic, precisely for those above the \(\alpha/2\)-upper quantile of the \(t_{n-p-1}\) distribution. That is, it rejects \(H_0\) at level \(\alpha\) if \(\frac{|\hat\beta_j|}{\hat{\mathrm{SE}}(\hat\beta_j)}>t_{n-p-1;\alpha/2}\;\)^{31}. For the one-sided tests, it rejects \(H_0\) against \(H_1:\beta_j<0\) or \(H_1:\beta_j>0\) if \(\frac{\hat\beta_j}{\hat{\mathrm{SE}}(\hat\beta_j)}<-t_{n-p-1;\alpha}\) or \(\frac{\hat\beta_j}{\hat{\mathrm{SE}}(\hat\beta_j)}>t_{n-p-1;\alpha},\) respectively.
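The rejection rules above translate directly into code. A sketch (on simulated data; `alpha` and the model are illustrative) of the two-sided \(t\)-test done “by hand”, matching what `summary` reports:

```
# Two-sided t-test for H0: beta_1 = 0, computed by hand
set.seed(7)
n <- 40
x <- rnorm(n)
y <- 1 + 0.3 * x + rnorm(n)
fit <- lm(y ~ x)
tStat <- unname(coef(fit)["x"] / sqrt(diag(vcov(fit))["x"]))
alpha <- 0.05
abs(tStat) > qt(1 - alpha / 2, df = n - 2) # TRUE -> reject H0 at level alpha
# Two-sided p-value, as in the summary() table
2 * pt(abs(tStat), df = n - 2, lower.tail = FALSE)
```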

Remember the following insights about hypothesis testing.

An analogy of hypothesis testing with a **trial** can be seen in Appendix A.1.

In a hypothesis test, the *\(p\)-value measures the degree of veracity of \(H_0\) according to the data*. The rule of thumb is the following:

*Is the \(p\)-value lower than \(\alpha\)?*

- **Yes** \(\rightarrow\) **reject \(H_0\)**.
- **No** \(\rightarrow\) **do not reject \(H_0\)**.

The connection of a \(t\)-test for \(H_0:\beta_j=0\) and the CI for \(\beta_j,\) both at level \(\alpha,\) is the following:

*Is \(0\) inside the CI for \(\beta_j\)?*

- **Yes** \(\leftrightarrow\) **do not reject \(H_0\)**.
- **No** \(\leftrightarrow\) **reject \(H_0\)**.

The one-sided tests \(H_0:\beta_j=0\) vs. \(H_1:\beta_j<0\) (respectively, \(H_1:\beta_j>0\)) can be done by means of the CI for \(\beta_j.\) If \(H_0\) is rejected, they allow us to conclude that *\(\hat\beta_j\) is significantly negative (positive)* and that *for the considered regression model, \(X_j\) has a significant negative (positive) effect on \(Y\)*. The rule of thumb is the following:

*Is the CI for \(\beta_j\) below (above) \(0\) at level \(\alpha\)?*

- **Yes** \(\rightarrow\) **reject \(H_0\)** at level \(\alpha.\) Conclude that \(X_j\) has a significant negative (positive) effect on \(Y\) at level \(\alpha.\)
- **No** \(\rightarrow\) the criterion is **not conclusive**.

### 2.4.4 Case study application

Let’s analyze the multiple linear model we have considered for the `wine` dataset, now that we know how to make inference on the model parameters. The relevant information is obtained with the `summary` of the model:

```
# Fit
modWine1 <- lm(Price ~ ., data = wine)
# Summary
sumModWine1 <- summary(modWine1)
sumModWine1
##
## Call:
## lm(formula = Price ~ ., data = wine)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.46541 -0.24133 0.00413 0.18974 0.52495
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.343e+00 7.697e+00 -0.304 0.76384
## WinterRain 1.153e-03 4.991e-04 2.311 0.03109 *
## AGST 6.144e-01 9.799e-02 6.270 3.22e-06 ***
## HarvestRain -3.837e-03 8.366e-04 -4.587 0.00016 ***
## Age 1.377e-02 5.821e-02 0.237 0.81531
## FrancePop -2.213e-05 1.268e-04 -0.175 0.86313
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.293 on 21 degrees of freedom
## Multiple R-squared: 0.8278, Adjusted R-squared: 0.7868
## F-statistic: 20.19 on 5 and 21 DF, p-value: 2.232e-07
# Contains the estimation of sigma ("Residual standard error")
sumModWine1$sigma
## [1] 0.2930287
# Which is the same as
sqrt(sum(modWine1$residuals^2) / modWine1$df.residual)
## [1] 0.2930287
```

The `Coefficients` block of the `summary` output contains the following elements regarding the significance of each coefficient \(\beta_j,\) that is, the test \(H_0:\beta_j=0\) vs. \(H_1:\beta_j\neq0\):

- `Estimate`: least squares estimate \(\hat\beta_j.\)
- `Std. Error`: estimated standard error \(\hat{\mathrm{SE}}(\hat\beta_j).\)
- `t value`: \(t\)-statistic \(\frac{\hat\beta_j}{\hat{\mathrm{SE}}(\hat\beta_j)}.\)
- `Pr(>|t|)`: \(p\)-value of the \(t\)-test.
- `Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1`: codes indicating the size of the \(p\)-value. The more asterisks, the more evidence supporting that \(H_0\) does not hold^{32}.
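The relations between these columns can be verified programmatically. A sketch using a generic fit on the built-in `iris` data (so that it runs without the `wine` dataset); the same applies verbatim to `modWine1`:

```
# Recomputing "t value" and "Pr(>|t|)" from "Estimate" and "Std. Error"
fit <- lm(Sepal.Length ~ Petal.Length + Petal.Width, data = iris)
coefs <- summary(fit)$coefficients
tVals <- coefs[, "Estimate"] / coefs[, "Std. Error"]
all.equal(unname(tVals), unname(coefs[, "t value"]))
## [1] TRUE
pVals <- 2 * pt(abs(tVals), df = fit$df.residual, lower.tail = FALSE)
all.equal(unname(pVals), unname(coefs[, "Pr(>|t|)"]))
## [1] TRUE
```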

Note that **a high proportion of predictors are not significant** in `modWine1`: `FrancePop` and `Age` are not significant (and neither is the intercept). This is an indication of an **excess of predictors** adding little information to the response. One explanation is the almost perfect correlation between `FrancePop` and `Age` shown before: one of them is not adding any extra information to explain `Price`. This complicates the model unnecessarily and, more importantly, has the undesirable effect of making the **coefficient estimates less precise**. We opt to remove the predictor `FrancePop` from the model since it is exogenous to the wine context^{33}. A data-driven justification of the removal of this variable is that it is the least significant in `modWine1`.

Then, the model without `FrancePop`^{34} is:

```
modWine2 <- lm(Price ~ . - FrancePop, data = wine)
summary(modWine2)
##
## Call:
## lm(formula = Price ~ . - FrancePop, data = wine)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.46024 -0.23862 0.01347 0.18601 0.53443
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.6515703 1.6880876 -2.163 0.04167 *
## WinterRain 0.0011667 0.0004820 2.420 0.02421 *
## AGST 0.6163916 0.0951747 6.476 1.63e-06 ***
## HarvestRain -0.0038606 0.0008075 -4.781 8.97e-05 ***
## Age 0.0238480 0.0071667 3.328 0.00305 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2865 on 22 degrees of freedom
## Multiple R-squared: 0.8275, Adjusted R-squared: 0.7962
## F-statistic: 26.39 on 4 and 22 DF, p-value: 4.057e-08
```

All the coefficients are significant at level \(\alpha=0.05.\) Therefore, there is no clear redundant information. In addition, the \(R^2\) is very similar to the full model, but the `'Adjusted R-squared'`, a weighting of the \(R^2\) to account for the number of predictors used by the model, is slightly larger. As we will see in Section 2.7.2, this means that, compared to the number of predictors used, `modWine2` explains more variability of `Price` than `modWine1`.

A handy way of comparing the coefficients of both models is `car::compareCoefs`:

```
car::compareCoefs(modWine1, modWine2)
## Calls:
## 1: lm(formula = Price ~ ., data = wine)
## 2: lm(formula = Price ~ . - FrancePop, data = wine)
##
## Model 1 Model 2
## (Intercept) -2.34 -3.65
## SE 7.70 1.69
##
## WinterRain 0.001153 0.001167
## SE 0.000499 0.000482
##
## AGST 0.6144 0.6164
## SE 0.0980 0.0952
##
## HarvestRain -0.003837 -0.003861
## SE 0.000837 0.000808
##
## Age 0.01377 0.02385
## SE 0.05821 0.00717
##
## FrancePop -2.21e-05
## SE 1.27e-04
##
```

Note how the coefficients of `modWine2` have smaller standard errors than those of `modWine1`.

The individual CIs for the unknown \(\beta_j\)’s can be obtained by applying the `confint` function to an `lm` object. Let’s compute the CIs for the model coefficients of `modWine1`, `modWine2`, and a new model `modWine3`:

```
# Fit a new model
modWine3 <- lm(Price ~ Age + WinterRain, data = wine)
summary(modWine3)
##
## Call:
## lm(formula = Price ~ Age + WinterRain, data = wine)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.88964 -0.51421 -0.00066 0.43103 1.06897
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.9830427 0.5993667 9.982 5.09e-10 ***
## Age 0.0360559 0.0137377 2.625 0.0149 *
## WinterRain 0.0007813 0.0008780 0.890 0.3824
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5769 on 24 degrees of freedom
## Multiple R-squared: 0.2371, Adjusted R-squared: 0.1736
## F-statistic: 3.73 on 2 and 24 DF, p-value: 0.03884
# Confidence intervals at 95%
# CI: (lwr, upr)
confint(modWine3)
## 2.5 % 97.5 %
## (Intercept) 4.746010626 7.220074676
## Age 0.007702664 0.064409106
## WinterRain -0.001030725 0.002593278
# Confidence intervals at other levels
confint(modWine3, level = 0.90)
## 5 % 95 %
## (Intercept) 4.9575969417 7.008488360
## Age 0.0125522989 0.059559471
## WinterRain -0.0007207941 0.002283347
confint(modWine3, level = 0.99)
## 0.5 % 99.5 %
## (Intercept) 4.306650310 7.659434991
## Age -0.002367633 0.074479403
## WinterRain -0.001674299 0.003236852
# Compare with previous models
confint(modWine1)
## 2.5 % 97.5 %
## (Intercept) -1.834844e+01 13.6632391095
## WinterRain 1.153872e-04 0.0021910509
## AGST 4.106337e-01 0.8182146540
## HarvestRain -5.577203e-03 -0.0020974232
## Age -1.072931e-01 0.1348317795
## FrancePop -2.858849e-04 0.0002416171
confint(modWine2)
## 2.5 % 97.5 %
## (Intercept) -7.1524497573 -0.150690903
## WinterRain 0.0001670449 0.002166393
## AGST 0.4190113907 0.813771726
## HarvestRain -0.0055353098 -0.002185890
## Age 0.0089852800 0.038710748
confint(modWine3)
## 2.5 % 97.5 %
## (Intercept) 4.746010626 7.220074676
## Age 0.007702664 0.064409106
## WinterRain -0.001030725 0.002593278
```

In `modWine3`, the 95% CI for \(\beta_0\) is \((4.7460, 7.2201),\) for \(\beta_1\) is \((0.0077, 0.0644),\) and for \(\beta_2\) is \((-0.0010, 0.0026).\) Therefore, we can say with a 95% confidence that *the coefficient of `WinterRain` is non-significant* (\(0\) is inside the CI). But, inspecting the CI of the coefficient of `WinterRain` in `modWine2`, we can see that *it is significant* for that model! How is this possible? The answer is that the presence of extra predictors affects the coefficient estimate, as we saw in Figure 2.7. Therefore, the precise statement to make is:

In the model `Price ~ Age + WinterRain`, with \(\alpha=0.05,\) the coefficient of `WinterRain` is non-significant.

Note that this **does not** mean that the coefficient is always non-significant: in `Price ~ Age + AGST + HarvestRain + WinterRain` it is significant.

Compute and interpret the CIs for the coefficients, at levels \(\alpha=0.10,0.05,0.01,\) for the following regressions:

- `Price ~ WinterRain + HarvestRain + AGST` (`wine`).
- `AGST ~ Year + FrancePop` (`wine`).

For the `assumptions` dataset, do the following:

- Regression `y7 ~ x7`. Check that:
  - The intercept is not significant for the regression at any reasonable level \(\alpha.\)
  - The slope is significant for any \(\alpha \geq 10^{-7}.\)
- Regression `y6 ~ x6`. Assume the linear model assumptions are verified.
  - Check that \(\hat\beta_0\) is significantly different from zero at any level \(\alpha.\)
  - For which \(\alpha=0.10,0.05,0.01\) is \(\hat\beta_1\) significantly different from zero?

In certain applications, it is useful to *center* the predictors \(X_1,\ldots,X_p\) prior to fitting the model, in such a way that the slope coefficients \((\beta_1,\ldots,\beta_p)\) measure the effects of deviations of the predictors from their means. Theoretically, this amounts to considering the linear model

\[\begin{align*} Y=\beta_0+\beta_1(X_1-\mathbb{E}[X_1])+\cdots+\beta_p(X_p-\mathbb{E}[X_p])+\varepsilon. \end{align*}\]

In the sample case, we proceed by replacing \(X_{ij}\) with \(X_{ij}-\bar{X}_j,\) which can be easily done with the `scale` function (see below). If, in addition, the response is also centered, then \(\beta_0=0\) and \(\hat\beta_0=0.\) This centering of the data has no influence on the significance of the predictors (but does have influence on the significance of \(\hat\beta_0\)), as it is just a linear transformation of them.

```
# By default, scale centers (subtracts the mean) and scales (divides by the
# standard deviation) the columns of a matrix
wineCen <- data.frame(scale(wine, center = TRUE, scale = FALSE))
# Regression with centered response and predictors
modWine3Cen <- lm(Price ~ Age + WinterRain, data = wineCen)
# Summary
summary(modWine3Cen)
##
## Call:
## lm(formula = Price ~ Age + WinterRain, data = wineCen)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.88964 -0.51421 -0.00066 0.43103 1.06897
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.964e-16 1.110e-01 0.000 1.0000
## Age 3.606e-02 1.374e-02 2.625 0.0149 *
## WinterRain 7.813e-04 8.780e-04 0.890 0.3824
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5769 on 24 degrees of freedom
## Multiple R-squared: 0.2371, Adjusted R-squared: 0.1736
## F-statistic: 3.73 on 2 and 24 DF, p-value: 0.03884
```

24. This is for theoretical and modeling convenience. With this assumption, we just model the randomness of \(Y\) given the predictors. If the randomness of \(Y\) *and* the randomness of \(X_1,\ldots,X_n\) were to be modeled, we would require a significantly more complex model.
25. Understood as small \(|(\mathbf{X}'\mathbf{X})^{-1}|.\)
26. Prior to undertaking the estimation of \(\sigma,\) we have used the sample to estimate \(\hat{\boldsymbol\beta}.\) The situation is thus analogous to the distinction between the *sample variance* \(s_x^2=\frac{1}{n}\sum_{i=1}^n\left(X_i-\bar{X}\right)^2\) and the *sample quasi-variance* \(\hat{s}_x^2=\frac{1}{n-1}\sum_{i=1}^n\left(X_i-\bar{X}\right)^2,\) both computed from a sample \(X_1,\ldots,X_n.\) When estimating \(\mathbb{V}\mathrm{ar}[X],\) both previously estimate \(\mathbb{E}[X]\) through \(\bar{X}.\) The fact that \(\hat{s}_x^2\) accounts for that prior estimation through the degrees of freedom \(n-1\) makes that estimator unbiased for \(\mathbb{V}\mathrm{ar}[X]\) (\(s_x^2\) is not).
27. Recall that the sample variance of \(\hat\varepsilon_1,\ldots,\hat\varepsilon_n\) is \(\frac{1}{n}\sum_{i=1}^n\left(\hat\varepsilon_i-\bar{\hat\varepsilon}\right)^2.\)
28. In the sense of practically realistic.
29. Shortcut for *significantly different from zero*.
30. This is denoted as \(\frac{\hat{\beta}_j-0}{\hat{\mathrm{SE}}(\hat\beta_j)}\stackrel{H_0}{\sim}t_{n-p-1}.\)
31. In R, \(t_{n-p-1;\alpha/2}\) can be computed as `qt(p = 1 - alpha / 2, df = n - p - 1)` or `qt(p = alpha / 2, df = n - p - 1, lower.tail = FALSE)`.
32. For example, `'**'` indicates that the \(p\)-value lies within \(0.001\) and \(0.01.\)
33. This is a *context-guided* decision, not *data-driven*.
34. Notice the use of `-` for *excluding* a particular predictor.