2.4 Inference for model parameters
The assumptions introduced in the previous section allow us to specify the distribution of the random vector \(\hat{\boldsymbol{\beta}}.\) The distribution is derived conditionally on the predictors’ sample \(\mathbf{X}_1,\ldots,\mathbf{X}_n.\) In other words, we assume that the randomness of \(\mathbf{Y}=\mathbf{X}\boldsymbol{\beta}+\boldsymbol{\varepsilon}\) comes only from the error terms and not from the predictors24. To denote this, we employ lowercase for the predictors’ sample \(\mathbf{x}_1,\ldots,\mathbf{x}_n.\)
2.4.1 Distributions of the fitted coefficients
The distribution of \(\hat{\boldsymbol{\beta}}\) is:
\[\begin{align} \hat{\boldsymbol{\beta}}\sim\mathcal{N}_{p+1}\left(\boldsymbol{\beta},\sigma^2(\mathbf{X}'\mathbf{X})^{-1}\right). \tag{2.11} \end{align}\]
This result can be obtained from the form of \(\hat{\boldsymbol{\beta}}\) given in (2.7), the sample version of the model assumptions given in (2.10), and the linear transformation property of a normal given in (1.4). Equation (2.11) implies that the marginal distribution of \(\hat\beta_j\) is
\[\begin{align} \hat{\beta}_j\sim\mathcal{N}\left(\beta_j,\mathrm{SE}(\hat\beta_j)^2\right),\tag{2.12} \end{align}\]
where \(\mathrm{SE}(\hat\beta_j)\) is the standard error, \(\mathrm{SE}(\hat\beta_j)^2:=\sigma^2v_j,\) and
\[\begin{align*} v_j\text{ is the }j\text{-th element of the diagonal of }(\mathbf{X}'\mathbf{X})^{-1}. \end{align*}\]
Recall that an equivalent form for (2.12) is (why?)
\[\begin{align*} \frac{\hat\beta_j-\beta_j}{\mathrm{SE}(\hat\beta_j)}\sim\mathcal{N}(0,1). \end{align*}\]
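To make (2.11) and (2.12) concrete, the following minimal sketch (simulated data with assumed toy values for \(\boldsymbol{\beta}\) and \(\sigma\); all names are illustrative) computes the \(v_j\)'s from the design matrix and compares \(\sigma\sqrt{v_j}\) with the standard errors reported by lm, which simply replace \(\sigma\) by an estimate (see (2.15) below):
# Simulate a toy dataset with known sigma (assumed values, for illustration only)
set.seed(12345)
n <- 100; sigma <- 0.5
x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + 2 * x1 - x2 + rnorm(n, sd = sigma)
# v_j's: diagonal of (X'X)^{-1}, with X the design matrix (intercept included)
X <- cbind(1, x1, x2)
v <- diag(solve(crossprod(X)))
# True SE(beta_j)'s, computable here because sigma is known
sqrt(sigma^2 * v)
# Estimated standard errors from lm(), which use an estimate of sigma instead
summary(lm(y ~ x1 + x2))$coefficients[, "Std. Error"]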
The interpretation of (2.12) is simpler in the case with \(p=1,\) where
\[\begin{align} \hat\beta_0\sim\mathcal{N}\left(\beta_0,\mathrm{SE}(\hat\beta_0)^2\right),\quad\hat\beta_1\sim\mathcal{N}\left(\beta_1,\mathrm{SE}(\hat\beta_1)^2\right),\tag{2.13} \end{align}\]
with
\[\begin{align} \mathrm{SE}(\hat\beta_0)^2=\frac{\sigma^2}{n}\left[1+\frac{\bar X^2}{s_x^2}\right],\quad \mathrm{SE}(\hat\beta_1)^2=\frac{\sigma^2}{ns_x^2}.\tag{2.14} \end{align}\]
Some insights on (2.13) and (2.14), illustrated interactively in Figure 2.13, are the following:
- Bias. Both estimates are unbiased. This means that their expectations equal the true coefficients for any sample size \(n.\)
- Variance. The variances \(\mathrm{SE}(\hat\beta_0)^2\) and \(\mathrm{SE}(\hat\beta_1)^2\) have interesting interpretations in terms of their components:
  - Sample size \(n\). As the sample size grows, the precision of the estimators increases, since both variances decrease.
  - Error variance \(\sigma^2\). The more dispersed the errors are, the less precise the estimates are, since more vertical variability is present.
  - Predictor variance \(s_x^2\). If the predictor is spread out (large \(s_x^2\)), then it is easier to fit a regression line: we have information about the data trend over a long interval. If \(s_x^2\) is small, then all the data is concentrated in a narrow vertical band, so we have a much more limited view of the trend.
  - Mean \(\bar X\). It influences only the precision of \(\hat\beta_0\): the larger \(\bar X^2\) is, the less precise \(\hat\beta_0\) is.
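These insights can be checked by simulation. The following sketch (all values are assumed toy choices, not taken from the text) replicates the fit of a simple linear model many times, with the predictor sample held fixed, and compares the empirical means and variances of \(\hat\beta_0\) and \(\hat\beta_1\) with (2.13) and (2.14):
# Toy setting (assumed values): fixed predictor sample, known beta's and sigma
set.seed(12345)
n <- 50; beta0 <- -0.5; beta1 <- 1.5; sigma <- 1
x <- rnorm(n, mean = 2, sd = 1.5)
# M replicates of the response and of the least squares estimates
M <- 1e4
betaHat <- t(replicate(M, {
  y <- beta0 + beta1 * x + rnorm(n, sd = sigma)
  coef(lm(y ~ x))
}))
colMeans(betaHat) # Approximately (beta0, beta1): unbiasedness
apply(betaHat, 2, var) # Approximately the variances in (2.14)
sx2 <- mean((x - mean(x))^2) # Sample variance with denominator n
c(sigma^2 / n * (1 + mean(x)^2 / sx2), sigma^2 / (n * sx2))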
The insights for (2.11) are more involved. The following broad remarks, which extend what happened when \(p=1,\) apply:
- Bias. All the estimates are unbiased for any sample size \(n.\)
- Variance. It depends on:
  - Sample size \(n\). Hidden inside \(\mathbf{X}'\mathbf{X}.\) As \(n\) grows, the precision of the estimators increases.
  - Error variance \(\sigma^2\). The larger \(\sigma^2\) is, the less precise \(\hat{\boldsymbol{\beta}}\) is.
  - Predictor sparsity \((\mathbf{X}'\mathbf{X})^{-1}\). The more “dispersed”25 the predictors are, the more precise \(\hat{\boldsymbol{\beta}}\) is.
The problem with the result in (2.11) is that \(\sigma^2\) is unknown in practice. Therefore, we need to estimate \(\sigma^2\) in order to use a result similar to (2.11). We do so by computing a rescaled sample variance of the residuals \(\hat\varepsilon_1,\ldots,\hat\varepsilon_n\):
\[\begin{align} \hat\sigma^2:=\frac{1}{n-p-1}\sum_{i=1}^n\hat\varepsilon_i^2.\tag{2.15} \end{align}\]
Note the \(n-p-1\) in the denominator. This factor represents the degrees of freedom: the number of data points minus the number of parameters already26 fitted with the data (\(p\) slopes plus \(1\) intercept). For the interpretation of \(\hat\sigma^2,\) it is key to realize that the mean of the residuals \(\hat\varepsilon_1,\ldots,\hat\varepsilon_n\) is zero, that is, \(\bar{\hat\varepsilon}=0.\) Therefore, \(\hat\sigma^2\) is indeed a rescaled sample variance of the residuals that estimates the variance of \(\varepsilon\;\)27. It can be seen that \(\hat\sigma^2\) is an unbiased estimator of \(\sigma^2.\)
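A quick numerical check of (2.15), on simulated data with assumed toy values, is the following sketch; it verifies that the residuals average to zero and that the rescaled sum of squared residuals coincides with the squared “Residual standard error” reported by R:
# Toy fit on simulated data (assumed values)
set.seed(12345)
n <- 40; p <- 2
x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + x1 - 0.5 * x2 + rnorm(n, sd = 2)
fit <- lm(y ~ x1 + x2)
mean(residuals(fit)) # Essentially zero
sum(residuals(fit)^2) / (n - p - 1) # Hat sigma^2 as in (2.15)
sigma(fit)^2 # The same quantity: squared "Residual standard error"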
If we use the estimate \(\hat\sigma^2\) instead of \(\sigma^2,\) we get more useful28 distributions than (2.12):
\[\begin{align} \frac{\hat\beta_j-\beta_j}{\hat{\mathrm{SE}}(\hat\beta_j)}\sim t_{n-p-1},\quad\hat{\mathrm{SE}}(\hat\beta_j)^2:=\hat\sigma^2v_j,\tag{2.16} \end{align}\]
where \(t_{n-p-1}\) represents the Student’s \(t\) distribution with \(n-p-1\) degrees of freedom.
The LHS of (2.16) is the \(t\)-statistic for \(\beta_j,\) \(j=0,\ldots,p.\) We will employ them for building confidence intervals and hypothesis tests in what follows.
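The effect of replacing \(\sigma\) by \(\hat\sigma\) can be visualized by simulation. The sketch below (assumed toy values; a small \(n\) is used so the difference with the normal is visible) generates many samples, standardizes \(\hat\beta_1\) as in (2.16), and compares the empirical quantiles with those of a \(t_{n-p-1}\):
# Toy setting (assumed values) with p = 1 and a small sample size
set.seed(12345)
n <- 15; beta1 <- 2
x <- rnorm(n)
M <- 1e4
tStats <- replicate(M, {
  y <- 1 + beta1 * x + rnorm(n)
  sumCoef <- summary(lm(y ~ x))$coefficients
  (sumCoef[2, "Estimate"] - beta1) / sumCoef[2, "Std. Error"]
})
quantile(tStats, probs = c(0.05, 0.95)) # Empirical quantiles
qt(c(0.05, 0.95), df = n - 2) # Quantiles of t_{n-p-1}: close to the above
qnorm(c(0.05, 0.95)) # Normal quantiles, for comparison (lighter tails)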
2.4.2 Confidence intervals for the coefficients
Thanks to (2.16), we can obtain the \(100(1-\alpha)\%\) Confidence Intervals (CI) for the coefficients \(\beta_j,\) \(j=0,\ldots,p\):
\[\begin{align} \left(\hat\beta_j\pm\hat{\mathrm{SE}}(\hat\beta_j)t_{n-p-1;\alpha/2}\right)\tag{2.17} \end{align}\]
where \(t_{n-p-1;\alpha/2}\) is the \(\alpha/2\)-upper quantile of the \(t_{n-p-1}\). Usually, \(\alpha=0.10,0.05,0.01\) are considered.
This random CI contains the unknown coefficient \(\beta_j\) “with a probability of \(1-\alpha\)”. The quoted statement has to be understood as follows. Suppose you have 100 samples generated according to a linear model. If you compute the CI for a coefficient in each of them, then in approximately \(100(1-\alpha)\) of the samples the true coefficient would actually be inside the random CI. Note also that the CI is symmetric around \(\hat\beta_j.\) This is illustrated in Figure 2.14.
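The CI in (2.17) is straightforward to compute by hand and to cross-check against R's confint function. A minimal sketch on simulated data (assumed toy values):
# Toy fit (assumed values); here p = 1, so the degrees of freedom are n - 2
set.seed(12345)
n <- 30
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n, sd = 0.75)
fit <- lm(y ~ x)
alpha <- 0.05
betaHat <- coef(fit)
se <- summary(fit)$coefficients[, "Std. Error"]
tQuant <- qt(p = 1 - alpha / 2, df = n - 2) # t_{n-p-1;alpha/2}
cbind(lwr = betaHat - se * tQuant, upr = betaHat + se * tQuant) # CI (2.17)
confint(fit, level = 1 - alpha) # Same intervals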
2.4.3 Testing on the coefficients
The distributions in (2.16) also allow us to conduct formal hypothesis tests on the coefficients \(\beta_j,\) \(j=0,\ldots,p.\) For example, the test for significance29 is especially important, that is, the test of the hypotheses
\[\begin{align*} H_0:\beta_j=0 \end{align*}\]
for \(j=0,\ldots,p.\) The test of \(H_0:\beta_j=0\) with \(1\leq j\leq p\) is especially interesting, since it allows us to answer whether the variable \(X_j\) has a significant linear effect on \(Y\). The statistic used for testing for significance is the \(t\)-statistic
\[\begin{align*} \frac{\hat\beta_j-0}{\hat{\mathrm{SE}}(\hat\beta_j)}, \end{align*}\]
which is distributed as a \(t_{n-p-1}\) under (the veracity of) the null hypothesis30.
The null hypothesis \(H_0\) is tested against the alternative hypothesis, \(H_1.\) If \(H_0\) is rejected, it is rejected in favor of \(H_1.\) The alternative hypothesis can be two-sided (we will focus mostly on these alternatives), such as
\[\begin{align*} H_0:\beta_j= 0\quad\text{vs.}\quad H_1:\beta_j\neq 0 \end{align*}\]
or one-sided, such as
\[\begin{align*} H_0:\beta_j=0 \quad\text{vs.}\quad H_1:\beta_j<(>)0. \end{align*}\]
The test based on the \(t\)-statistic is referred to as the \(t\)-test. It rejects \(H_0:\beta_j=0\) (against \(H_1:\beta_j\neq 0\)) at significance level \(\alpha\) for large absolute values of the \(t\)-statistic, precisely for those above the \(\alpha/2\)-upper quantile of the \(t_{n-p-1}\) distribution. That is, it rejects \(H_0\) at level \(\alpha\) if \(\frac{|\hat\beta_j|}{\hat{\mathrm{SE}}(\hat\beta_j)}>t_{n-p-1;\alpha/2}\;\)31. For the one-sided tests, it rejects \(H_0\) against \(H_1:\beta_j<0\) or \(H_1:\beta_j>0\) if \(\frac{\hat\beta_j}{\hat{\mathrm{SE}}(\hat\beta_j)}<-t_{n-p-1;\alpha}\) or \(\frac{\hat\beta_j}{\hat{\mathrm{SE}}(\hat\beta_j)}>t_{n-p-1;\alpha},\) respectively.
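In practice, the \(p\)-value of the \(t\)-test is obtained directly from the \(t_{n-p-1}\) distribution. The following sketch (simulated data, assumed toy values) reproduces the two-sided \(p\)-values reported by summary and applies the rejection rules above:
# Toy fit (assumed values), with p = 1
set.seed(12345)
n <- 30
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n)
fit <- lm(y ~ x)
sumCoef <- summary(fit)$coefficients
tStat <- sumCoef[, "Estimate"] / sumCoef[, "Std. Error"] # t-statistics
2 * pt(abs(tStat), df = n - 2, lower.tail = FALSE) # Two-sided p-values
sumCoef[, "Pr(>|t|)"] # Same values, as reported by summary()
abs(tStat) > qt(0.975, df = n - 2) # Rejection of H0: beta_j = 0 at alpha = 0.05
pt(tStat, df = n - 2, lower.tail = FALSE) # One-sided p-values against H1: beta_j > 0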
Remember the following insights about hypothesis testing.
In a hypothesis test, the \(p\)-value measures the degree of veracity of \(H_0\) according to the data. The rule of thumb is the following:
Is the \(p\)-value lower than \(\alpha\)?
- Yes \(\rightarrow\) reject \(H_0\).
- No \(\rightarrow\) do not reject \(H_0\).
The connection between the \(t\)-test for \(H_0:\beta_j=0\) and the CI for \(\beta_j,\) both at level \(\alpha,\) is the following:
Is \(0\) inside the CI for \(\beta_j\)?
- Yes \(\leftrightarrow\) do not reject \(H_0\).
- No \(\leftrightarrow\) reject \(H_0\).
The one-sided test \(H_0:\beta_j=0\) vs. \(H_1:\beta_j<0\) (respectively, \(H_1:\beta_j>0\)) can be done by means of the CI for \(\beta_j.\) If \(H_0\) is rejected, we can conclude that \(\hat\beta_j\) is significantly negative (positive) and that, for the considered regression model, \(X_j\) has a significant negative (positive) effect on \(Y.\) The rule of thumb is the following:
Is the CI for \(\beta_j\) below (above) \(0\) at level \(\alpha\)?
- Yes \(\rightarrow\) reject \(H_0\) at level \(\alpha.\) Conclude \(X_j\) has a significant negative (positive) effect on \(Y\) at level \(\alpha.\)
- No \(\rightarrow\) the criterion is not conclusive.
2.4.4 Case study application
Let’s analyze the multiple linear model we have considered for the wine dataset, now that we know how to make inference on the model parameters. The relevant information is obtained with the summary of the model:
# Fit
modWine1 <- lm(Price ~ ., data = wine)
# Summary
sumModWine1 <- summary(modWine1)
sumModWine1
##
## Call:
## lm(formula = Price ~ ., data = wine)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.46541 -0.24133 0.00413 0.18974 0.52495
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.343e+00 7.697e+00 -0.304 0.76384
## WinterRain 1.153e-03 4.991e-04 2.311 0.03109 *
## AGST 6.144e-01 9.799e-02 6.270 3.22e-06 ***
## HarvestRain -3.837e-03 8.366e-04 -4.587 0.00016 ***
## Age 1.377e-02 5.821e-02 0.237 0.81531
## FrancePop -2.213e-05 1.268e-04 -0.175 0.86313
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.293 on 21 degrees of freedom
## Multiple R-squared: 0.8278, Adjusted R-squared: 0.7868
## F-statistic: 20.19 on 5 and 21 DF, p-value: 2.232e-07
# Contains the estimation of sigma ("Residual standard error")
sumModWine1$sigma
## [1] 0.2930287
# Which is the same as
sqrt(sum(modWine1$residuals^2) / modWine1$df.residual)
## [1] 0.2930287
The Coefficients block of the summary output contains the following elements regarding the significance of each coefficient \(\beta_j,\) that is, the test \(H_0:\beta_j=0\) vs. \(H_1:\beta_j\neq0\):
- Estimate: least squares estimate \(\hat\beta_j.\)
- Std. Error: estimated standard error \(\hat{\mathrm{SE}}(\hat\beta_j).\)
- t value: \(t\)-statistic \(\frac{\hat\beta_j}{\hat{\mathrm{SE}}(\hat\beta_j)}.\)
- Pr(>|t|): \(p\)-value of the \(t\)-test.
- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1: codes indicating the size of the \(p\)-value. The more asterisks, the more evidence supporting that \(H_0\) does not hold32.
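All of these quantities are stored in the coefficients matrix of the summary object, so they can also be accessed programmatically. A small sketch (assuming modWine1 and sumModWine1 as fitted above):
# The Coefficients block, as a matrix
sumModWine1$coefficients
# The t-statistics and p-values can be reproduced from the first two columns
# (degrees of freedom: n - p - 1 = 27 - 5 - 1 = 21, as in the summary output)
sumModWine1$coefficients[, "Estimate"] / sumModWine1$coefficients[, "Std. Error"]
2 * pt(abs(sumModWine1$coefficients[, "t value"]), df = 21, lower.tail = FALSE)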
Note that a high proportion of predictors are not significant in modWine1: FrancePop and Age are not significant (and neither is the intercept). This is an indication of an excess of predictors adding little information to the response. One explanation is the almost perfect correlation between FrancePop and Age shown before: one of them is not adding any extra information to explain Price. This complicates the model unnecessarily and, more importantly, it has the undesirable effect of making the coefficient estimates less precise. We opt to remove the predictor FrancePop from the model since it is exogenous to the wine context33. A data-driven justification for removing this variable is that it is the least significant in modWine1.
Then, the model without FrancePop34 is:
modWine2 <- lm(Price ~ . - FrancePop, data = wine)
summary(modWine2)
##
## Call:
## lm(formula = Price ~ . - FrancePop, data = wine)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.46024 -0.23862 0.01347 0.18601 0.53443
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.6515703 1.6880876 -2.163 0.04167 *
## WinterRain 0.0011667 0.0004820 2.420 0.02421 *
## AGST 0.6163916 0.0951747 6.476 1.63e-06 ***
## HarvestRain -0.0038606 0.0008075 -4.781 8.97e-05 ***
## Age 0.0238480 0.0071667 3.328 0.00305 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2865 on 22 degrees of freedom
## Multiple R-squared: 0.8275, Adjusted R-squared: 0.7962
## F-statistic: 26.39 on 4 and 22 DF, p-value: 4.057e-08
All the coefficients are significant at level \(\alpha=0.05.\) Therefore, there is no clearly redundant information. In addition, the \(R^2\) is very similar to that of the full model, but the 'Adjusted R-squared', a weighting of the \(R^2\) that accounts for the number of predictors used by the model, is slightly larger. As we will see in Section 2.7.2, this means that, relative to the number of predictors used, modWine2 explains more variability of Price than modWine1.
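These \(R^2\) and adjusted \(R^2\) figures can also be extracted from the summary objects for a side-by-side comparison (a small sketch, assuming both models are fitted as above):
# R^2 and adjusted R^2 of modWine1 vs. modWine2
sumModWine2 <- summary(modWine2)
c(sumModWine1$r.squared, sumModWine2$r.squared)
c(sumModWine1$adj.r.squared, sumModWine2$adj.r.squared)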
A handy way of comparing the coefficients of both models is car::compareCoefs:
car::compareCoefs(modWine1, modWine2)
## Calls:
## 1: lm(formula = Price ~ ., data = wine)
## 2: lm(formula = Price ~ . - FrancePop, data = wine)
##
## Model 1 Model 2
## (Intercept) -2.34 -3.65
## SE 7.70 1.69
##
## WinterRain 0.001153 0.001167
## SE 0.000499 0.000482
##
## AGST 0.6144 0.6164
## SE 0.0980 0.0952
##
## HarvestRain -0.003837 -0.003861
## SE 0.000837 0.000808
##
## Age 0.01377 0.02385
## SE 0.05821 0.00717
##
## FrancePop -2.21e-05
## SE 1.27e-04
##
Note how the coefficients of modWine2 have smaller standard errors than those of modWine1.
The individual CIs for the unknown \(\beta_j\)’s can be obtained by applying the confint function to an lm object. Let’s compute the CIs for the model coefficients of modWine1, modWine2, and a new model modWine3:
# Fit a new model
modWine3 <- lm(Price ~ Age + WinterRain, data = wine)
summary(modWine3)
##
## Call:
## lm(formula = Price ~ Age + WinterRain, data = wine)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.88964 -0.51421 -0.00066 0.43103 1.06897
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.9830427 0.5993667 9.982 5.09e-10 ***
## Age 0.0360559 0.0137377 2.625 0.0149 *
## WinterRain 0.0007813 0.0008780 0.890 0.3824
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5769 on 24 degrees of freedom
## Multiple R-squared: 0.2371, Adjusted R-squared: 0.1736
## F-statistic: 3.73 on 2 and 24 DF, p-value: 0.03884
# Confidence intervals at 95%
# CI: (lwr, upr)
confint(modWine3)
## 2.5 % 97.5 %
## (Intercept) 4.746010626 7.220074676
## Age 0.007702664 0.064409106
## WinterRain -0.001030725 0.002593278
# Confidence intervals at other levels
confint(modWine3, level = 0.90)
## 5 % 95 %
## (Intercept) 4.9575969417 7.008488360
## Age 0.0125522989 0.059559471
## WinterRain -0.0007207941 0.002283347
confint(modWine3, level = 0.99)
## 0.5 % 99.5 %
## (Intercept) 4.306650310 7.659434991
## Age -0.002367633 0.074479403
## WinterRain -0.001674299 0.003236852
# Compare with previous models
confint(modWine1)
## 2.5 % 97.5 %
## (Intercept) -1.834844e+01 13.6632391095
## WinterRain 1.153872e-04 0.0021910509
## AGST 4.106337e-01 0.8182146540
## HarvestRain -5.577203e-03 -0.0020974232
## Age -1.072931e-01 0.1348317795
## FrancePop -2.858849e-04 0.0002416171
confint(modWine2)
## 2.5 % 97.5 %
## (Intercept) -7.1524497573 -0.150690903
## WinterRain 0.0001670449 0.002166393
## AGST 0.4190113907 0.813771726
## HarvestRain -0.0055353098 -0.002185890
## Age 0.0089852800 0.038710748
confint(modWine3)
## 2.5 % 97.5 %
## (Intercept) 4.746010626 7.220074676
## Age 0.007702664 0.064409106
## WinterRain -0.001030725 0.002593278
In modWine3, the 95% CI for \(\beta_0\) is \((4.7460, 7.2201),\) for \(\beta_1\) is \((0.0077, 0.0644),\) and for \(\beta_2\) is \((-0.0010, 0.0026).\) Therefore, we can say with 95% confidence that the coefficient of WinterRain is non-significant (0 is inside the CI). But, inspecting the CI for the coefficient of WinterRain in modWine2, we can see that it is significant for that model! How is this possible? The answer is that the presence of extra predictors affects the coefficient estimate, as we saw in Figure 2.7. Therefore, the precise statement to make is:

In the model Price ~ Age + WinterRain, with \(\alpha=0.05,\) the coefficient of WinterRain is non-significant.

Note that this does not mean that the coefficient will always be non-significant: in Price ~ Age + AGST + HarvestRain + WinterRain it is significant, as checked in the sketch below.
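This can be verified by extracting the \(p\)-value of WinterRain in both fits (a quick sketch on the wine data; compare with the 0.3824 and 0.02421 shown in the summaries above):
# p-value of WinterRain with Age as the only companion predictor, and with
# Age, AGST, and HarvestRain as companions
summary(lm(Price ~ Age + WinterRain, data = wine))$coefficients["WinterRain", "Pr(>|t|)"]
summary(lm(Price ~ Age + AGST + HarvestRain + WinterRain, data = wine))$coefficients["WinterRain", "Pr(>|t|)"]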
Compute and interpret the CIs for the coefficients, at levels \(\alpha=0.10,0.05,0.01,\) for the following regressions:
- Price ~ WinterRain + HarvestRain + AGST (wine).
- AGST ~ Year + FrancePop (wine).
For the assumptions dataset, do the following:
- Regression y7 ~ x7. Check that:
  - The intercept is not significant for the regression at any reasonable level \(\alpha.\)
  - The slope is significant for any \(\alpha \geq 10^{-7}.\)
- Regression y6 ~ x6. Assume the linear model assumptions are verified.
  - Check that \(\hat\beta_0\) is significantly different from zero at any level \(\alpha.\)
  - For which \(\alpha=0.10,0.05,0.01\) is \(\hat\beta_1\) significantly different from zero?
In certain applications, it is useful to center the predictors \(X_1,\ldots,X_p\) prior to fitting the model, in such a way that the slope coefficients \((\beta_1,\ldots,\beta_p)\) measure the effects of deviations of the predictors from their means. Theoretically, this amounts to considering the linear model
\[\begin{align*} Y=\beta_0+\beta_1(X_1-\mathbb{E}[X_1])+\cdots+\beta_p(X_p-\mathbb{E}[X_p])+\varepsilon. \end{align*}\]
In the sample case, we proceed by replacing \(X_{ij}\) by \(X_{ij}-\bar{X}_j,\) which can be easily done with the scale function (see below). If, in addition, the response is also centered, then \(\beta_0=0\) and \(\hat\beta_0=0.\) This centering of the data has no influence on the significance of the predictors (but it does have influence on the significance of \(\hat\beta_0\)), as it is just a linear transformation of them.
# By default, scale centers (subtracts the mean) and scales (divides by the
# standard deviation) the columns of a matrix
wineCen <- data.frame(scale(wine, center = TRUE, scale = FALSE))
# Regression with centered response and predictors
modWine3Cen <- lm(Price ~ Age + WinterRain, data = wineCen)
# Summary
summary(modWine3Cen)
##
## Call:
## lm(formula = Price ~ Age + WinterRain, data = wineCen)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.88964 -0.51421 -0.00066 0.43103 1.06897
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.964e-16 1.110e-01 0.000 1.0000
## Age 3.606e-02 1.374e-02 2.625 0.0149 *
## WinterRain 7.813e-04 8.780e-04 0.890 0.3824
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5769 on 24 degrees of freedom
## Multiple R-squared: 0.2371, Adjusted R-squared: 0.1736
## F-statistic: 3.73 on 2 and 24 DF, p-value: 0.03884
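As a quick check of the previous claim, the slope estimates, standard errors, and \(p\)-values of the uncentered and centered fits can be compared (a short sketch, assuming modWine3 and modWine3Cen as above); only the intercept rows differ:
# Slope rows (excluding the intercept) coincide in both fits
summary(modWine3)$coefficients[-1, ]
summary(modWine3Cen)$coefficients[-1, ]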
This is for theoretical and modeling convenience. With this assumption, we just model the randomness of \(Y\) given the predictors. If the randomness of \(Y\) and of \(X_1,\ldots,X_n\) were both to be modeled, a significantly more complex model would be required.↩︎
Understood as small \(|(\mathbf{X}'\mathbf{X})^{-1}|.\)↩︎
Prior to undertaking the estimation of \(\sigma,\) we have used the sample to estimate \(\hat{\boldsymbol\beta}.\) The situation is thus analogous to the distinction between the sample variance \(s_x^2=\frac{1}{n}\sum_{i=1}^n\left(X_i-\bar{X}\right)^2\) and the sample quasi-variance \(\hat{s}_x^2=\frac{1}{n-1}\sum_{i=1}^n\left(X_i-\bar{X}\right)^2,\) both computed from a sample \(X_1,\ldots,X_n.\) When estimating \(\mathbb{V}\mathrm{ar}[X],\) both previously estimate \(\mathbb{E}[X]\) through \(\bar{X}.\) The fact that \(\hat{s}_x^2\) accounts for that prior estimation through the degrees of freedom \(n-1\) makes it an unbiased estimator of \(\mathbb{V}\mathrm{ar}[X]\) (\(s_x^2\) is not).↩︎
Recall that the sample variance of \(\hat\varepsilon_1,\ldots,\hat\varepsilon_n\) is \(\frac{1}{n}\sum_{i=1}^n\left(\hat\varepsilon_i-\bar{\hat\varepsilon}\right)^2.\)↩︎
In the sense of practically realistic.↩︎
Shortcut for significantly different from zero.↩︎
This is denoted as \(\frac{\hat{\beta}_j-0}{\hat{\mathrm{SE}}(\hat\beta_j)}\stackrel{H_0}{\sim}t_{n-p-1}.\)↩︎
In R, \(t_{n-p-1;\alpha/2}\) can be computed as qt(p = 1 - alpha / 2, df = n - p - 1) or qt(p = alpha / 2, df = n - p - 1, lower.tail = FALSE).↩︎
For example, '**' indicates that the \(p\)-value lies between \(0.001\) and \(0.01.\)↩︎
This is a context-guided decision, not data-driven.↩︎
Notice the use of - for excluding a particular predictor.↩︎