3 Confidence intervals
3.1 Confidence interval for the simple linear regression slope \((\beta_{1})\)
Another way to make inferences about the population slope \(\beta_{1}\) (besides the hypothesis test) is to estimate it using a confidence interval. So why do we want to construct a confidence interval for the slope of the line? Because we want to know what the value of the slope for the whole population of data is. The problem is that the population data are usually not available and thus we cannot calculate a single value for the slope of the population data.
Another problem is that the sample slope, \(\hat{\beta_{1}}\), varies from sample to sample because it depends on the data selected for the sample. For example, one sample taken from a specific population may give \(\hat{\beta_{1}} = 3.54\), another sample from the same population may give \(\hat{\beta_{1}}=4.76\), and a third may give \(\hat{\beta_{1}}= 2.98\). So what is the population slope \(\beta_{1}\): 3.54, 4.76 or 2.98? Since the probability that \(\hat{\beta_{1}}\) will equal \(\beta_{1}\) exactly is almost zero, we estimate the population slope by means of a confidence interval. A confidence interval for \(\beta_{1}\) is the specification of two values (a lower limit and an upper limit) between which we have a certain degree of “confidence” that the true \(\beta_{1}\) value lies. This interval is built around the value of the sample slope \(\hat{\beta_{1}}\). Thus, we will never know the exact value of the population slope, but at least we will have an interval that indicates, with a stated level of confidence, where this value lies.
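The sample-to-sample variation described above can be seen in a small simulation. The following is a minimal sketch in R; the population (its size, slope of 3.5 and noise level) is entirely hypothetical and chosen only for illustration, and `sample_slope` is an illustrative helper name.

```r
# Hypothetical population: y = 2 + 3.5 * x + noise (values chosen only for illustration)
set.seed(1)
N     <- 10000
x_pop <- runif(N, 0, 10)
y_pop <- 2 + 3.5 * x_pop + rnorm(N, sd = 4)

# Slope of the least-squares line fitted to one random sample of size n
sample_slope <- function(n) {
  idx <- sample(N, n)
  unname(coef(lm(y_pop[idx] ~ x_pop[idx]))[2])
}

# Three samples from the same population give three different sample slopes
slopes <- replicate(3, sample_slope(20))
slopes
```

Each run of `sample_slope()` draws a new sample, so the three values differ from one another and from the population slope of 3.5.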
So what is this “confidence” that we are talking about? The procedure used to construct the interval has a specified probability, called the confidence level, of producing an interval that contains the true value of the population slope. We usually construct a 90%, 95% or 99% confidence interval. If we construct, for example, a 95% confidence interval, we can be 95% confident that the population slope \(\beta_{1}\) lies between the lower and upper limits of the interval, in the sense that about 95% of intervals constructed in this way from repeated samples would contain \(\beta_{1}\).
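The meaning of the confidence level can be checked by simulation. The sketch below assumes a hypothetical population with a known slope of 1 (known only because the data are simulated), draws many samples, builds a 95% interval from each with `confint()`, and counts how often the interval contains the true slope.

```r
set.seed(42)
true_slope <- 1    # hypothetical population slope, known only because we simulate
n <- 20            # sample size (illustrative)

# For each repetition: draw a sample, fit the line, check whether the
# 95% confidence interval for the slope contains the true slope
covers <- replicate(2000, {
  x  <- runif(n, 0, 10)
  y  <- 5 + true_slope * x + rnorm(n, sd = 2)
  ci <- confint(lm(y ~ x), level = 0.95)["x", ]
  ci[1] <= true_slope && true_slope <= ci[2]
})

# Proportion of intervals that contain the true slope
coverage <- mean(covers)
coverage
```

Under these assumptions the proportion should come out close to 0.95, which is what "95% confidence" refers to.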
The confidence interval for the slope is given by
\[\hat{\beta_{1}} \pm t_{(n-2;\frac{\alpha}{2})} \times \frac{S}{\sqrt{SS_{xx}}}\]
or,
\[\hat{\beta_{1}} \pm t_{(n-2;\frac{\alpha}{2})}\times S_{\hat{\beta_{1}}}\]
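where \(S_{\hat{\beta_{1}}} = \frac{S}{\sqrt{SS_{xx}}}\) is the standard error of \(\hat{\beta_{1}}\), \(S\) is the residual standard error, and \(t_{(n-2;\frac{\alpha}{2})}\) is the critical value of the t-distribution with \(n-2\) degrees of freedom and an upper-tail area of \(\frac{\alpha}{2}\).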
Example 3.1 Refer to the income-savings example and data. Construct a 95% confidence interval for the slope and interpret the interval. Recall that for this data \(SS_{xx}\) = 123.3415, \(\hat{\beta_{1}}\) = 0.9661, s = 2.1261, and n = 6.
Solution
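If the CI should be 95%, then \(\alpha = 1-0.95 = 0.05\) or 5%, so \(\frac{\alpha}{2} = 0.025\). With \(n = 6\) there are \(n-2 = 4\) degrees of freedom, and the t-table gives \(t_{(4;0.025)} = 2.776\).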
\(\hat{\beta_{1}} \pm t_{(n-2;\frac{\alpha}{2})} \times \frac{S}{\sqrt{SS_{xx}}}\)
\(0.9661 \pm t_{(n-2;\frac{\alpha}{2})} \times \frac{2.1261}{\sqrt{123.3415}}\)
\(0.9661 \pm 2.776 \times \frac{2.1261}{\sqrt{123.3415}}\)
\(0.9661 \pm 0.5313\)
\([0.4348;1.4974]\)
Thus, we estimate, with 95% confidence, that the population slope \(\beta_{1}\) lies somewhere between 0.4348 and 1.4974. The exact value of \(\beta_{1}\) will never be known, but at least we have an idea of what the value can be.
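The hand calculation can be checked numerically. The following is a minimal sketch in R that works from the summary statistics alone; the function name `slope_ci` and its argument names are illustrative and not part of the original example.

```r
# Confidence interval for the slope from summary statistics.
# slope_ci() is a hypothetical helper written for this illustration.
slope_ci <- function(b1, s, ss_xx, n, level = 0.95) {
  alpha  <- 1 - level
  t_crit <- qt(1 - alpha / 2, df = n - 2)   # critical value t_(n-2; alpha/2)
  se_b1  <- s / sqrt(ss_xx)                 # standard error of the slope
  c(lower = b1 - t_crit * se_b1, upper = b1 + t_crit * se_b1)
}

# Values from Example 3.1
ci <- slope_ci(b1 = 0.9661, s = 2.1261, ss_xx = 123.3415, n = 6)
round(ci, 4)
```

The limits differ from the hand calculation only in the fourth decimal place, because the hand calculation rounds the intermediate values.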
Using R
income <- c(24, 26, 12, 22, 20, 18)
savings <- c(12, 14, 1.5, 9, 6, 2)
model <- lm(savings ~ income)
summary(model)
##
## Call:
## lm(formula = savings ~ income)
##
## Residuals:
##        1        2        3        4        5        6
##  1.04054  1.10811  2.13514 -0.02703 -1.09459 -3.16216
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -12.2297     3.9868  -3.068  0.03738 *
## income        0.9662     0.1914   5.049  0.00724 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.125 on 4 degrees of freedom
## Multiple R-squared: 0.8644, Adjusted R-squared: 0.8305
## F-statistic: 25.49 on 1 and 4 DF, p-value: 0.007237
## [1] 2.776445
## [1] 0.4347884 1.4976116
##                   2.5 %    97.5 %
## (Intercept) -23.2988618 -1.160598
## income        0.4348809  1.497552
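The last three results in the output above appear without the calls that produced them. The following sketch shows calls that reproduce those numbers. It assumes the critical value was obtained with `qt()`, that the second result was computed from the rounded estimate and standard error in the coefficient table, and that the final table came from `confint()`. The original calls are not shown, so this is one plausible reconstruction.

```r
income  <- c(24, 26, 12, 22, 20, 18)
savings <- c(12, 14, 1.5, 9, 6, 2)
model   <- lm(savings ~ income)

# Critical value t_(n-2; alpha/2) for a 95% interval with n - 2 = 4 degrees of freedom
t_crit <- qt(0.975, df = 4)
t_crit

# Interval from the rounded estimate (0.9662) and standard error (0.1914)
manual_ci <- 0.9662 + c(-1, 1) * t_crit * 0.1914
manual_ci

# Interval computed by R from the unrounded fit
fit_ci <- confint(model, level = 0.95)
fit_ci
```

The `income` row of the `confint()` result is the 95% confidence interval for the slope; it differs slightly from the manual interval because `confint()` uses unrounded values.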
3.2 Confidence interval for prediction
As the population data are not known, we use the sample results to estimate the mean y-value of the whole population at a given value of x. To do this, we (again) construct a confidence interval, this time around the mean of y at the given x-value \(X_{p}\), using the following formula.
\[\hat{y} \pm t_{(n-2;\frac{\alpha}{2})} \times S \times \sqrt{\frac{1}{n}+ \frac{(X_{p}-\bar{x})^2}{SS_{xx}}}\]
Example 3.2 Construct a 90% confidence interval for the mean amount of savings when a person receives a monthly income of R19 700. Recall that for this data \(SS_{xx}=123.3415\), \(\bar{x}=20.333\), \(s=2.1261\), and \(n=6\).
Solution
- First of all we must compute the value of \(\hat{y}\) and \(X_{p}\)
\(\hat{y} = -12.2273 + 0.9661 X = -12.2273 + 0.9661(19.7) = 6.8049\)
\(X_{p}\) = 19.7 (after the given x value was divided by 1000)
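For a 90% confidence interval, \(\alpha = 1-0.90 = 0.10\), so \(\frac{\alpha}{2} = 0.05\) and \(t_{(4;0.05)} = 2.132\).
\(\hat{y} \pm t_{(6-2;\frac{0.10}{2})} \times 2.1261 \times \sqrt{\frac{1}{6}+ \frac{(19.7-20.333)^2}{123.3415}}\)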
\(\hat{y} \pm 2.132 \times 2.1261 \times 0.4122\)
\(6.8049 \pm 1.8684\)
\([4.9365;8.6733]\)
We estimate, with 90% confidence, that for the whole population of people who receive an income of R19 700, the mean amount of savings will be between R493.65 and R867.33.
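The interval in Example 3.2 can be cross-checked in R. This is a sketch under the assumption that the model is the one fitted to the `income` and `savings` vectors from the income-savings example, with income recorded in thousands of rand. `predict()` with `interval = "confidence"` returns the interval for the mean response, and the manual lines repeat the formula term by term.

```r
income  <- c(24, 26, 12, 22, 20, 18)
savings <- c(12, 14, 1.5, 9, 6, 2)
model   <- lm(savings ~ income)

# 90% confidence interval for the mean of y at x_p = 19.7
x_p  <- 19.7
pred <- predict(model, newdata = data.frame(income = x_p),
                interval = "confidence", level = 0.90)
pred

# The same interval from the formula
n      <- length(income)
s      <- summary(model)$sigma                 # residual standard error
ss_xx  <- sum((income - mean(income))^2)
y_hat  <- unname(coef(model)[1] + coef(model)[2] * x_p)
margin <- qt(0.95, df = n - 2) * s *
  sqrt(1 / n + (x_p - mean(income))^2 / ss_xx)
manual <- c(lower = y_hat - margin, upper = y_hat + margin)
manual
```

Both approaches give limits within about 0.001 of the hand result; the remaining differences reflect rounding of the intermediate values in the hand calculation.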