CHAPTER 6 General Linear Test
We have already estimated the model parameters (mostly via least squares).
To verify whether it is worthwhile to keep variables in our model, we can perform hypothesis tests on the model parameters.
GOAL: Compare 2 models, reduced model vs full model
We want to know if adding more independent variables improves a linear model.
For the multiple linear regression model, there are four main hypotheses that we may want to test.
- Testing that all of the slope parameters are 0
- \(Ho: \beta_1 = \beta_2 = \cdots = \beta_k = 0\)
- \(Ha: \text{at least one }\beta_j \neq 0, \quad j = 1, 2,..., k\)
- Testing that one slope parameter is 0
- \(Ho: \beta_j = 0\)
- \(Ha: \beta_j \neq 0\)
- Testing several slope parameters equal to 0
- \(Ho: \beta_j = 0 \text{ for all } j \text{ in some subset } S \subset \{1,...,k\}\)
- \(Ha: \text{at least one } \beta_j \neq 0 \text{ for } j \in S\)
- Testing a slope parameter equal to 0 sequentially
- \(Ho: \beta_j = 0 \text{ given variables other than } X_j \text{ are already in the model}\)
- \(Ha: \beta_j \neq 0 \text{ given variables other than } X_j \text{ are already in the model}\)
All of these hypotheses about regression coefficients can actually be tested using a unified approach!
Before the discussion of the actual tests, some important definitions and results will be presented as preliminaries.
6.1 Full and Reduced Models
Definition 6.1 (Full Model and Reduced Model)
- The full model or unrestricted model is the model containing all the independent variables that are of interest.
- The reduced model or restricted model is the model when the null hypothesis holds.
Example
Let’s again use the Anscombe dataset with the following variables:
- y: income
- x1: education
- x2: young
- x3: urban
The full model includes all the predictor variables. \[ y=\beta_0+\beta_1x_1+\beta_2x_2+\beta_3x_3+\varepsilon \]
# Fit the full model
full_model <- lm(income ~ education + young + urban, data = Anscombe)
# Summary of the full model
summary(full_model)
##
## Call:
## lm(formula = income ~ education + young + urban, data = Anscombe)
##
## Residuals:
## Min 1Q Median 3Q Max
## -438.54 -145.21 -41.65 119.20 739.69
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3011.4633 609.4769 4.941 1.03e-05 ***
## education 7.6313 0.8798 8.674 2.56e-11 ***
## young -6.8544 1.6617 -4.125 0.00015 ***
## urban 1.7692 0.2591 6.828 1.49e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 259.7 on 47 degrees of freedom
## Multiple R-squared: 0.7979, Adjusted R-squared: 0.785
## F-statistic: 61.86 on 3 and 47 DF, p-value: 2.384e-16
The reduced model includes only a subset of the predictor variables.
For example, exclude x3 by assuming \(\beta_3=0\) (i.e., the null hypothesis holds). \[
y=\beta_0+\beta_1x_1+\beta_2x_2+\varepsilon
\]
# Fit the reduced model
reduced_model <- lm(income ~ education + young, data = Anscombe)
# Summary of the reduced model
summary(reduced_model)
##
## Call:
## lm(formula = income ~ education + young, data = Anscombe)
##
## Residuals:
## Min 1Q Median 3Q Max
## -590.04 -220.72 -26.48 186.13 891.03
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4783.061 770.168 6.210 1.20e-07 ***
## education 9.588 1.162 8.253 9.15e-11 ***
## young -9.585 2.252 -4.256 9.61e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 362.6 on 48 degrees of freedom
## Multiple R-squared: 0.5975, Adjusted R-squared: 0.5807
## F-statistic: 35.63 on 2 and 48 DF, p-value: 3.266e-10
6.2 Extra Sum of Squares
In regression work, the question often arises as to whether or not it is worthwhile to include certain terms in the model. This question can be investigated by considering the extra portion of the regression sum of squares that arises because the terms under consideration are included in the model.
Definition 6.2 (Extra Sum of Squares)
The Extra Sum of Squares (or Sequential Sum of Squares) represents the difference in Error Sum of Squares (SSE) between two models. \[
ESS = SSE(R)-SSE(F)
\] where \(SSE(R)\) is the SSE of the reduced model and \(SSE(F)\) is the SSE of the full model.
Remarks:
Specifically, ESS measures the marginal decrease in SSE when an additional set of predictors is incorporated into the model.
The SSE of the full model is always less than or equal to the SSE of the reduced model: \(SSE(F) \leq SSE(R)\)
- More predictors \(\rightarrow\) Better Fit \(\rightarrow\) smaller deviations around the fitted regression line
These extra sums of squares reflect the reduction in the error sum of squares by adding an independent variable to the model, given that another independent variable is already in the model.
The extra sum of squares can be thought of as the increase in the regression sum of squares achieved by introducing the new variable.
Suppose you have two variables \(X_1\) and \(X_2\) being considered in the model. The increase in regression sum of squares when \(X_2\) is added given that \(X_1\) was already in the model may be denoted by \(SSR(X_2|X_1)=SSR(X_1,X_2)-SSR(X_1)\)
Also, since \(SST = SSR + SSE\) always holds, and the SST of the full model and of the reduced model are the same:
\[\begin{align} SSR(X_2|X_1)&=SSR(X_1,X_2)-SSR(X_1)\\ &=(SST-SSE(X_1,X_2))-(SST-SSE(X_1))\\ &=SSE(X_1)-SSE(X_1,X_2)\\ \end{align}\]
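For instance, a minimal sketch in R (assuming the Anscombe data frame is available, e.g., via the carData package) computes \(SSR(X_2|X_1)\) directly from the two fitted models, with \(X_1\) = education and \(X_2\) = young:
# Models with X1 only, and with X1 and X2
m1  <- lm(income ~ education, data = Anscombe)
m12 <- lm(income ~ education + young, data = Anscombe)
# Error sums of squares of the two models
SSE_1  <- sum(residuals(m1)^2)    # SSE(X1)
SSE_12 <- sum(residuals(m12)^2)   # SSE(X1, X2)
# Extra sum of squares: SSR(X2 | X1) = SSE(X1) - SSE(X1, X2)
SSE_1 - SSE_12
The same quantity appears in the Sum of Sq column of anova() when the two nested models are compared, as illustrated in the sections below.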
6.3 General Linear Test
Definition 6.3 (General Linear Test)
The General Linear Test is used to determine whether adding one or more variables improves the model fit. This is done by comparing a smaller (reduced) model with fewer variables against a larger (full) model with added variables. We use an F-statistic to decide whether or not to reject the smaller reduced model in favor of the larger full model.
The basic steps of the general linear test are:
1. Fit the full model and obtain the error sum of squares \(SSE(F)\).
2. Fit the reduced model and obtain the error sum of squares \(SSE(R)\).
3. Compute the test statistic \[ F^*=\frac{\left(\frac{SSE(R)-SSE(F)}{\text{df}_R-\text{df}_F}\right)}{\frac{SSE(F)}{\text{df}_F}} \] and reject \(Ho\) when \(F^*>F_{(\alpha,\text{df}_R-\text{df}_F,\text{df}_F)}\).
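As a sketch of these steps, the small helper below (a hypothetical function, not from any package) computes \(F^*\), its degrees of freedom, and the p-value from two nested lm fits; it should reproduce the output of anova(reduced, full) used in the illustrations later in this chapter.
# General linear test from two nested fitted models (reduced, full)
glt <- function(reduced, full) {
  SSE_R <- sum(residuals(reduced)^2); df_R <- df.residual(reduced)
  SSE_F <- sum(residuals(full)^2);    df_F <- df.residual(full)
  F_star <- ((SSE_R - SSE_F) / (df_R - df_F)) / (SSE_F / df_F)
  c(F = F_star,
    df1 = df_R - df_F,
    df2 = df_F,
    p.value = pf(F_star, df_R - df_F, df_F, lower.tail = FALSE))
}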
Remarks:
- (Almost equal SSE of reduced vs full)
- If \(SSE(F)\) is not much less than \(SSE(R)\), using the full model (or having additional independent variable/s) does not account for much more of the variability of \(Y_i\) than the reduced model.
- This means that the data suggest that the null hypothesis holds.
- If the additional independent variable does not contribute much to the model, then it is better to use the reduced model than the full model to achieve parsimony.
- (Large SSE of reduced vs full)
- On the other hand, a large difference \(SSE(R) − SSE(F)\) suggests that the alternative hypothesis holds because the addition of parameters in the model helps to reduce substantially the variation of the observations \(Y_i\) around the fitted regression line.
- The numerator of the general linear F-statistic is the extra sum of squares between the full model and the reduced model.
We will use the general linear test for constructing tests for different hypotheses.
The overarching goal of the following sections is to determine if “adding” a variable improves the model fit.
We will use the General Linear Test in the following examples.
Concepts such as the Partial F test and Sequential F test will be introduced.
Examples in R are also presented, using the Anscombe dataset.
Let
- \(Y_{i}\) = income: per-capita income (in dollars) of state \(i\)
- \(X_{i1}\) = education: per-capita education expenditure (in dollars) of state \(i\)
- \(X_{i2}\) = young: proportion under 18 (per 1000) of state \(i\)
- \(X_{i3}\) = urban: proportion urban (per 1000) of state \(i\)
Testing that all of the slope parameters are 0
Suppose we are testing whether all slope parameters are equal to 0
- \(Ho:\beta_1=\beta_2=\cdots=\beta_k=0\)
- \(Ha:\) at least one \(\beta_j\neq0\) for \(j=1,\cdots,k\)
Using the General Linear Test,
- The \(SSE(F)\) is just the usual error sum of squares seen in the ANOVA table.
- Since the null is \(\beta_j\) are all equal to 0, the reduced model suggests that none of the variations in the response Y is explained by any of the predictors. Therefore, \(SSE(R)\) is just the total sum of squares, which is the \(SST\) that appears in the ANOVA table.
With this, the F Statistic is \[\begin{align} F^*&=\frac{\left(\frac{SSE(R)-SSE(F)}{\text{df}_R-\text{df}_F}\right)}{\frac{SSE(F)}{\text{df}_F}}\\ &=\frac{\left(\frac{SST-SSE}{(n-1)-(n-p)}\right)}{\frac{SSE}{n-p}}\\ &=\frac{SSR/(p-1)}{SSE/(n-p)} \end{align}\]
and the critical region to reject \(Ho\) is \(F^*>F_{(\alpha,p-1,n-p)}\)
This is just the usual F-test for the regression relation!
We already saw this in ANOVA Table and F Test for Regression Relation.
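As a quick sketch, the critical value for the Anscombe example below (n = 51, p = 4, so 3 and 47 degrees of freedom) can be computed with qf():
# Critical value F(0.05; 3, 47) for the overall F-test at the 0.05 level
qf(0.95, df1 = 3, df2 = 47)
This is roughly 2.80, far below the observed F-statistic of 61.86 reported in the output below.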
Illustration in R:
The full model includes all variables, while the reduced model only includes the intercept.
- Full Model: \(Y_i=\beta_0+\beta_1X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3}+\varepsilon_i\)
- Reduced Model: \(Y_i=\beta_0+\varepsilon_i\)
The hypotheses can be set as:
- \(Ho:\beta_1=\beta_2=\beta_3=0\)
- \(Ha:\beta_1\neq0 \text{ or } \beta_2\neq 0 \text{ or } \beta_3\neq 0\)
We use the anova() function to compare the full model vs the reduced model.
# Full Model: all variables included
F_model <- lm(income~education+young+urban, data = Anscombe)
# Reduced Model: no variables included, all beta_j = 0
R_model <- lm(income~1, data = Anscombe)
# Performing the F-test: reduced vs full
anova(R_model, F_model)
## Analysis of Variance Table
##
## Model 1: income ~ 1
## Model 2: income ~ education + young + urban
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 50 15681455
## 2 47 3168730 3 12512725 61.865 2.384e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In this output,
- \(\text{df of }SSE(R) = 51 - 1 = 50\)
- \(\text{df of }SSE(F) = 51 - 4 = 47\)
Conclusion: At 0.05 level of significance, there is sufficient evidence to conclude that there is at least one slope parameter that is not equal to 0. That is, from the 3 variables, at least one variable contributes to the model.
Note: The F-statistic here is the same as the F-statistic shown in the summary() output, which is 61.86.
##
## Call:
## lm(formula = income ~ education + young + urban, data = Anscombe)
##
## Residuals:
## Min 1Q Median 3Q Max
## -438.54 -145.21 -41.65 119.20 739.69
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3011.4633 609.4769 4.941 1.03e-05 ***
## education 7.6313 0.8798 8.674 2.56e-11 ***
## young -6.8544 1.6617 -4.125 0.00015 ***
## urban 1.7692 0.2591 6.828 1.49e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 259.7 on 47 degrees of freedom
## Multiple R-squared: 0.7979, Adjusted R-squared: 0.785
## F-statistic: 61.86 on 3 and 47 DF, p-value: 2.384e-16
Testing that one slope parameter is 0
Suppose we are testing whether one slope parameter is equal to 0
- \(Ho:\beta_j=0\)
- \(Ha:\) \(\beta_j\neq0\)
Using the General Linear Test,
- The \(SSE(F)\) is just the usual error sum of squares seen in the ANOVA table.
- Since the null is \(\beta_j=0\), the reduced model only removes the \(j^{th}\) variable, while keeping the rest of the independent variables. That is, \(SSE(R)=SSE(\text{w/o }X_j)\)
With this, the F Statistic is \[\begin{align} F^*&=\frac{\left(\frac{SSE(R)-SSE(F)}{\text{df}_R-\text{df}_F}\right)}{\frac{SSE(F)}{\text{df}_F}}\\ &=\frac{\left(\frac{SSE(\text{w/o } X_j)-SSE}{(n-(p-1))-(n-p)}\right)}{\frac{SSE}{n-p}}\\ &=\frac{\left(\frac{SSR(X_j|X_1,...,X_{j-1},X_{j+1},..., X_k)}{1}\right)}{\frac{SSE}{n-p}}\\ &=\frac{SSR(F)-SSR(\text{w/o } X_j)}{MSE} \end{align}\]
and the critical region to reject \(Ho\) is \(F^*>F_{(\alpha,1,n-p)}\)
This test is usually called the Partial F-test for one slope parameter
Remarks: Relationship of the Partial F-test and the T-test for one slope parameter
Another way to devise a test for each \(\beta_j\) is by using the probability distribution of the estimate \(\hat\beta_j\).
Recall Section 5.1 for the confidence interval and hypothesis test for \(\beta_j\) .
The goals of the Partial F-test and the T-test are the same, but the approaches are different:
Aspect | Partial F-test (removing \(X_j\)) | T-test |
---|---|---|
Purpose | To assess the significance of a single predictor when added to the model. | To assess the significance of an individual predictor’s coefficient. |
Null Hypothesis \(Ho\) | The predictor being tested has no effect (\(\beta_j=0\)) | The predictor being tested has no effect (\(\beta_j=0\) ) |
Alternative Hypothesis \(Ha\) | The predictor being tested has effect (\(\beta_j\neq0\)) | The predictor being tested has effect (\(\beta_j\neq0\)) |
Test Statistic | F-statistic: \(F^*=\frac{SSR(F)-SSR(\text{w/o } X_j)}{MSE}\) | t-statistic: \(t_j=\frac{\hat{\beta}_j}{\widehat{se(\hat\beta_j})}\) |
Interpretation | A significant F indicates the predictor improves the model fit. | A significant t indicates the predictor is significantly associated with the response variable. |
When Used | When comparing nested models (full vs. reduced with one less predictor). | When evaluating each predictor individually within the same model. |
Relationship | When only one predictor is tested, the F-statistic is the square of the t-statistic for that predictor. | The t-statistic is directly related to the F-statistic: \(F^*=t^2\) . |
Illustration in R
The full model includes all variables, while the reduced model removes one variable. In this example, we want to know if adding urban improves the model fit.
- Full Model: \(Y_i=\beta_0+\beta_1X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3}+\varepsilon_i\)
- Reduced Model: \(Y_i=\beta_0+\beta_1X_{i1} + \beta_2 X_{i2}+\varepsilon_i\)
The hypotheses can be set as:
- \(Ho:\beta_3=0\) given \(\beta_1\neq0\) and \(\beta_2\neq0\) (or \(X_1\) and \(X_2\) are already in the model)
- \(Ha:\beta_3\neq 0\)
# Reduced Model: removing urban only
R_model <- lm(income ~ education + young, data = Anscombe)
# Full Model: adding urban, all variables included
F_model <- lm(income~education+young+urban, data = Anscombe)
# Performing the F-test: full vs reduced with no urban
anova(R_model, F_model)
## Analysis of Variance Table
##
## Model 1: income ~ education + young
## Model 2: income ~ education + young + urban
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 48 6311541
## 2 47 3168730 1 3142811 46.616 1.493e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Conclusion: Based on the result of the F-test, adding urban to the model significantly improves the model fit compared to the model with only education and young as predictors.
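As a quick check of the \(F^*=t^2\) relationship noted earlier, we can square the t-statistic of urban from the full-model coefficient table (reusing the F_model object fitted above):
# t-statistic for urban in the full model, taken from the coefficient table
t_urban <- summary(F_model)$coefficients["urban", "t value"]
t_urban^2  # about 46.62, matching the partial F-statistic above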
Testing several slope parameters equal to 0
Testing that a subset — more than one, but not all — of the slope parameters are 0
Suppose we are testing that several slope parameters are equal to 0
\(Ho:\) a specified subset of the \(\beta_j\) are all equal to \(0\)
\(Ha:\) at least one \(\beta_j\) in that subset \(\neq0\)
This is almost the same as Testing that all of the slope parameters are 0, but the reduced model may contain at least one independent variable already.
As an illustrative example:
- \(Ho:\) \(\beta_2=\beta_3=0\) (\(X_1\) is already included in the model)
- \(Ha:\) at least one of \(\beta_2, \beta_3 \neq 0\)
Using the General Linear Test
- The full model still includes all the variables. That is, \(SSE(F)\) is the usual \(SSE\)
- Since the null is \(\beta_2=\beta_3=0\), the reduced model removes the variables \(X_2\) and \(X_3\), while keeping the rest of the independent variables. That is, \(SSE(R)=SSE(\text{w/o }X_2 \text{ and } X_3)\)
With this, the F Statistic is \[\begin{align} F^*&=\frac{\left(\frac{SSE(R)-SSE(F)}{\text{df}_R-\text{df}_F}\right)}{\frac{SSE(F)}{\text{df}_F}}\\ &=\frac{\left(\frac{SSE(\text{w/o } X_2 \text{ and } X_3)-SSE}{(n-(p-2))-(n-p)}\right)}{\frac{SSE}{n-p}}\\ &=\frac{\left(\frac{SSR(X_2,X_3|X_1,X_4,...,X_k)}{2}\right)}{\frac{SSE}{n-p}}\\ &=\frac{(SSR(F)-SSR(\text{w/o } X_2 \text{ and } X_3))/2}{MSE} \end{align}\]
and the critical region to reject \(Ho\) is \(F^*>F_{(\alpha,2,n-p)}\)
In this example, we only removed 2 variables. You can extend it to several variables.
This test is usually called the Partial F-test for several slope parameters
Illustration in R
The full model includes all 3 variables, while the reduced model drops young and urban; only education is retained.
# Reduced Model: education only, removing young and urban
R_model <- lm(income ~ education, data = Anscombe)
# Full Model: adding young and urban, all variables included
F_model <- lm(income~education+young+urban, data = Anscombe)
# Performing the F-test: reduced vs full
anova(R_model, F_model)
## Analysis of Variance Table
##
## Model 1: income ~ education
## Model 2: income ~ education + young + urban
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 49 8692865
## 2 47 3168730 2 5524135 40.968 5.017e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
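# Manually verify the partial F-statistic: (extra sum of squares / 2) divided by the full-model MSE (47 df)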
SSE_R <- sum((Anscombe$income - fitted(R_model))^2)
SSE_F <- sum((Anscombe$income - fitted(F_model))^2)
((SSE_R - SSE_F)/2)/(SSE_F/47)
## [1] 40.96821
Conclusion: The addition of young and urban significantly improves the model fit compared to the model with education as the only predictor.
Testing a slope parameter equal to 0 sequentially
Suppose we are testing a slope parameter equal to 0 sequentially
- \(Ho:\beta_j=0\) given that variables other than \(X_j\) are already in the model.
- \(Ha:\beta_j\neq0\) given that variables other than \(X_j\) are already in the model.
This is the same as Testing that one slope parameter is 0, but in stages.
As an illustrative example:
- \(Ho:\beta_2=0\) given \(\beta_1\neq0\) or variable \(X_1\) is sure to be in the model.
- \(Ha:\beta_2\neq0\) given \(\beta_1\neq0\) or variable \(X_1\) is sure to be in the model.
Using the General Linear Test, one can see that
- The error sum of squares for the full model, \(SSE(F)\), is the SSE when \(X_1\) and \(X_2\) are both in the model, \(SSE(X_1, X_2)\)
- The error sum of squares for the reduced model, \(SSE(R)\), is the error sum of squares without the variable \(X_2\), which is the SSE involving \(X_1\) only, \(SSE(X_1)\)
With this, the F Statistic is now \[\begin{align} F^*&=\frac{\left(\frac{SSE(R)-SSE(F)}{\text{df}_R-\text{df}_F}\right)}{\frac{SSE(F)}{\text{df}_F}}\\ &=\frac{\left(\frac{SSE(X_1)-SSE(X_1,X_2)}{(n-2)-(n-3)}\right)}{\frac{SSE(X_1,X_2)}{n-3}}\\ &=\frac{\left(\frac{SSR(X_2|X_1)}{1}\right)}{\frac{SSE(X_1,X_2)}{n-3}}\\ &=\frac{SSR(X_1,X_2)-SSR(X_1)}{SSE(X_1,X_2)/(n-3)} \end{align}\]
and the critical region will be \(F^* > F_{(\alpha, 1, n-3)}\)
We call this the Sequential F-test.
Remarks:
- Notice that in the example, it will be equivalent to the Partial F-test where the full model includes \(X_1\) and \(X_2\), and the reduced model only includes \(X_1\). In other words, this is just the partial F-test on the variable which entered the regression at that stage.
- It is also easy to extend it to other cases like testing \(\beta_3 = 0\) given \(\beta_1 \neq 0\) and \(\beta_2 \neq 0\), or testing \(\beta_4 = 0\) given \(\beta_1 \neq 0\), \(\beta_2 \neq 0\), and \(\beta_3 \neq 0\).
Source of Variation | Sum of Squares | df | Mean Squares | F |
---|---|---|---|---|
Extra \(X_1\) | \(SSR(X_1)\) | 1 | \(SSR(X_1)\) | \(\frac{SSR(X_1)}{MSE(X_1)}\) |
Extra \(X_2|X_1\) | \(SSR(X_2|X_1)\) | 1 | \(SSR(X_2|X_1)\) | \(\frac{SSR(X_2|X_1)}{MSE(X_1,X_2)}\) |
Extra \(X_3|X_1,X_2\) | \(SSR(X_3|X_1,X_2)\) | 1 | \(SSR(X_3|X_1,X_2)\) | \(\frac{SSR(X_3|X_1,X_2)}{MSE(X_1,X_2,X_3)}\) |
Extra \(X_4|X_1,X_2,X_3\) | \(SSR(X_4|X_1,X_2,X_3)\) | 1 | \(SSR(X_4|X_1,X_2,X_3)\) | \(\frac{SSR(X_4|X_1,X_2,X_3)}{MSE(X_1,X_2,X_3,X_4)}\) |
Error (full model) | \(SSE(X_1,X_2,X_3,X_4)\) | n-5 | \(MSE(X_1,X_2,X_3,X_4)\) | |
Total | \(SST\) | n-1 |
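For reference, the sequential sums of squares in a table like this can be obtained in R by calling anova() on a single fitted model (a brief sketch using the Anscombe full model). Note that R's anova() uses the full-model MSE in every F ratio, whereas the table above divides each extra sum of squares by the MSE of the model at that stage.
# Sequential (Type I) sums of squares for the Anscombe full model:
# SSR(education), SSR(young | education), SSR(urban | education, young)
anova(lm(income ~ education + young + urban, data = Anscombe))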
Summary of the General Linear Tests
- Reduced model - the model under \(Ho\); Full model - the model containing all the variables of concern.
- Extra sum of squares: \[ SSR(\text{new vars|old vars})= SSE(\text{old vars only})−SSE(\text{all vars})= SSE(R) - SSE(F) \]
- The test statistic for the general linear test is \[ F^*=\frac{\left(\frac{SSE(R)-SSE(F)}{\text{df}_R-\text{df}_F}\right)}{\frac{SSE(F)}{\text{df}_F}}\sim F(\text{df}_R-\text{df}_F,\text{df}_F), \quad \text{under } Ho \]
- The partial F-test can be made for all regression coefficients as though each corresponding variable were the last to enter the equation, to see the relative effects of each variable in excess of the others. Thus, the test is a useful criterion for adding or removing terms from the model for variable selection.
- When the variables are added one by one in stages to a regression equation, then we do a sequential F-test. This is just the partial F-test on the variable which entered the regression at that stage.
Question: Why is my slope parameter not significant?
- Recall: we reject \(Ho: \beta_j=0\) if \(|T|>t_{\alpha/2,n-p}\). Not rejecting \(Ho\) means we conclude that the variable \(X_j\) is not linearly associated with the dependent variable, given the other predictors in the model.
- Possible reasons for an insignificant parameter:
- the predictor in question itself is not a good (linear) predictor of dependent variable
- n is too small (sample size is too small)
- p is too large (number of predictors is too large)
- SSE is too large (all of the predictors are really not of good quality)
- The predictors are highly correlated with one another.
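For the last point, one quick informal diagnostic is to inspect the pairwise correlations among the predictors; highly correlated predictors can make individual t-tests insignificant even when the overall F-test is significant.
# Pairwise correlations among the predictors in the Anscombe data
cor(Anscombe[, c("education", "young", "urban")])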
Exercise 6.1 This exercise helps you to visualize the relationship of the t-test on the parameters \(\beta\) and the partial F-test of adding a single independent variable.
Let’s try the mtcars data.
1. Create a linear model that predicts mpg using the variables cyl, wt, and qsec. Take note of the t-statistic and the p-value for the coefficient of the variable wt.
2. Perform a partial F-test of the reduced model mpg ~ cyl + qsec vs the full model mpg ~ cyl + qsec + wt. Take note of the F-statistic and the p-value.
3. What is the relationship of the t-statistic and p-value in (1) and the F-statistic and p-value in (2)?
4. Explain the possible implications and interpretation of the t-tests for the coefficients \(\beta\).
6.4 Coefficient of Partial Determination
We used the general linear test to check whether a variable contributes to the model fit.
Here, we present a related metric, based on the coefficient of determination, that quantifies the contribution of an individual independent variable to the model.
Definition 6.4 (Coefficient of Partial Determination) A coefficient of partial determination (or partial \(R^2\)) measures the marginal contribution of one independent variable when all others are already included in the model.
\[\begin{align} R^2_{Y_j|1,2,...,j-1,j+1,...,k}&=\frac{SSR(X_j|X_1,X_2,...,X_{j-1},X_{j+1},...,X_k)}{SSE(X_1,X_2,...,X_{j-1},X_{j+1},...,X_k)} \\ &= \frac{SSE(R)-SSE(F)}{SSE(R)} \end{align}\]
In the subscripts to \(R^2\), the entries to the left of the pipe show which variable is taken as the response and which variable is being added; the entries to the right show the variables already in the model.
Notes:
- The coefficient of partial determination is the marginal contribution of \(X_j\) to the reduction of variation in \(Y\) when \(X_1, X_2, ..., X_{j-1}, X_{j+1}, ..., X_k\) are already in the model
- This gives us the proportion of the variation in \(Y\) not explained by the other independent variables that can be explained by \(X_j\).
- The coefficient of partial determination can take on values between 0 and 1.
- The coefficient of partial correlation is the square root of the coefficient of partial determination. It follows the sign of the associated regression coefficient. It does not have as clear a meaning as the coefficient of partial determination.
Example in R
F_model <- lm(income~education + young + urban, data = Anscombe)
R_model <- lm(income~education + young, data = Anscombe)
# SSE of the full model
SSE_F <- sum((F_model$residuals)^2)
# SSE of the reduced model
SSE_R <- sum((R_model$residuals)^2)
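# Coefficient of partial determination: share of SSE(R) removed by adding urban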
(SSE_R - SSE_F)/SSE_R
## [1] 0.4979467
urban explains 49.79% of the variation in income that education and young cannot explain.
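Following the note above on the coefficient of partial correlation, a minimal sketch (reusing SSE_R, SSE_F, and F_model from this example) takes the square root of the partial \(R^2\) and attaches the sign of the estimated coefficient of urban:
# Partial correlation of urban with income, given education and young:
# square root of the partial R^2, with the sign of the urban coefficient
sign(coef(F_model)["urban"]) * sqrt((SSE_R - SSE_F) / SSE_R)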