Chapter 6 Curvilinear Regression

6.1 Introduction to Curvilinear Regression

The curvilinear regression analysis can be used to determine if not-so-linear trends exist between X and Y.

A simple regression model extension for curved relations is the polynomial model. This is an example of curvilinear regression.

Hypothesis Test

A null hypothesis is there is no relationship between the X and Y variables when doing curvilinear regression.
Second null hypothesis is the increase in $R^{2}$ is only as large as you would expect by chance.
As you add more parameters to an equation, it will always fit the data better. A quadratic and cubic equations will always have an equal or higher $R^{2}$ than the linear. A cubic equation will usually have an equal or higher $R^{2}$ than quadratic.

6.2 Descriptive Data Analysis

Just like any examples else, we need to prepare the data before we run the analysis.

library(haven)
curvreg_data <- read_sav("curvilinear_anxiety.sav")
head(curvreg_data)

## # A tibble: 6 x 8
##   anxiety perform anxiety_squared anxiety_third anxiety_mean anxiety_centered
##     <dbl>   <dbl>           <dbl>         <dbl>        <dbl>            <dbl>
## 1       1      12               1             1            4               -3
## 2       1      15               1             1            4               -3
## 3       2      26               4             8            4               -2
## 4       2      22               4             8            4               -2
## 5       3      42               9            27            4               -1
## 6       3      36               9            27            4               -1
## # … with 2 more variables: anxiety_centered_squared <dbl>,
## #   anxiety_centered_third <dbl>

# Plot a Scatterplot between IV and DV
plot(curvreg_data$anxiety,curvreg_data$perform, main="Scatterplot betwwen Anxiety and  Performance",xlab="Anxiety",ylab="Performance")

6.3 Run the Curvilinear Regression Model

6.3.1 Linear regression model

# Run a linear regression model by using lm() function
curvreg_lm <- lm(perform~anxiety,data=curvreg_data)
summary(curvreg_lm)

## 
## Call:
## lm(formula = perform ~ anxiety, data = curvreg_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -17.196 -12.710  -3.429  13.277  20.161 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  29.7857     8.5704   3.475  0.00458 **
## anxiety      -0.5893     1.9164  -0.307  0.76374   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14.34 on 12 degrees of freedom
## Multiple R-squared:  0.007818,	Adjusted R-squared:  -0.07486 
## F-statistic: 0.09455 on 1 and 12 DF,  p-value: 0.7637

# Add fit lines
plot(perform~anxiety,data=curvreg_data)
abline(lm(perform~anxiety,data=curvreg_data))

6.3.2 Quadratic regression model

The quadratic regression model is pretty similar with the multiple regression. The second independent variable here is the squared value of the 1st variable.

# Run a Quadratic regression model by using lm() function
curvreg_Quadm <- lm(perform~anxiety+anxiety_squared,data=curvreg_data)
summary(curvreg_Quadm)

## 
## Call:
## lm(formula = perform ~ anxiety + anxiety_squared, data = curvreg_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.2500 -2.7946  0.5952  2.9405  9.3214 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -13.5714     5.5081  -2.464   0.0315 *  
## anxiety          28.3155     3.1566   8.970 2.17e-06 ***
## anxiety_squared  -3.6131     0.3856  -9.369 1.41e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.998 on 11 degrees of freedom
## Multiple R-squared:  0.8895,	Adjusted R-squared:  0.8694 
## F-statistic: 44.28 on 2 and 11 DF,  p-value: 5.473e-06

Then we add a fit line

# load the ggplot package to use ggplot() function
library(ggplot2)
# Add fit lines
ggplot(curvreg_data, aes(x=anxiety, y=perform)) + geom_point()+stat_smooth(se=F, method='lm', formula=y~poly(x,2))

6.3.3 Cubic regression model

The cubic regression model is still pretty similar with the multiple regression. The third independent variable here is the cubic value of the 1st variable.

# Run a Cubic regression model by using lm() function
curvreg_cubreg <- lm(perform~anxiety+anxiety_squared+anxiety_third,data=curvreg_data)
summary(curvreg_cubreg)

## 
## Call:
## lm(formula = perform ~ anxiety + anxiety_squared + anxiety_third, 
##     data = curvreg_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.8333 -2.2113  0.2024  3.0863  8.7381 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)  
## (Intercept)     -10.07143   10.68184  -0.943   0.3680  
## anxiety          24.32937   10.77739   2.257   0.0476 *
## anxiety_squared  -2.44643    3.03087  -0.807   0.4383  
## anxiety_third    -0.09722    0.25035  -0.388   0.7059  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.203 on 10 degrees of freedom
## Multiple R-squared:  0.8912,	Adjusted R-squared:  0.8585 
## F-statistic: 27.29 on 3 and 10 DF,  p-value: 3.944e-05

# Add fit lines
ggplot(curvreg_data, aes(x=anxiety, y=perform)) + geom_point()+stat_smooth(se=F, method='lm', formula=y~poly(x,3))

6.4 Model Comparison

From the previous section, we got three regression models, namely, linear, quadratic and cubic regression model.

# We can use the anova() function to compare three models
anova(curvreg_lm,curvreg_Quadm) # Compare linear with quadratic regression model

## Analysis of Variance Table
## 
## Model 1: perform ~ anxiety
## Model 2: perform ~ anxiety + anxiety_squared
##   Res.Df     RSS Df Sum of Sq      F    Pr(>F)    
## 1     12 2467.98                                  
## 2     11  274.83  1    2193.2 87.779 1.412e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(curvreg_Quadm,curvreg_cubreg) # Compare quadratic with cubic regression model

## Analysis of Variance Table
## 
## Model 1: perform ~ anxiety + anxiety_squared
## Model 2: perform ~ anxiety + anxiety_squared + anxiety_third
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1     11 274.83                           
## 2     10 270.75  1    4.0833 0.1508 0.7059

anova(curvreg_lm,curvreg_cubreg) # Compare linear with cubic regression model

## Analysis of Variance Table
## 
## Model 1: perform ~ anxiety
## Model 2: perform ~ anxiety + anxiety_squared + anxiety_third
##   Res.Df     RSS Df Sum of Sq      F    Pr(>F)    
## 1     12 2467.98                                  
## 2     10  270.75  2    2197.2 40.577 1.589e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We can also compare these three models in one shot.

anova(curvreg_lm,curvreg_Quadm,curvreg_cubreg)

## Analysis of Variance Table
## 
## Model 1: perform ~ anxiety
## Model 2: perform ~ anxiety + anxiety_squared
## Model 3: perform ~ anxiety + anxiety_squared + anxiety_third
##   Res.Df     RSS Df Sum of Sq       F    Pr(>F)    
## 1     12 2467.98                                   
## 2     11  274.83  1   2193.15 81.0027 4.137e-06 ***
## 3     10  270.75  1      4.08  0.1508    0.7059    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Since the Model 2 (Quadratic Regression Model) is statistically significant in its RSS, we will choose this model as our final model.

6.5 Supplementary Learning Materials

Another R Example for Curvilinear Regression: Link
An advanced video example of running Polynomial Regression in R: Link