Example

Looking again at the anova table for protein in pregnancy,

fit1 <- lm(formula = Protein ~ Gestation, data = pregnancy)
anova(fit1)
## Analysis of Variance Table
## 
## Response: Protein
##           Df  Sum Sq Mean Sq F value    Pr(>F)    
## Gestation  1 0.63667 0.63667  48.076 2.416e-06 ***
## Residuals 17 0.22513 0.01324                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

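With a p-value of \(2.4 \times 10^{-6}\), we can reject the null hypothesis that all regression coefficients are zero. This suggests that at least one coefficient is not zero; with Gestation as the only predictor, that means the Gestation coefficient is not zero.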
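As a check on the table, the F value is the ratio of the two mean squares, where each mean square is a sum of squares divided by its degrees of freedom. Using the unrounded residual mean square:

\[
F = \frac{\text{SS}_{\text{Gestation}} / 1}{\text{SS}_{\text{Residuals}} / 17} = \frac{0.63667}{0.22513 / 17} \approx \frac{0.63667}{0.013243} \approx 48.08
\]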

Let’s now revisit an example detailing the relationship between a response variable and month, plotted below.

Here are some summaries and regression model output from these data.

summary(data$Response)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -1.8000  0.5683  0.8590  1.1172  1.6358  3.4230
tapply(data$Response, data$Month, mean)
##          1          2          3 
## -0.1523333  1.6496667  1.6700000

The average value of the response was 1.12, with monthly averages of -0.152, 1.650 and 1.670 for January, February and March respectively. We want to know whether there are any significant differences in the response across the three months.

model <- lm(Response ~ factor(Month), data = data)
summary(model)
## 
## Call:
## lm(formula = Response ~ factor(Month), data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.64767 -1.00800 -0.07733  0.83308  1.75300 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)
## (Intercept)     -0.1523     0.7529  -0.202    0.845
## factor(Month)2   1.8020     1.0648   1.692    0.134
## factor(Month)3   1.8223     0.9960   1.830    0.110
## 
## Residual standard error: 1.304 on 7 degrees of freedom
## Multiple R-squared:  0.3672, Adjusted R-squared:  0.1864 
## F-statistic: 2.031 on 2 and 7 DF,  p-value: 0.2016

The intercept term is in fact the average value of our baseline category, January. The estimate 1.802 corresponds to the difference between Month 2, February, and January. Therefore, the average value in February is \(1.802 - 0.152 = 1.650\). Likewise, the average value in March is \(1.822 - 0.152 = 1.670\).
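The relationship between the coefficients and the group means can be checked directly in R. The following is a minimal sketch on made-up numbers: the data frame `toy` and the names `group_means`, `fit` and `b` are hypothetical and are not the data analysed above.

```r
# Hypothetical toy data (NOT the data analysed above):
# three months with 3, 3 and 4 observations
toy <- data.frame(
  Month    = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  Response = c(0.2, -0.4, 0.5, 1.1, 1.9, 1.5, 2.0, 1.2, 1.6, 1.8)
)

# Mean response in each month, stored as a plain numeric vector
group_means <- as.numeric(tapply(toy$Response, toy$Month, mean))

# Same model form as above: Month treated as a categorical variable
fit <- lm(Response ~ factor(Month), data = toy)
b <- unname(coef(fit))

# The intercept is the mean of the baseline month (Month 1)
print(all.equal(b[1], group_means[1]))

# Intercept plus a month coefficient gives that month's mean
print(all.equal(b[1] + b[2], group_means[2]))
print(all.equal(b[1] + b[3], group_means[3]))
```

Each comparison uses `all.equal()` rather than `==` so that tiny floating-point differences are not reported as mismatches.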

We can now interpret the p-values corresponding to each estimate. The p-value of 0.85 corresponding to the intercept term tells us that the average value in January is not significantly different from 0. More interestingly, the p-value of 0.13 corresponding to Month 2 tells us that this coefficient is not significantly different from 0, and so February is not significantly different from January. The p-value of 0.11 corresponding to Month 3 tells us that March is not significantly different from January.

anova(model)
## Analysis of Variance Table
## 
## Response: Response
##               Df  Sum Sq Mean Sq F value Pr(>F)
## factor(Month)  2  6.9081  3.4540  2.0311 0.2016
## Residuals      7 11.9039  1.7006

The null hypothesis of this anova is that all regression coefficients in our fitted model, other than the intercept, are equal to 0. The alternative hypothesis is that at least one is not equal to zero. Given a p-value of 0.2, we cannot reject the null hypothesis. This does not prove that the coefficients are zero; it means we have no evidence that any of them differs from zero. In particular, there is no significant difference between the three months. In other words, in this case with one categorical variable, the anova is testing for a difference between categories.
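The F value in this table can be reproduced from the sums of squares and degrees of freedom:

\[
F = \frac{\text{SS}_{\text{Month}} / 2}{\text{SS}_{\text{Residuals}} / 7} = \frac{6.9081 / 2}{11.9039 / 7} = \frac{3.4540}{1.7006} \approx 2.03
\]

This is the same F-statistic, on 2 and 7 degrees of freedom, that appears on the last line of the regression summary. The same sums of squares also give the multiple R-squared:

\[
R^2 = \frac{6.9081}{6.9081 + 11.9039} \approx 0.367
\]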

The anova reports a single p-value that tests whether at least one coefficient is significantly different from zero. The regression summary reports one mean (the intercept) and the differences between that mean and each of the other means, and its p-values evaluate those specific comparisons.