Regression models with factors and interactions
Suppose we have a response variable, a continuous explanatory variable and a binary explanatory variable.
We have already discussed three models that can be expressed as:
- A collection of different regression lines (a model which includes an interaction with a factor),
- A collection of parallel regression lines,
- A single regression line (with no differences among the groups).
We use the following notation:
$y_{ij}$ : response observation $j$ in group $i$
$x_{ij}$ : explanatory variable observation $j$ in group $i$
$n_i$ : sample size in group $i$
$p$ : number of groups
$n$ : $\sum_{i=1}^{p} n_i$, total sample size.
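To make the notation concrete, here is a minimal sketch in Python. The data values, the `groups` dictionary and the helper functions `x` and `y` are hypothetical, chosen only to illustrate the indexing; they do not come from the course data.

```python
# Hypothetical toy data: one list of (x, y) pairs per group.
groups = {
    1: [(1.0, 2.1), (2.0, 2.9), (3.0, 4.2)],
    2: [(1.5, 1.0), (2.5, 1.4), (3.5, 2.1), (4.5, 2.4)],
}

p = len(groups)                                   # number of groups
n_i = {i: len(obs) for i, obs in groups.items()}  # sample size in each group
n = sum(n_i.values())                             # total sample size


def x(i, j):
    """Explanatory variable, observation j (1-based) in group i."""
    return groups[i][j - 1][0]


def y(i, j):
    """Response, observation j (1-based) in group i."""
    return groups[i][j - 1][1]


print(p, n_i, n)
```

Note that the groups need not have the same size, which is why the total sample size is a sum over the group sizes.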
The most general model (the first of the three above) can be formulated as
$$E(y_{ij}) = \alpha_i + \beta_i x_{ij}$$
since group $i$ has its own intercept $\alpha_i$ and slope $\beta_i$.
In previous weeks we have found that the formulation in terms of $(x_i - \bar{x})$ led to simpler algebra. This is also true in the present case and we will therefore formulate the model as
$$E(y_{ij}) = \alpha_i + \beta_i (x_{ij} - \bar{x}_{i.})$$
or
$$y_{ij} = \alpha_i + \beta_i (x_{ij} - \bar{x}_{i.}) + \epsilon_{ij}$$
where $\bar{x}_{i.} = \frac{1}{n_i} \sum_{j=1}^{n_i} x_{ij}$ is the mean of the explanatory variable in group $i$.
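One way to see the simplification is that, with the explanatory variable centred on its group mean, the least-squares estimate of $\alpha_i$ is just the group mean of the response, and the slope is estimated separately as $S_{xy}/S_{xx}$. The sketch below illustrates this for a single group; the function name `fit_group` and the data are hypothetical.

```python
def fit_group(xs, ys):
    """Least-squares fit of y = alpha + beta * (x - xbar) for one group.

    Returns (alpha, beta, xbar). Assumes the x values are not all equal.
    """
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = ybar  # with centred x, the intercept estimate is the mean of y
    return alpha, beta, xbar


# Hypothetical data for one group.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 3.0, 5.0]
alpha, beta, xbar = fit_group(xs, ys)

# The intercept of the uncentred form alpha* + beta * x can be recovered
# as alpha* = alpha - beta * xbar.
alpha_uncentred = alpha - beta * xbar
print(alpha, beta, alpha_uncentred)
```

The centred and uncentred forms describe the same fitted line; only the meaning of the intercept changes.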
The models of interest can be expressed as:
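- different lines: $E(y_{ij}) = \alpha_i + \beta_i (x_{ij} - \bar{x}_{i.})$
- parallel lines: $E(y_{ij}) = \alpha_i + \beta (x_{ij} - \bar{x}_{i.})$
- single line: $E(y_{ij}) = \alpha + \beta (x_{ij} - \bar{x}_{..})$, where $\bar{x}_{..}$ is the mean of all $x_{ij}$.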
Note that the “single line” model is centred on the overall mean, $(x_{ij} - \bar{x}_{..})$, rather than on the group means, so that no term in the model depends on the group.
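The three models are nested, so their residual sums of squares (RSS) can only increase as constraints are added. The following is a minimal sketch, using hypothetical data and hypothetical function names, that fits each model by least squares and reports its RSS. It is an illustration of the model structure, not the model selection procedure itself.

```python
def mean(values):
    return sum(values) / len(values)


def centred_sums(xs, ys):
    """Return (Sxx, Sxy) about the means of xs and ys."""
    xb, yb = mean(xs), mean(ys)
    sxx = sum((x - xb) ** 2 for x in xs)
    sxy = sum((x - xb) * (y - yb) for x, y in zip(xs, ys))
    return sxx, sxy


def rss_line(xs, ys, beta):
    """RSS of the line ybar + beta * (x - xbar) on one set of observations."""
    xb, yb = mean(xs), mean(ys)
    return sum((y - yb - beta * (x - xb)) ** 2 for x, y in zip(xs, ys))


def rss_different(groups):
    """Different lines: each group has its own intercept and slope."""
    total = 0.0
    for xs, ys in groups:
        sxx, sxy = centred_sums(xs, ys)
        total += rss_line(xs, ys, sxy / sxx)
    return total


def rss_parallel(groups):
    """Parallel lines: own intercepts, one slope pooled within groups."""
    sums = [centred_sums(xs, ys) for xs, ys in groups]
    beta = sum(sxy for _, sxy in sums) / sum(sxx for sxx, _ in sums)
    return sum(rss_line(xs, ys, beta) for xs, ys in groups)


def rss_single(groups):
    """Single line: ignore the grouping and fit one line to all the data."""
    all_x = [x for xs, _ in groups for x in xs]
    all_y = [y for _, ys in groups for y in ys]
    return rss_different([(all_x, all_y)])


# Hypothetical data: two groups, each given as (x values, y values).
groups = [
    ([1.0, 2.0, 3.0, 4.0], [2.0, 2.9, 4.1, 5.0]),
    ([1.0, 2.0, 3.0, 4.0], [1.0, 1.4, 2.1, 2.4]),
]

r_diff = rss_different(groups)
r_par = rss_parallel(groups)
r_single = rss_single(groups)
print(r_diff, r_par, r_single)
```

How much the RSS grows from one model to the next is the raw material for choosing between them.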
We will now consider how to select an ‘appropriate’ model given such data. We will begin with the most general model, with a different regression line for each group. Let’s assume that $p = 2$, so that we have two groups ($i = 1, 2$).