Regression models with factors and interactions

Suppose we have a response variable, a continuous explanatory variable and a binary explanatory variable.

We have already discussed three models that can be expressed as:

  1. A collection of different regression lines (a model which includes an interaction with a factor),
  2. A collection of parallel regression lines,
  3. A single regression line (with no differences among the groups).

We use the following notation:

$y_{ij}$ : response observation $j$ in group $i$

$x_{ij}$ : explanatory variable observation $j$ in group $i$

$n_i$ : sample size in group $i$

$p$ : number of groups

$n = \sum_{i=1}^{p} n_i$ : total sample size.

The most general model (1) could be formulated as

$$E(y_{ij}) = \alpha_i + \beta_i x_{ij}$$

since group $i$ has its own intercept $\alpha_i$ and slope $\beta_i$.

In previous weeks we found that formulating the model in terms of $(x_i - \bar{x})$ led to simpler algebra. This is also true in the present case, so we will formulate the model as

$$E(y_{ij}) = \alpha_i + \beta_i (x_{ij} - \bar{x}_{i.})$$

or

$$y_{ij} = \alpha_i + \beta_i (x_{ij} - \bar{x}_{i.}) + \epsilon_{ij}$$

where $\bar{x}_{i.} = \frac{1}{n_i} \sum_{j=1}^{n_i} x_{ij}$ is the mean of the explanatory variable $x_{ij}$ for group $i$.
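As a brief check, expanding the centred model shows that it describes the same line as the uncentred one. Here $\alpha_i^{*}$ is a hypothetical symbol, not part of the notation above, for the intercept of the uncentred form, and $\bar{y}_{i.}$ denotes the mean response in group $i$, by analogy with $\bar{x}_{i.}$:

$$\alpha_i + \beta_i (x_{ij} - \bar{x}_{i.}) = (\alpha_i - \beta_i \bar{x}_{i.}) + \beta_i x_{ij} = \alpha_i^{*} + \beta_i x_{ij}, \qquad \alpha_i^{*} = \alpha_i - \beta_i \bar{x}_{i.}$$

The slope is unchanged; only the interpretation of the intercept changes. In the centred form, $\alpha_i$ is the expected response at $x = \bar{x}_{i.}$. Because $\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_{i.}) = 0$, the least-squares estimate is simply $\hat{\alpha}_i = \bar{y}_{i.}$, which is one way to see why the algebra becomes simpler.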

The models of interest can be expressed as:

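  1. different lines: $E(y_{ij}) = \alpha_i + \beta_i (x_{ij} - \bar{x}_{i.})$
  2. parallel lines: $E(y_{ij}) = \alpha_i + \beta (x_{ij} - \bar{x}_{i.})$
  3. single line: $E(y_{ij}) = \alpha + \beta (x_{ij} - \bar{x}_{..})$, where $\bar{x}_{..}$ is the mean of all $x_{ij}$.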

Note that the “single line” model is centred on the overall mean, $(x_{ij} - \bar{x}_{..})$, rather than on the group means, so that no part of the model differs among the groups.
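The three formulations above can be sketched as least-squares fits in Python with NumPy. This is only an illustration under stated assumptions: the function name `fit_three_models`, the simulated data, the group sizes (15 and 20) and the coefficients used to generate the data are all hypothetical and do not come from the course material.

```python
import numpy as np

def fit_three_models(x, y, group):
    """Fit the different-lines, parallel-lines and single-line models
    by least squares and return coefficients and residual sums of squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    group = np.asarray(group)
    labels = np.unique(group)

    # Indicator columns, one per group: shape (n, p)
    D = (group[:, None] == labels[None, :]).astype(float)

    # Group means of x, then centre each observation on its own group mean
    xbar_i = (D.T @ x) / D.sum(axis=0)
    xc_group = x - D @ xbar_i      # x_ij - xbar_i.
    xc_all = x - x.mean()          # x_ij - xbar_..

    designs = {
        # columns: alpha_1..alpha_p, beta_1..beta_p
        "different": np.hstack([D, D * xc_group[:, None]]),
        # columns: alpha_1..alpha_p, beta
        "parallel": np.hstack([D, xc_group[:, None]]),
        # columns: alpha, beta
        "single": np.column_stack([np.ones_like(x), xc_all]),
    }

    fits = {}
    for name, X in designs.items():
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coef
        fits[name] = {
            "coef": coef,
            "rss": float(resid @ resid),
            "n_params": X.shape[1],
        }
    return fits

# Simulated two-group data (illustrative values only)
rng = np.random.default_rng(0)
group = np.repeat([0, 1], [15, 20])
x = rng.uniform(0, 10, size=group.size)
y = np.where(group == 0, 1.0 + 0.5 * x, 3.0 + 1.5 * x)
y = y + rng.normal(0, 1, size=group.size)

fits = fit_three_models(x, y, group)
for name, f in fits.items():
    print(name, f["n_params"], round(f["rss"], 3))
```

The models are nested: the single line is a special case of the parallel lines, which are a special case of the different lines. The residual sum of squares can therefore never increase when moving from the single-line model to the parallel-lines model to the different-lines model, while the number of parameters grows from 2 to $p + 1$ to $2p$.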

We will now consider how to select an ‘appropriate’ model given such data. We will begin with the most general model, with a different regression line for each group. Let’s assume that $p = 2$, so that we have two groups ($i = 1, 2$).