4.1 Sample regression line
- Let us consider a hypothetical example in which the linear dependence between weekly consumption and weekly income is analyzed for a population of 60 households. The households are divided into 10 groups, each with the same level of income, as presented in Table 4.1.
Income (X) | Weekly consumption (Y) of households in the group | | | | | | | E(Y\|X)
---|---|---|---|---|---|---|---|---
80 | 55 | 60 | 65 | 70 | 75 | | | 65
100 | 65 | 70 | 74 | 80 | 85 | 88 | | 77
120 | 79 | 84 | 90 | 94 | 98 | | | 89
140 | 80 | 93 | 95 | 103 | 108 | 113 | 115 | 101
160 | 102 | 107 | 110 | 116 | 118 | 125 | | 113
180 | 110 | 115 | 120 | 130 | 135 | 140 | | 125
200 | 120 | 136 | 140 | 144 | 145 | | | 137
220 | 135 | 137 | 140 | 152 | 157 | 160 | 162 | 149
240 | 137 | 145 | 155 | 165 | 175 | 189 | | 161
260 | 150 | 152 | 175 | 178 | 180 | 185 | 191 | 173
- Average weekly consumption depends on the income level: the conditional mean of $Y$ (often called the conditional expectation) is a linear function of $X$:
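$$E(Y|X) = \beta_0 + \beta_1 X$$

Applied to the ten income levels of Table 4.1, this gives a system of ten equations: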
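$$
\begin{aligned}
65 &= \beta_0 + \beta_1 \cdot 80\\
77 &= \beta_0 + \beta_1 \cdot 100\\
&\;\;\vdots\\
173 &= \beta_0 + \beta_1 \cdot 260
\end{aligned}
\tag{4.3}
$$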
- Solving any two equations of system (4.3) gives the two population parameters:
$$\beta_0 = 17, \qquad \beta_1 = 0.6$$
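For example, taking the first two equations of the system (incomes 80 and 100) and subtracting the first from the second eliminates $\beta_0$; substituting $\beta_1$ back into the first equation then gives $\beta_0$:

$$
\begin{aligned}
77 - 65 &= \beta_1 (100 - 80) \;\Rightarrow\; \beta_1 = \tfrac{12}{20} = 0.6,\\
\beta_0 &= 65 - 0.6 \cdot 80 = 65 - 48 = 17.
\end{aligned}
$$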
- When the population parameters are known, the conditional expectation can be computed directly for each income level, e.g. $E(Y|X=140) = 17 + 0.6 \cdot 140 = 101$.
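As a check on the numbers, the Python sketch below stores Table 4.1 as a dictionary (an illustrative data layout; the names `TABLE_4_1`, `conditional_means` and `solve_two_equations` are hypothetical), computes each conditional mean $E(Y|X)$ as the group average, solves the first two equations of system (4.3) for $\beta_0$ and $\beta_1$, and verifies that all ten conditional means lie on the resulting line.

```python
# Weekly consumption (Y) of the 60 households in Table 4.1, grouped by income (X)
TABLE_4_1 = {
    80: [55, 60, 65, 70, 75],
    100: [65, 70, 74, 80, 85, 88],
    120: [79, 84, 90, 94, 98],
    140: [80, 93, 95, 103, 108, 113, 115],
    160: [102, 107, 110, 116, 118, 125],
    180: [110, 115, 120, 130, 135, 140],
    200: [120, 136, 140, 144, 145],
    220: [135, 137, 140, 152, 157, 160, 162],
    240: [137, 145, 155, 165, 175, 189],
    260: [150, 152, 175, 178, 180, 185, 191],
}


def conditional_means(table):
    """Return E(Y|X=x) for each income level x, computed as the group mean."""
    return {x: sum(ys) / len(ys) for x, ys in table.items()}


def solve_two_equations(x1, m1, x2, m2):
    """Solve m = b0 + b1 * x for (b0, b1) from two (x, E(Y|x)) pairs."""
    b1 = (m2 - m1) / (x2 - x1)
    b0 = m1 - b1 * x1
    return b0, b1


means = conditional_means(TABLE_4_1)
b0, b1 = solve_two_equations(80, means[80], 100, means[100])
print(f"beta0 = {b0:.1f}, beta1 = {b1:.1f}")  # prints: beta0 = 17.0, beta1 = 0.6

# Every conditional mean lies on the line E(Y|X) = b0 + b1 * X
for x, m in means.items():
    assert abs(m - (b0 + b1 * x)) < 1e-9
```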

FIGURE 4.1: Population regression line
The population regression line passes through the conditional expectations of $Y$ at every income level $X$.
However, the weekly consumption $y_i$ of an individual household generally differs from the conditional expectation, and the difference is the error term $u_i$:
$$y_i - E(Y|x_i) = u_i$$
- Rearranging gives the population regression model, i.e. the population regression line plus the error term:
$$y_i = \underbrace{\beta_0 + \beta_1 x_i}_{E(Y|x_i)} + u_i$$
- The disturbances $u_i$ are error terms: random variables with zero conditional mean at every income level:
$$E(u|x_i) = 0 \tag{4.6}$$
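Condition (4.6) means that $X$ gives no information about the mean of the error term $u$: $u$ is mean-independent of $X$, which implies that $X$ and $u$ are uncorrelated. This is weaker than full independence, which would require the whole distribution of $u$ to be the same at every level of $X$.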
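The Python sketch below makes the error terms of the example concrete. It computes $u_i = y_i - E(Y|x_i)$ for all 60 households of Table 4.1 with $\beta_0 = 17$ and $\beta_1 = 0.6$, then reports, for each income group, the mean of $u$ (the group counterpart of $E(u|x_i)$) and the range of $u$. The mean is zero in every group by construction, because the population line passes through each group mean; the range, however, grows with income (20 at $X = 80$ versus 41 at $X = 260$), so the distribution of $u$ is not identical across income levels even though its conditional mean is. The names `TABLE_4_1` and `population_errors` are illustrative.

```python
# Population parameters obtained from system (4.3)
BETA0, BETA1 = 17.0, 0.6

# Weekly consumption (Y) of the 60 households in Table 4.1, grouped by income (X)
TABLE_4_1 = {
    80: [55, 60, 65, 70, 75],
    100: [65, 70, 74, 80, 85, 88],
    120: [79, 84, 90, 94, 98],
    140: [80, 93, 95, 103, 108, 113, 115],
    160: [102, 107, 110, 116, 118, 125],
    180: [110, 115, 120, 130, 135, 140],
    200: [120, 136, 140, 144, 145],
    220: [135, 137, 140, 152, 157, 160, 162],
    240: [137, 145, 155, 165, 175, 189],
    260: [150, 152, 175, 178, 180, 185, 191],
}


def population_errors(table, b0, b1):
    """Return u_i = y_i - E(Y|x_i) for every household, grouped by income level."""
    return {x: [y - (b0 + b1 * x) for y in ys] for x, ys in table.items()}


errors = population_errors(TABLE_4_1, BETA0, BETA1)
for x, us in errors.items():
    mean_u = sum(us) / len(us)   # group analogue of E(u|x_i)
    spread = max(us) - min(us)   # range of u within the income group
    print(f"x = {x:3d}: mean u = {mean_u:+.2f}, range of u = {spread:.0f}")
```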
In practice the population is either finite but unknown, or infinite, so the error terms $u_i$ are unknown, as are the parameters $\beta_0$ and $\beta_1$.
The parameters and error terms can, however, be estimated from a sample, which yields the sample regression line:
$$y_i = \underbrace{\hat\beta_0 + \hat\beta_1 x_i}_{\hat y_i} + \hat u_i$$
- Different random samples can be obtained by selecting one household at random from each income group, and each sample yields its own sample regression line.
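A minimal Python sketch of this sampling experiment, assuming that the sample regression line is fitted by ordinary least squares (the estimator is not specified at this point in the text). It draws one household at random from each income group with Python's `random` module, using two arbitrary seeds, and fits a line to each sample; different samples generally give different estimates $\hat\beta_0$ and $\hat\beta_1$, in the spirit of the two lines in Figure 4.2. The helper names `draw_sample` and `ols_fit` are hypothetical.

```python
import random

# Weekly consumption (Y) of the 60 households in Table 4.1, grouped by income (X)
TABLE_4_1 = {
    80: [55, 60, 65, 70, 75],
    100: [65, 70, 74, 80, 85, 88],
    120: [79, 84, 90, 94, 98],
    140: [80, 93, 95, 103, 108, 113, 115],
    160: [102, 107, 110, 116, 118, 125],
    180: [110, 115, 120, 130, 135, 140],
    200: [120, 136, 140, 144, 145],
    220: [135, 137, 140, 152, 157, 160, 162],
    240: [137, 145, 155, 165, 175, 189],
    260: [150, 152, 175, 178, 180, 185, 191],
}


def draw_sample(table, rng):
    """Pick one household at random from each income group; return (x, y) pairs."""
    return [(x, rng.choice(ys)) for x, ys in sorted(table.items())]


def ols_fit(sample):
    """Ordinary least squares estimates (b0_hat, b1_hat) of y = b0 + b1*x + u."""
    n = len(sample)
    x_bar = sum(x for x, _ in sample) / n
    y_bar = sum(y for _, y in sample) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in sample)
    s_xx = sum((x - x_bar) ** 2 for x, _ in sample)
    b1_hat = s_xy / s_xx
    b0_hat = y_bar - b1_hat * x_bar
    return b0_hat, b1_hat


# Two independent samples (arbitrary seeds) give two sample regression lines
for seed in (1, 2):
    sample = draw_sample(TABLE_4_1, random.Random(seed))
    b0_hat, b1_hat = ols_fit(sample)
    print(f"seed {seed}: y_hat = {b0_hat:.2f} + {b1_hat:.3f} x")
```

Fitting the same function to the ten conditional means instead of a sample recovers the population line $E(Y|X) = 17 + 0.6X$, since those points lie exactly on it.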

FIGURE 4.2: Two sample regression lines
With simple random sampling, the error terms are assumed to be identically distributed (with zero mean and constant variance) and independently distributed (zero covariances).
The estimated error terms are called residuals and are usually denoted $\hat u_i$.
Each residual is the difference between the actual value $y_i$ observed in the sample and the fitted (estimated) value $\hat y_i$ on the sample regression line: $\hat u_i = y_i - \hat y_i$.
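For a concrete illustration, the Python sketch below takes an arbitrary hypothetical sample (the first household listed in each row of Table 4.1), fits the sample regression line by ordinary least squares (an assumption, since the estimator has not been introduced yet), and computes the fitted values $\hat y_i$ and residuals $\hat u_i = y_i - \hat y_i$. By construction $y_i = \hat y_i + \hat u_i$ for every observation, and because the model includes an intercept, the OLS residuals sum to zero. The function name `ols_fit` is hypothetical.

```python
# Hypothetical sample: first household listed in each income group of Table 4.1
xs = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
ys = [55, 65, 79, 80, 102, 110, 120, 135, 137, 150]


def ols_fit(xs, ys):
    """Ordinary least squares estimates (b0_hat, b1_hat) of y = b0 + b1*x + u."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1_hat = s_xy / s_xx
    return y_bar - b1_hat * x_bar, b1_hat


b0_hat, b1_hat = ols_fit(xs, ys)
fitted = [b0_hat + b1_hat * x for x in xs]          # y_hat_i on the sample line
resid = [y - y_hat for y, y_hat in zip(ys, fitted)]  # u_hat_i = y_i - y_hat_i

for x, y, y_hat, u_hat in zip(xs, ys, fitted, resid):
    print(f"x = {x:3d}  y = {y:3d}  y_hat = {y_hat:7.2f}  u_hat = {u_hat:+6.2f}")
```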