4.1 Sample regression line

  • Consider a hypothetical example in which the linear dependence of weekly consumption on income is analyzed in a population of 60 households. The households are divided into 10 groups, each with the same level of income, as presented in Table 4.1.
TABLE 4.1: Weekly consumption of 60 households with respect to income level

Income (X)   Weekly consumption (Y)              E(Y|X)
80            55   60   65   70   75              65
100           65   70   74   80   85   88         77
120           79   84   90   94   98              89
140           80   93   95  103  108  113  115   101
160          102  107  110  116  118  125        113
180          110  115  120  130  135  140        125
200          120  136  140  144  145             137
220          135  137  140  152  157  160  162   149
240          137  145  155  165  175  189        161
260          150  152  175  178  180  185  191   173

  • Average weekly consumption is determined by the income level, i.e. the conditional mean (often called the conditional expectation) is a linear function of the variable X

$$E(Y|X)=\beta_0+\beta_1 X \tag{4.2}$$
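As a quick check, the last column of Table 4.1 can be reproduced by averaging consumption within each income group. The following is a minimal Python sketch; the names `population` and `conditional_means` are illustrative and do not come from the text.

```python
# Population from Table 4.1: income level -> weekly consumption of each household.
population = {
    80: [55, 60, 65, 70, 75],
    100: [65, 70, 74, 80, 85, 88],
    120: [79, 84, 90, 94, 98],
    140: [80, 93, 95, 103, 108, 113, 115],
    160: [102, 107, 110, 116, 118, 125],
    180: [110, 115, 120, 130, 135, 140],
    200: [120, 136, 140, 144, 145],
    220: [135, 137, 140, 152, 157, 160, 162],
    240: [137, 145, 155, 165, 175, 189],
    260: [150, 152, 175, 178, 180, 185, 191],
}


def conditional_means(groups):
    """Return E(Y|X) for each income level as the average of that group."""
    return {x: sum(ys) / len(ys) for x, ys in groups.items()}


cond_mean = conditional_means(population)
n_households = sum(len(ys) for ys in population.values())

print(f"households in the population: {n_households}")
for x, m in cond_mean.items():
    print(f"X = {x:3d}  E(Y|X) = {m:g}")
```

Each group mean matches the E(Y|X) column of the table, and the group sizes add up to the 60 households of the population.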

  • Inserting the values from the first and the last column of Table 4.1 into equation (4.2), we get

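$$
\begin{aligned}
65 &= \beta_0 + \beta_1 \cdot 80 \\
77 &= \beta_0 + \beta_1 \cdot 100 \\
&\;\;\vdots \\
173 &= \beta_0 + \beta_1 \cdot 260
\end{aligned}
\tag{4.3}
$$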

  • From any two equations of system (4.3) it is easy to find the two population parameters

$$\beta_0 = 17 \qquad \beta_1 = 0.6$$
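The two parameters can be recovered by hand: subtracting the first equation of the system from the second gives 12 = 20β1, so β1 = 0.6 and β0 = 65 − 0.6 · 80 = 17. A small Python sketch of the same calculation follows; the helper `line_through` is a hypothetical name, and the conditional means are copied from Table 4.1.

```python
# Conditional means E(Y|X) from the last column of Table 4.1.
cond_mean = {80: 65, 100: 77, 120: 89, 140: 101, 160: 113,
             180: 125, 200: 137, 220: 149, 240: 161, 260: 173}


def line_through(p, q):
    """Intercept and slope of the line through two points (x, E(Y|x))."""
    (x1, m1), (x2, m2) = p, q
    slope = (m2 - m1) / (x2 - x1)
    intercept = m1 - slope * x1
    return intercept, slope


# Any two equations of the system will do; here the first two are used.
beta0, beta1 = line_through((80, cond_mean[80]), (100, cond_mean[100]))

# Check that every conditional mean lies on the same line.
all_on_line = all(abs(beta0 + beta1 * x - m) < 1e-9 for x, m in cond_mean.items())

print(f"beta0 = {beta0:g}, beta1 = {beta1:g}, all means on the line: {all_on_line}")
```

Because all ten conditional means lie exactly on one line, the choice of the two equations does not affect the result.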

  • Conditional expectations are therefore easily computed for each level of income when the population parameters are known, e.g. for X = 80 we get E(Y|X) = 17 + 0.6 · 80 = 65

FIGURE 4.1: Population regression line

  • The population regression line passes through the conditional expectations of variable Y for every level of income X

  • However, the weekly consumption of each individual household $y_i$ differs from the conditional expectation

$$y_i - E(Y|x_i) = u_i$$

  • Finally, the population regression line with the disturbance included is

$$y_i = \underbrace{\beta_0 + \beta_1 x_i}_{E(Y|x_i)} + u_i$$

  • The disturbances $u_i$ are error terms: random variables with zero conditional mean at every level of income

$$E(u|x_i) = 0 \tag{4.6}$$

  • Condition (4.6) indicates that variable X gives no information about the mean of the error term u, i.e. u is mean-independent of X and therefore uncorrelated with it
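To illustrate the zero conditional mean numerically, the sketch below computes the disturbances u_i = y_i − (17 + 0.6 x_i) for all 60 households of Table 4.1 and averages them within each income group. The function and variable names are illustrative only.

```python
# Population from Table 4.1: income level -> weekly consumption of each household.
population = {
    80: [55, 60, 65, 70, 75],
    100: [65, 70, 74, 80, 85, 88],
    120: [79, 84, 90, 94, 98],
    140: [80, 93, 95, 103, 108, 113, 115],
    160: [102, 107, 110, 116, 118, 125],
    180: [110, 115, 120, 130, 135, 140],
    200: [120, 136, 140, 144, 145],
    220: [135, 137, 140, 152, 157, 160, 162],
    240: [137, 145, 155, 165, 175, 189],
    260: [150, 152, 175, 178, 180, 185, 191],
}

# Population parameters found in the text.
BETA0, BETA1 = 17.0, 0.6


def disturbances(groups, b0, b1):
    """u_i = y_i - E(Y|x_i) for every household, grouped by income level."""
    return {x: [y - (b0 + b1 * x) for y in ys] for x, ys in groups.items()}


u = disturbances(population, BETA0, BETA1)
group_mean_u = {x: sum(us) / len(us) for x, us in u.items()}

for x, mean_u in group_mean_u.items():
    # round(...) + 0.0 hides floating-point noise and a possible "-0.0".
    print(f"X = {x:3d}  mean of u = {round(mean_u, 10) + 0.0}")
```

Individual disturbances are nonzero (the first household with income 80 has u = 55 − 65 = −10), but within every income group they average out to zero.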

  • The population is often finite but unknown, or infinite, so the error terms $u_i$ are unknown, as are the parameters $\beta_0$ and $\beta_1$

  • The parameters and error terms can be estimated from a sample, and thus the sample regression line is

$$y_i = \underbrace{\hat{\beta}_0 + \hat{\beta}_1 x_i}_{\hat{y}_i} + \hat{u}_i$$

  • Different random samples can be drawn by selecting one household from each income group; Table 4.2 shows two such samples
TABLE 4.2: Two random samples of households with respect to income and consumption

x_i    y_i (first sample)    y_i (second sample)
80     70                    60
100    65                    80
120    90                    79
140    95                    108
160    110                   116
180    115                   120
200    120                   144
220    140                   152
240    155                   145
260    150                   178


FIGURE 4.2: Two sample regression lines
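The two sample lines can be reproduced from Table 4.2 once an estimator is chosen. The sketch below assumes ordinary least squares, which this section has not introduced, so the choice of estimator and the helper name `ols_line` are assumptions made for illustration.

```python
# The two samples from Table 4.2 (one household per income level).
income = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
sample_1 = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
sample_2 = [60, 80, 79, 108, 116, 120, 144, 152, 145, 178]


def ols_line(x, y):
    """Least-squares intercept and slope (assumed estimator, see lead-in)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


fit_1 = ols_line(income, sample_1)
fit_2 = ols_line(income, sample_2)

for name, (b0_hat, b1_hat) in (("sample 1", fit_1), ("sample 2", fit_2)):
    print(f"{name}: y_hat = {b0_hat:.4f} + {b1_hat:.4f} x")
```

Under this assumption the first sample gives approximately ŷ = 24.45 + 0.509x and the second approximately ŷ = 15.48 + 0.604x. Neither coincides with the population line 17 + 0.6x, which is why estimates vary from sample to sample.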

  • Under simple random sampling, the error terms are assumed to be identically distributed (with zero mean and constant variance) and independent of one another (zero covariances)

  • Estimated error terms are called residuals and are usually denoted by $\hat{u}_i$

  • A residual is the difference between the actual value $y_i$ observed in the sample and the fitted (estimated) value $\hat{y}_i$ on the regression line, i.e. $\hat{u}_i = y_i - \hat{y}_i$
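The difference between a residual and a disturbance can be made concrete with the first sample of Table 4.2. The sketch below again assumes a least-squares fit (an assumption, as the estimator is not specified here) and compares each residual with the corresponding disturbance computed from the population line 17 + 0.6x; all names are illustrative.

```python
# First sample from Table 4.2.
income = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
sample_1 = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]

# Population parameters from the text.
BETA0, BETA1 = 17.0, 0.6


def ols_line(x, y):
    """Least-squares intercept and slope (assumed estimator)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


b0_hat, b1_hat = ols_line(income, sample_1)

# Fitted values lie on the sample line; residuals are actual minus fitted.
fitted = [b0_hat + b1_hat * x for x in income]
residuals = [y - y_hat for y, y_hat in zip(sample_1, fitted)]

# Disturbances use the (normally unknown) population line instead.
disturbances = [y - (BETA0 + BETA1 * x) for x, y in zip(income, sample_1)]

for x, y, y_hat, u_hat, u in zip(income, sample_1, fitted, residuals, disturbances):
    print(f"x = {x:3d}  y = {y:3d}  y_hat = {y_hat:7.3f}  u_hat = {u_hat:7.3f}  u = {u:6.1f}")
```

For the household with income 80, the residual is about 4.82 while the disturbance is exactly 5: the residual only estimates the disturbance, because the sample line only estimates the population line.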