5.1 OLS method

  • Based on the following hypothetical example, the OLS method will be discussed in detail, along with the assumptions and properties of the OLS estimator

Hypothetical example: the linear dependence between weekly consumption (variable \(y\)) and income (variable \(x\)) is analyzed in a population of 60 households. For this purpose, the households are divided into \(10\) groups, each with the same level of weekly income (both variables are measured in USD).

TABLE 5.1: Consumption of population households with respect to income level (USD per week; NA marks an empty cell)
\(Income\)   Weekly consumption \(y\) of the households in the group   \(E(y|x_i)\)
80           55    60    65    70    75    NA    NA                  65
100          65    70    74    80    85    88    NA                  77
120          79    84    90    94    98    NA    NA                  89
140          80    93    95    103   108   113   115                 101
160          102   107   110   116   118   125   NA                  113
180          110   115   120   130   135   140   NA                  125
200          120   136   140   144   145   NA    NA                  137
220          135   137   140   152   157   160   162                 149
240          137   145   155   165   175   189   NA                  161
260          150   152   175   178   180   185   191                 173
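The conditional means in the last column of Table 5.1 can be reproduced directly from the household values. The following is a minimal sketch in plain Python; the names `population`, `n_households` and `cond_means` are illustrative and not part of the original text.

```python
# Weekly consumption (USD) of the households in Table 5.1,
# keyed by weekly income level (USD). NA cells are simply omitted.
population = {
    80:  [55, 60, 65, 70, 75],
    100: [65, 70, 74, 80, 85, 88],
    120: [79, 84, 90, 94, 98],
    140: [80, 93, 95, 103, 108, 113, 115],
    160: [102, 107, 110, 116, 118, 125],
    180: [110, 115, 120, 130, 135, 140],
    200: [120, 136, 140, 144, 145],
    220: [135, 137, 140, 152, 157, 160, 162],
    240: [137, 145, 155, 165, 175, 189],
    260: [150, 152, 175, 178, 180, 185, 191],
}

# Size of the population (number of households over all income groups).
n_households = sum(len(ys) for ys in population.values())

# Conditional mean E(y|x_i): average consumption within each income group.
cond_means = {x: sum(ys) / len(ys) for x, ys in population.items()}

print("households:", n_households)
for x, m in cond_means.items():
    print(f"income {x:3d} -> E(y|x) = {m:.0f}")
```

The group sizes differ (between 5 and 7 households), so each conditional mean is an average over a different number of observations.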
  • Conditional expectation \(E(y|x_i)\) is the average weekly consumption at a given income level. In this example the conditional mean is a linear function of variable \(x\) \[\begin{equation} E(y|x_i)=\beta_0+\beta_1x_i \end{equation}\]

  • Inserting the values of \(E(y|x_i)\) and \(x_i\) into equation (10) gives a system of \(10\) equations \[65=\beta_0+\beta_1\cdot 80\] \[77=\beta_0+\beta_1\cdot 100\] \[\vdots\]

  • Conditional means could be easily calculated for each level of income if the parameters \(\beta_0\) and \(\beta_1\) were known.

    • -> the conditional distribution of \(y\) is centered about the conditional mean
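As a worked step, the system above can be solved by hand, because the conditional means in Table 5.1 rise by exactly 12 USD for every 20 USD of income. The numerical values below are not stated in the text; they are what the table implies. Subtracting the first equation from the second eliminates \(\beta_0\) \[77-65=\beta_1(100-80)~~\Rightarrow~~\beta_1=\frac{12}{20}=0.6\] and substituting back into the first equation gives \[\beta_0=65-0.6\cdot 80=17\] The remaining equations are satisfied by the same values, for example \(17+0.6\cdot 260=173\).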

FIGURE 5.1: Population regression line

  • The weekly consumption of each individual household \(y_i\) differs from the conditional mean by a disturbance \(u_i\) \[\begin{equation} y_i-E(y|x_i)=u_i~~~~\forall i \end{equation}\]

  • Inserting equation (10) into expression (11) we get \[\begin{equation} y_i=\beta_0+\beta_1x_i+u_i~~~~\forall i \end{equation}\]

  • Disturbance \(u_i\) is an error term, i.e. a random variable with zero conditional mean for each observation \[\begin{equation}E(u|x_i)=0~~~~\forall i \end{equation}\]

  • Random variables \(u_i\) and regressors should be independently distributed (\(x\) does not give us any information about \(u\))

  • If variable \(x\) is not random but fixed, the expected value of vector \(u\) is \[\begin{equation}E(u)=0\end{equation}\]

  • The population is often finite but unknown, or infinite (when dealing with time-series data), so the error terms \(u_i\) are unknown, as are the parameters \(\beta_0\) and \(\beta_1\). However, error terms and parameters can be easily estimated from a sample of size \(n\).

  • Selection of a simple random sample requires that the error terms are identically distributed (with zero mean and constant variance) and independent (zero covariances)
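To make expression (11) and the zero conditional mean condition concrete, the disturbances can be computed for the hypothetical population of Table 5.1. This is a minimal sketch in plain Python, assuming the full population is observed (which, as noted above, is rarely the case in practice); the names `population`, `disturbances`, `group_means_u` and `all_u` are illustrative only.

```python
# Weekly consumption (USD) of the households in Table 5.1,
# keyed by weekly income level (USD).
population = {
    80:  [55, 60, 65, 70, 75],
    100: [65, 70, 74, 80, 85, 88],
    120: [79, 84, 90, 94, 98],
    140: [80, 93, 95, 103, 108, 113, 115],
    160: [102, 107, 110, 116, 118, 125],
    180: [110, 115, 120, 130, 135, 140],
    200: [120, 136, 140, 144, 145],
    220: [135, 137, 140, 152, 157, 160, 162],
    240: [137, 145, 155, 165, 175, 189],
    260: [150, 152, 175, 178, 180, 185, 191],
}

# Expression (11): u_i = y_i - E(y|x_i), computed group by group.
disturbances = {}
for x, ys in population.items():
    cond_mean = sum(ys) / len(ys)          # E(y|x_i) for this income level
    disturbances[x] = [y - cond_mean for y in ys]

# Mean of the disturbances within each income group, i.e. E(u|x_i).
group_means_u = {x: sum(us) / len(us) for x, us in disturbances.items()}

# All 60 disturbances pooled together, to look at the overall mean E(u).
all_u = [u for us in disturbances.values() for u in us]

print("u for income 80:", disturbances[80])
print("E(u|x_i) by group:", group_means_u)
print("overall mean of u:", sum(all_u) / len(all_u))
```

Because each \(u_i\) is defined as a deviation from its own group mean, the disturbances average to zero within every income group, and therefore also over the whole population.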