4.2 Least squares estimates

  • An appropriate estimator which gives the “best” parameter estimates should be considered

  • Commonly used estimator is least squares estimator which originates from the OLS method (Ordinary Least Squares)

  • In bivariate regression the OLS method is concerned with finding the regression line that gives the best fit to data points such that the sum of squared residuals is small as possible, hence the term “least squares”

\[\begin{equation} \text{min}\sum_{i=1}^n \hat{u}_i^2=\sum_{i=1}^n \bigg( y_i - (\hat{\beta}_0+\hat{\beta}_1 x_i)\bigg)^2 \tag{4.8} \end{equation}\]

  • Differential calculus is used to obtain \(\hat{\beta}_0\) and \(\hat{\beta}_1\) that minimize the sum of squared residuals (SSR)

\[\begin{align} \hat{\beta}_1&=\frac{\displaystyle\sum_{i=1}^n (x_i-\bar{x})(y_i-\bar{y})}{\displaystyle\sum_{i=1}^n (x_i-\bar{x})^2}=\frac{Cov(x,y)}{S_x^2} \\ \\ \hat{\beta}_0&=\bar{y}-\hat{\beta}_1 \bar{x} \\ \tag{4.9} \end{align}\]

Example 4.1 A real estate agent wishes to examine the relationship between the selling price of a house and its size (measured in square meters). A random sample of ten houses is selected (\(n=10\)). Insert data from table 4.3 to Excel and plot Scattergram. Add a regression line to the same plot. Afterwords, compute the same OLS estimates using functions =INTERCEPT() and =SLOPE(). Give interpretation of both regression coefficients.

TABLE 4.3: House price data
size in \(m^2\) price in \(1000\)s USD
130 245
148 213
157 279
174 308
102 199
143 219
218 405
227 324
132 319
157 255

Excel instructions: insert the data in Excel manually and select both columns including the variables names. On the Insert tab click XY (Scatter) and select the firs chart subtype that doesn’t include any lines. In the next step right-click any single point inside the chart and select Add Trendline from drop down options. From the Trendline menu check both boxes the Display Equation on Chart and the Display R-Squared Value on Chart (linear equation is default). Afterwords, use functions =INTERCEPT(known_ys;known_xs) and =SLOPE(known_ys;known_xs) to compute OLS estimates for constant and slope. The first argument of both functions is the range of dependent variable, while the second argument is always the range of independent variable.

\[\widehat{price}_i=98.115+1.1863\cdot size_i\]

  • The slope \(\hat{\beta}_1=1.1863\) tell us that the average value of a house increases by \(1.1863(1000)=1186.3\) USD, for each additional one square meter of size

  • The constant term \(\hat{\beta}_0=98.115(1000)=98115\) USD just indicates that \(98115\) is the portion of the house price not explained by it’s size

  • The \(R^2\) is commonly used as a goodness of fit measure