4.2 Least squares estimates
An appropriate estimator which gives the “best” parameter estimates should be considered
Commonly used estimator is least squares estimator which originates from the OLS method (Ordinary Least Squares)
In bivariate regression the OLS method is concerned with finding the regression line that gives the best fit to data points such that the sum of squared residuals is small as possible, hence the term “least squares”
\[\begin{equation} \text{min}\sum_{i=1}^n \hat{u}_i^2=\sum_{i=1}^n \bigg( y_i - (\hat{\beta}_0+\hat{\beta}_1 x_i)\bigg)^2 \tag{4.8} \end{equation}\]
- Differential calculus is used to obtain \(\hat{\beta}_0\) and \(\hat{\beta}_1\) that minimize the sum of squared residuals (SSR)
\[\begin{align} \hat{\beta}_1&=\frac{\displaystyle\sum_{i=1}^n (x_i-\bar{x})(y_i-\bar{y})}{\displaystyle\sum_{i=1}^n (x_i-\bar{x})^2}=\frac{Cov(x,y)}{S_x^2} \\ \\ \hat{\beta}_0&=\bar{y}-\hat{\beta}_1 \bar{x} \\ \tag{4.9} \end{align}\]
Example 4.1 A real estate agent wishes to examine the relationship between the selling price of a house and its size (measured in square meters). A random sample of ten houses is selected (\(n=10\)). Insert data from table 4.3 to Excel and plot Scattergram
. Add a regression line to the same plot. Afterwords, compute the same OLS estimates using functions =INTERCEPT()
and =SLOPE()
. Give interpretation of both regression coefficients.
size in \(m^2\) | price in \(1000\)s USD |
---|---|
130 | 245 |
148 | 213 |
157 | 279 |
174 | 308 |
102 | 199 |
143 | 219 |
218 | 405 |
227 | 324 |
132 | 319 |
157 | 255 |
Excel instructions: insert the data in Excel manually and select both columns including the variables names. On the Insert
tab click XY (Scatter)
and select the firs chart subtype that doesn’t include any lines. In the next step right-click any single point inside the chart and select Add Trendline
from drop down options. From the Trendline
menu check both boxes the Display Equation on Chart
and the Display R-Squared Value on Chart
(linear equation is default). Afterwords, use functions =INTERCEPT(known_ys;known_xs)
and =SLOPE(known_ys;known_xs)
to compute OLS estimates for constant and slope. The first argument of both functions is the range of dependent variable, while the second argument is always the range of independent variable.
\[\widehat{price}_i=98.115+1.1863\cdot size_i\]
The slope \(\hat{\beta}_1=1.1863\) tell us that the average value of a house increases by \(1.1863(1000)=1186.3\) USD, for each additional one square meter of size
The constant term \(\hat{\beta}_0=98.115(1000)=98115\) USD just indicates that \(98115\) is the portion of the house price not explained by it’s size
The \(R^2\) is commonly used as a goodness of fit measure