4.2 Least squares estimates

An appropriate estimator which gives the “best” parameter estimates should be considered
Commonly used estimator is least squares estimator which originates from the OLS method (Ordinary Least Squares)
In bivariate regression the OLS method is concerned with finding the regression line that gives the best fit to data points such that the sum of squared residuals is small as possible, hence the term “least squares”

$\begin{equation} \text{min}\sum_{i=1}^n \hat{u}_i^2=\sum_{i=1}^n \bigg( y_i - (\hat{\beta}_0+\hat{\beta}_1 x_i)\bigg)^2 \tag{4.8} \end{equation}$

Differential calculus is used to obtain $\hat{\beta}_0$ and $\hat{\beta}_1$ that minimize the sum of squared residuals (SSR)

$\begin{align} \hat{\beta}_1&=\frac{\displaystyle\sum_{i=1}^n (x_i-\bar{x})(y_i-\bar{y})}{\displaystyle\sum_{i=1}^n (x_i-\bar{x})^2}=\frac{Cov(x,y)}{S_x^2} \\ \\ \hat{\beta}_0&=\bar{y}-\hat{\beta}_1 \bar{x} \\ \tag{4.9} \end{align}$

Example 4.1 A real estate agent wishes to examine the relationship between the selling price of a house and its size (measured in square meters). A random sample of ten houses is selected ( $n=10$ ). Insert data from table 4.3 to Excel and plot Scattergram. Add a regression line to the same plot. Afterwords, compute the same OLS estimates using functions =INTERCEPT() and =SLOPE(). Give interpretation of both regression coefficients.

TABLE 4.3: House price data
size in $m^2$	price in $1000$ s USD
130	245
148	213
157	279
174	308
102	199
143	219
218	405
227	324
132	319
157	255

Excel instructions: insert the data in Excel manually and select both columns including the variables names. On the Insert tab click XY (Scatter) and select the firs chart subtype that doesn’t include any lines. In the next step right-click any single point inside the chart and select Add Trendline from drop down options. From the Trendline menu check both boxes the Display Equation on Chart and the Display R-Squared Value on Chart (linear equation is default). Afterwords, use functions =INTERCEPT(known_ys;known_xs) and =SLOPE(known_ys;known_xs) to compute OLS estimates for constant and slope. The first argument of both functions is the range of dependent variable, while the second argument is always the range of independent variable.

$\widehat{price}_i=98.115+1.1863\cdot size_i$

The slope $\hat{\beta}_1=1.1863$ tell us that the average value of a house increases by $1.1863(1000)=1186.3$ USD, for each additional one square meter of size
The constant term $\hat{\beta}_0=98.115(1000)=98115$ USD just indicates that $98115$ is the portion of the house price not explained by it’s size
The $R^2$ is commonly used as a goodness of fit measure

size in $m^2$	price in $1000$ s USD
130	245
148	213
157	279
174	308
102	199
143	219
218	405
227	324
132	319
157	255

size in $m^2$	price in $1000$ s USD
130	245
148	213
157	279
174	308
102	199
143	219
218	405
227	324
132	319
157	255

size in $m^2$	price in $1000$ s USD
130	245
148	213
157	279
174	308
102	199
143	219
218	405
227	324
132	319
157	255