
4.6 Exercises

1. Class Sizes and Test Scores

A researcher wants to analyze the relationship between class size (measured by the student-teacher ratio) and the average test score. Therefore, he measures both variables in \(10\) different classes and ends up with the following results.

Class Size   23   19   30   22   23   29   35   36   33   25
Test Score  430  430  333  410  390  377  325  310  328  375

Instructions:

  • Create the vectors cs (the class size) and ts (the test score), containing the observations above.

  • Draw a scatterplot of the results using plot().
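A sketch of a possible solution:

```r
# create the vectors of observations from the table above
cs <- c(23, 19, 30, 22, 23, 29, 35, 36, 33, 25)
ts <- c(430, 430, 333, 410, 390, 377, 325, 310, 328, 375)

# draw a scatterplot of test score against class size
plot(cs, ts)
```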

2. Mean, Variance, Covariance and Correlation

The vectors cs and ts are available in the working environment (you can check this: type their names into the console and press enter).

Instructions:

  • Compute the mean, the sample variance and the sample standard deviation of ts.

  • Compute the covariance and the correlation coefficient for ts and cs.

Hint: Use the R functions presented in this chapter: mean(), sd(), cov(), cor() and var().
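One way to do this, using only the functions from the hint:

```r
# mean, sample variance and sample standard deviation of the test scores
mean(ts)
var(ts)
sd(ts)

# covariance and correlation coefficient of test scores and class sizes
cov(ts, cs)
cor(ts, cs)
```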

3. Simple Linear Regression

The vectors cs and ts are available in the working environment.

Instructions:

  • This chapter's exercises use the package AER. Attach the package using library(). (Note that the function lm() itself ships with base R.)

  • Use lm() to estimate the regression model \[TestScore_i = \beta_0 + \beta_1 STR_i + u_i.\] Assign the result to mod.

  • Obtain a statistical summary of the model.
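A possible solution (assuming AER is already installed):

```r
# attach the package AER
library(AER)

# estimate the regression model and assign the result to mod
mod <- lm(ts ~ cs)

# obtain a statistical summary of the model
summary(mod)
```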

4. The Model Object

Let us see how an object of class lm is structured.

The vectors cs and ts as well as the model object mod from the previous exercise are available in your workspace.

Instructions:

  • Use class() to learn about the class of the object mod.
  • mod is an object of type list with named entries. Check this using the function is.list().
  • See what information you can obtain from mod using names().
  • Read out an arbitrary entry of the object mod using the $ operator.
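The four steps might look like this; the last line picks the coefficients entry, but any other named entry would do:

```r
# class of the object mod
class(mod)

# mod is a list with named entries
is.list(mod)

# see which entries mod contains
names(mod)

# read out an arbitrary entry, e.g., the estimated coefficients
mod$coefficients
```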

5. Plotting the Regression Line

You are provided with the code for the scatterplot in script.R.

Instructions:

  • Add the regression line to the scatterplot created in Exercise 1.

  • The object mod is available in your working environment.

Hint: Use the function abline().
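A sketch, assuming the scatterplot code from script.R is run first:

```r
# scatterplot (as provided in script.R)
plot(cs, ts)

# add the estimated regression line
abline(mod)
```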

6. Summary of a Model Object

Now read out and store some of the information that is contained in the output of summary().

Instructions:

  • Assign the output of summary(mod) to the variable s.

  • Check the entry names of the object s.

  • Create a new variable R2 and assign the \(R^2\) of the regression to it.

The object mod is available in your working environment.
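One possible solution:

```r
# assign the model summary to s
s <- summary(mod)

# check the entry names of s
names(s)

# assign the R^2 of the regression to R2
R2 <- s$r.squared
```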

7. Estimated Coefficients

The function summary() also provides information on the statistical significance of the estimated coefficients.

Instructions:

Extract the named \(2\times4\) matrix with estimated coefficients, standard errors, \(t\)-statistics and corresponding \(p\)-values from the model summary s. Save this matrix in an object named coefs.

The objects mod and s are available in your working environment.
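A minimal sketch:

```r
# extract the coefficient matrix from the model summary
coefs <- s$coefficients
coefs
```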

8. Dropping the Intercept

So far, we have estimated regression models consisting of an intercept and a single regressor. In this exercise you will learn how to specify and how to estimate a regression model without intercept.

Note that excluding the intercept from a regression model might be a dodgy practice in some applications, as it forces the conditional expectation function of the dependent variable to be zero whenever the regressor is zero.

Instructions:

  • Figure out how the formula argument must be specified for a regression of ts solely on cs, i.e., a regression without intercept. Google is your friend!

  • Estimate the regression model without intercept and store the result in mod_ni.

The vectors cs, ts and the model object mod from previous exercises are available in the working environment.
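One common way to drop the intercept is to append - 1 (or, equivalently, + 0) to the formula:

```r
# estimate the regression model without intercept
mod_ni <- lm(ts ~ cs - 1)
```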

9. Regression Output: No Constant Case

In Exercise 8 you have estimated a model without intercept. The estimated regression function is

\[\widehat{TestScore_i} = \underset{(1.36)}{12.65} \times STR_i.\]

Instructions:

Convince yourself that everything is as stated above: extract the coefficient matrix from the summary of mod_ni and store it in a variable named coef.

The vectors cs, ts as well as the model object mod_ni from the previous exercise are available in your working environment.

Hint: An entry of a named list can be accessed using the $ operator.
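A possible solution:

```r
# extract the coefficient matrix from the summary of mod_ni
coef <- summary(mod_ni)$coefficients
coef
```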

10. Regression Output: No Constant Case — Ctd.

In Exercises 8 and 9 you have dealt with a model without intercept. The estimated regression function was

\[\widehat{TestScore_i} = \underset{(1.36)}{12.65} \times STR_i.\]

The coefficient matrix coef from Exercise 9 contains the estimated coefficient on \(STR\), its standard error, the \(t\)-statistic of the significance test and the corresponding \(p\)-value.

Instructions:

  • Print the contents of coef to the console.
  • Convince yourself that the reported \(t\)-statistic is correct: use the entries of coef to compute the \(t\)-statistic and save it to t_stat.

The matrix coef from the previous exercise is available in your working environment.

Hints:

  • X[a, b] returns the element in row a and column b of the matrix X.

  • The \(t\)-statistic for a test of the hypothesis \(H_0: \beta_1 = 0\) is computed as \[t = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)}.\]
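Using the hints, the computation might look like this:

```r
# print the contents of coef
coef

# t-statistic: the estimate divided by its standard error
t_stat <- coef[1, 1] / coef[1, 2]
t_stat
```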

11. Two Regressions, One Plot

The two estimated regression models from the previous exercises are

\[\widehat{TestScore_i} = \underset{(1.36)}{12.65} \times STR_i\]

and

\[\widehat{TestScore_i} = \underset{(23.96)}{567.4272} - \underset{(0.85)}{7.1501} \times STR_i.\]

You are provided with the code line plot(cs, ts), which creates a scatterplot of ts and cs. Note that this line must be executed before calling abline()! You may color the regression lines by using, e.g., col = "red" or col = "blue" as an additional argument to abline() for better distinguishability.

The vectors cs and ts as well as the list objects mod and mod_ni from previous exercises are available in your working environment.

Instructions:

Generate a scatterplot of ts and cs and add the estimated regression lines of mod and mod_ni.
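A sketch of a solution (abline() draws a line through the origin when the model has a single coefficient):

```r
# scatterplot of ts and cs
plot(cs, ts)

# add both estimated regression lines, colored for distinguishability
abline(mod, col = "red")
abline(mod_ni, col = "blue")
```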

12. \(TSS\) and \(SSR\)

If graphical inspection does not help, researchers resort to analytic techniques in order to assess how well a model fits the data at hand, or whether it fits better than another model.

Let us go back to the simple regression model including an intercept. The estimated regression line for mod was

\[\widehat{TestScore_i} = 567.43 - 7.15 \times STR_i, \, R^2 = 0.8976, \, SER=15.19.\]

You can check this as mod and the vectors cs and ts are available in your working environment.

Instructions:

  • Compute \(SSR\), the sum of squared residuals, and save it to ssr.
  • Compute \(TSS\), the total sum of squares, and save it to tss.
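Both quantities can be computed directly from their definitions:

```r
# SSR: sum of squared residuals
ssr <- sum(mod$residuals^2)

# TSS: total sum of squares of the dependent variable
tss <- sum((ts - mean(ts))^2)
```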

13. The \(R^2\) of a Regression Model

The \(R^2\) of the regression saved in mod is \(0.8976\). You can check this by executing summary(mod)$r.squared in the console below.

Remember the formula of \(R^2\):

\[R^2 = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS}\]

The objects mod, tss and ssr from the previous exercise are available in your working environment.

Instructions:

  • Use ssr and tss to compute \(R^2\) manually. Round the result to four decimal places and save it to R2.
  • Use the logical operator == to check whether your result matches the value mentioned above.

Hint: You may round numeric values using the function round().
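A possible solution:

```r
# compute R^2 manually and round to four decimal places
R2 <- round(1 - ssr / tss, 4)

# check whether the result matches the value reported by summary()
R2 == 0.8976
```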

14. The Standard Error of the Regression

The standard error of the regression in the simple regression model is \[SER = \sqrt{\frac{1}{n-2} \sum_{i=1}^n \widehat{u}_i^2} = \sqrt{\frac{SSR}{n-2}}.\] \(SER\) measures the size of an average residual, which is an estimate of the magnitude of a typical regression error.

The model object mod and the vectors cs and ts are available in your workspace.

Instructions:

  • Use summary() to obtain the \(SER\) for the regression of ts on cs saved in the model object mod. Save the result in the variable SER.

  • Use SER to compute the \(SSR\) and store it in SSR.

  • Check that SSR is indeed the \(SSR\) by comparing SSR to the result of sum(mod$residuals^2).
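A sketch; the helper variable n is introduced here for clarity and is not prescribed by the exercise:

```r
# read out the SER from the model summary
SER <- summary(mod)$sigma

# recover the SSR from the SER
n <- length(ts)
SSR <- SER^2 * (n - 2)

# compare with the direct computation (all.equal() avoids spurious
# mismatches due to floating-point rounding)
all.equal(SSR, sum(mod$residuals^2))
```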

15. The Estimated Covariance Matrix

As has been discussed in Chapter 4.4, the OLS estimators \(\widehat{\beta}_0\) and \(\widehat{\beta}_1\) are functions of the random error term. Therefore, they are random variables themselves. For two or more random variables, their covariances and variances are summarized by a variance-covariance matrix (which is often simply called the covariance matrix). Taking the square root of the diagonal elements of the estimated covariance matrix yields \(SE(\widehat\beta_0)\) and \(SE(\widehat\beta_1)\), the standard errors of \(\widehat{\beta}_0\) and \(\widehat{\beta}_1\).

summary() computes an estimate of this matrix. The respective entry in the output of summary() (remember that summary() produces a list) is called cov.unscaled. As its name suggests, this matrix is unscaled: it must be multiplied by the squared \(SER\) (the entry sigma) to obtain the covariance matrix estimate. The model object mod is available in your workspace.

Instructions:

  • Use summary() to obtain the covariance matrix estimate for the regression of test scores on student-teacher ratios stored in the model object mod. Save the result to cov_matrix.

  • Obtain the diagonal elements of cov_matrix, compute their square root and assign the result to the variable SEs.

Hint: diag(A) returns a vector containing the diagonal elements of the matrix A.
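One way to do this; the result should coincide with vcov(mod), and the standard errors should match those reported in the coefficient matrix:

```r
# store the model summary
s <- summary(mod)

# covariance matrix estimate: scale cov.unscaled by the squared SER
cov_matrix <- s$sigma^2 * s$cov.unscaled

# standard errors: square roots of the diagonal elements
SEs <- sqrt(diag(cov_matrix))
SEs
```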