42 Assignment 3

Due 11:59pm Tuesday June 20

You may choose to complete the assignment working closely with another student in this class. If you choose to do this, you only need to submit one joint assignment (i.e., one file for both of you). Please make sure to put both of your names on the assignment.

Whether or not you complete this assignment jointly, you may participate in a learning group of any size, but please submit only your own work (i.e., do not copy another group's or person's work).

Please save the file as Yourlastnames_Assignment3 and upload it to Canvas. Your two-person group only needs to submit a single file, but make sure both of your names are listed. Make sure to show all of your work and computer code, so that your mathematical and numerical results are easily reproducible. Also, please note that about 15% of this assignment grade will involve making the document you submit look professional (e.g., no typos, plots and figures that are easy to read and have labels, no awkward or sloppy formatting, etc.).

  1. For the observed response \(\mathbf{y}\) and model matrix \(\mathbf{X}\) below, calculate an estimate of \(\boldsymbol{\beta}\equiv(\beta_0,\beta_1)^{\prime}\) that minimizes \(\sum_{i=1}^{n}(y_i-\mathbf{x}_{i}^{\prime}\boldsymbol{\beta})^2\) by hand, showing each step and without using a computer. You may include your work by typesetting each step using mathematical notation or by including a photo of your handwritten work.

\[\mathbf{y}=(6.6,2.2,-1.1)^{\prime}\] \[\mathbf{X}=\left[\begin{array}{cc} 1 & 2\\ 1 & 5\\ 1 & 6 \end{array}\right]\]
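Once the hand calculation is done, it can be checked numerically. The sketch below is only a verification aid and does not replace the by-hand steps. It relies on the standard result that the minimizer of the sum of squares solves the normal equations \(\mathbf{X}^{\prime}\mathbf{X}\boldsymbol{\beta}=\mathbf{X}^{\prime}\mathbf{y}\); the object names `beta_hat` and `beta_lm` are illustrative choices, not part of the assignment.

```r
# Response and model matrix copied from the problem statement.
y <- c(6.6, 2.2, -1.1)
X <- cbind(1, c(2, 5, 6))

# Solve the normal equations (X'X) beta = X'y.
# crossprod(X) computes X'X and crossprod(X, y) computes X'y.
beta_hat <- solve(crossprod(X), crossprod(X, y))

# Cross-check with R's built-in least-squares fit.
beta_lm <- coef(lm(y ~ X[, 2]))

print(beta_hat)
print(beta_lm)
```

If the two printed estimates agree with each other and with your handwritten answer, the hand calculation is very likely correct.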

  2. In the R code below, I provide a data set that I obtained from the National Oceanic and Atmospheric Administration (NOAA). The documentation is available here (link). Please write a short 4-5 sentence description of the data (e.g., Where were the data collected? What is being measured?).
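library(lubridate)  # provides ymd() for parsing dates
# Download the Manhattan temperature data
url <- "https://www.dropbox.com/s/asw6gtq7pp1h0bx/manhattan_temp_data.csv?dl=1"
df.temp <- read.csv(url)
# Convert DATE to the Date class and compute days since the first record
df.temp$DATE <- ymd(df.temp$DATE)
df.temp$days <- as.numeric(df.temp$DATE - min(df.temp$DATE))
# Plot temperature at the time of observation (TOBS) against date
plot(df.temp$DATE, df.temp$TOBS, xlab = "Date", ylab = "Temperature")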
  3. I have always wondered if the temperature in Manhattan has increased over a long time period (i.e., perhaps a local effect of global climate change). Write out a linear model that enables you to determine if the temperature has increased over a long time period. Make sure to explain each component of your linear model.

  4. Estimate the parameters for the linear model you wrote out in question 3 by fitting this model to the data from question 2. You may use whatever loss function you think is most appropriate.

  5. Using the same data set from question 2, estimate \(\boldsymbol{\beta}\) in the linear model \(\mathbf{y}=\beta_{0}+\beta_{1}\mathbf{x}+\boldsymbol{\varepsilon}\), where \(\mathbf{y}\) is the temperature at the time of observation and \(\mathbf{x}\) is the number of days since the first record.

    a. Estimate \(\boldsymbol{\beta}\) using the squared distance as a measure of discrepancy between the data and the model.
    b. Estimate \(\boldsymbol{\beta}\) using the absolute distance as a measure of discrepancy between the data and the model.
    c. Plot both “lines of best fit” from parts a and b along with the data.
    d. Calculate the expected temperature for the year 2050 using the estimates of \(\boldsymbol{\beta}\) obtained from parts a and b.
    e. In 3-5 sentences, explain how your predictions of temperature for the year 2050 use or fail to use the idea of “probabilistic thinking.”
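As a rough illustration of the workflow for the two loss functions, the sketch below uses simulated stand-in data rather than the NOAA record, so none of its numbers say anything about Manhattan. The simulated trend, the noise level, the use of `optim()` for the absolute-distance fit, and the choice of 1 July 2050 to represent "the year 2050" are all assumptions made for illustration; other approaches (for example, a dedicated quantile regression package) may be equally reasonable.

```r
# Simulated stand-in for the temperature data (hypothetical values only).
set.seed(1)
dates <- seq(as.Date("1950-01-01"), as.Date("2020-12-31"), by = "month")
x <- as.numeric(dates - min(dates))                # days since first record
y <- 50 + 0.0001 * x + rnorm(length(x), sd = 15)   # assumed trend plus noise

# Squared distance: ordinary least squares via lm().
beta_ls <- coef(lm(y ~ x))

# Absolute distance: minimize the sum of absolute residuals numerically,
# starting from the least-squares estimates. parscale rescales the
# parameters because the intercept and slope differ greatly in magnitude.
abs_loss <- function(beta) sum(abs(y - beta[1] - beta[2] * x))
fit_lad <- optim(par = beta_ls, fn = abs_loss, method = "Nelder-Mead",
                 control = list(parscale = abs(beta_ls), maxit = 5000))
beta_lad <- fit_lad$par

# Plot the data with both fitted lines.
plot(dates, y, xlab = "Date", ylab = "Temperature", col = "grey50")
lines(dates, beta_ls[1] + beta_ls[2] * x, col = "red", lwd = 2)
lines(dates, beta_lad[1] + beta_lad[2] * x, col = "blue", lwd = 2, lty = 2)
legend("topleft", legend = c("Squared distance", "Absolute distance"),
       col = c("red", "blue"), lwd = 2, lty = c(1, 2))

# Expected temperature in 2050, taking 1 July 2050 as a representative day.
x_2050 <- as.numeric(as.Date("2050-07-01") - min(dates))
pred_ls <- unname(beta_ls[1] + beta_ls[2] * x_2050)
pred_lad <- unname(beta_lad[1] + beta_lad[2] * x_2050)
print(c(squared = pred_ls, absolute = pred_lad))
```

To adapt this to the assignment, replace `dates`, `x`, and `y` with `df.temp$DATE`, `df.temp$days`, and `df.temp$TOBS`, and check for missing values first. Note that both predictions are single numbers with no statement of uncertainty attached.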