43 Assignment 4

Due 11:59pm Wednesday July 17

You may choose to complete the assignment working closely with another student in this class. If you choose to do this, you only need to submit one joint assignment (i.e., one file for both of you). Please make sure to put both of your names on the assignment.

In addition to working with a partner, you may participate in a learning group of any size to complete the assignment, but please submit only your own work (i.e., do not copy another group's or person's work).

Please save the file as Yourlastnames_Assignment4 and upload it to Canvas. Your two-person group only needs to submit a single file, but make sure both of your names are listed. Make sure to show all of your work and computer code, so that your mathematical and numerical results are easily reproducible. Also, please note that about 15% of this assignment grade will involve making the document you submit look professional (e.g., no typos, plots and figures that are easy to read and have labels, no awkward or sloppy formatting, etc.).

  1. Read the two pages from Wood (2006). Using the Hubble data (see code below), answer the following questions.
library(gamair)
data(hubble)

# Figure 1.1 from Wood 2006
plot(hubble$x,hubble$y,xlab="Distance (Mpc)",
     ylab=expression("Velocity (km"*s^{-1}*")"))
(a) Fit a linear model that corresponds to Hubble's Law. Estimate all parameter(s) using maximum likelihood estimation. Please show all relevant R code.
(b) Using the parameter estimates from part a, what is the most likely age of the universe in years? Note that distance is measured in megaparsecs (Mpc). One megaparsec is \(3.09 \times 10^{19}\) km.
(c) Calculate a 95% confidence interval for the estimated age from part b.
(d) Explain how to interpret the 95% confidence interval from part c.
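The unit conversion needed in part b can be sketched as follows. Under Hubble's Law, velocity equals a slope times distance, so the reciprocal of the slope has units of time once megaparsecs are converted to kilometers. This is only an illustration: `beta_hat` is a hypothetical placeholder value, not an estimate from the Hubble data, and the 365.25-day year is an assumption.

```r
# Hypothetical slope in km/s per Mpc (placeholder, NOT the fitted estimate)
beta_hat <- 70

km_per_mpc <- 3.09e19                  # kilometers in one megaparsec (from part b)
sec_per_year <- 60 * 60 * 24 * 365.25  # assumed length of a year in seconds

# Convert the slope to units of 1/s, then invert to get an age in seconds
beta_per_sec <- beta_hat / km_per_mpc
age_seconds <- 1 / beta_per_sec

# Convert seconds to years
age_years <- age_seconds / sec_per_year
age_years
```

The same transformation can be applied to the endpoints of an interval for the slope. Because the age is a decreasing function of the slope, the upper slope limit maps to the lower age limit.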
  2. In the R code below, I provide a data set that I obtained from the National Oceanic and Atmospheric Administration (NOAA). The documentation is available here (link). Please write a short 4-5 sentence description of the data (e.g., Where were the data collected? What is being measured?).
library(lubridate)
url <- "https://www.dropbox.com/s/asw6gtq7pp1h0bx/manhattan_temp_data.csv?dl=1"
df.temp <- read.csv(url)
df.temp$DATE <- ymd(df.temp$DATE)
df.temp$days <- df.temp$DATE - min(df.temp$DATE)
plot(df.temp$DATE,df.temp$TOBS,xlab="Date",ylab="Temperature")
  3. I have always wondered if the temperature in Manhattan has increased over a long time period (i.e., perhaps a local effect of global climate change). Write out a linear model that enables you to determine if the temperature has increased over a long time period. Make sure to explain each component of your linear model and include distributional assumptions (e.g., \(\varepsilon_i \sim N(0,\sigma^2)\)).

  4. For the linear model you wrote out in question 3, estimate and quantify uncertainty for the parameters by fitting this model to the data from question 2. Please write 3-5 sentences explaining your estimates.

  5. Using the idea of probabilistic thinking and the linear model you used in question 4, predict the temperature for January 1, 2050. Please write 3-5 sentences explaining your prediction.

  6. Using the idea of probabilistic thinking, how much has the temperature changed in the past 100 years (i.e., between January 1, 1924 and January 1, 2024)? Please write 3-5 sentences explaining your results.

  7. The linear model used to answer the questions above requires the use of data and assumptions to enable estimation, prediction, and statistical inference. Write a list that contains all known assumptions about the data and the model that are required to obtain reliable estimation, prediction, and statistical inference. For each item on your list, provide a short (e.g., 1 sentence) explanation of the assumption.
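
As a rough illustration of the mechanics behind the temperature questions (fitting, interval estimates, and prediction), the sketch below uses simulated data rather than the NOAA data. The names `sim`, `days`, and `temp` and all simulated values are hypothetical, and a simple linear model with independent normal errors is assumed. It is one possible workflow, not the required one.

```r
set.seed(1)

# Simulated stand-in data (hypothetical; NOT the NOAA data)
n <- 200
days <- sort(runif(n, 0, 36500))   # days since the first observation
temp <- 50 + 0.0001 * days + rnorm(n, sd = 10)
sim <- data.frame(days = days, temp = temp)

# Fit the simple linear model temp_i = beta0 + beta1 * days_i + eps_i
fit <- lm(temp ~ days, data = sim)

# Point estimates and 95% confidence intervals for beta0 and beta1
coef(fit)
ci <- confint(fit, level = 0.95)
ci

# 95% prediction interval for a single new observation at a future day
new_day <- data.frame(days = 40000)
pred <- predict(fit, newdata = new_day, interval = "prediction", level = 0.95)
pred

# Estimated change in expected temperature over 100 years (100 * 365.25 days),
# with the slope's confidence limits scaled the same way
change_100yr <- 100 * 365.25 * coef(fit)["days"]
change_ci <- 100 * 365.25 * ci["days", ]
change_100yr
change_ci
```

With the real data, a time variable computed by subtracting dates is a `difftime` object and may need converting with `as.numeric()` before it is used as a predictor or in `newdata`. A prediction interval (rather than a confidence interval) is used for the future day because it also accounts for the variability of a single new observation.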