31 Assignment 5

Assignment 5 is to be completed individually or with a partner (i.e., the maximum group size is two). Please submit the assignment as a single PDF or HTML file (one file per individual or group). Save the file as Yourlastname_Assignment5 (e.g., Hefley_Assignment5). Make sure to show your work in R so that I can reproduce your results (e.g., figures, calculations, etc.). Upload your complete answers to problems 1-6 to Canvas before 11:59 pm on Friday 11/20/20.

31.1 Motivation

The purpose of this assignment is to give you practical experience with the analysis of spatio-temporal data. One common type of spatio-temporal data is disease data. Typically, disease data are collected by sampling a host species for the presence or absence of a disease. Thus, there is interest in conducting statistical analysis on both the abundance of the host and the presence/absence of the disease.

31.2 Data

Enders et al. (2018) collected data on the abundance of the English grain aphid at 341 unique spatial locations within the state of Kansas in the years 2014 and 2015. As discussed in class, the English grain aphid is a vector for barley yellow dwarf virus (BYVD). At 199 of the spatial locations, groups of up to ten English grain aphids were tested for BYVD. The spatio-temporal data set, which contains the number of English grain aphids, the presence or absence of BYVD within each tested group, and the number of individuals within each tested group, can be downloaded using the R code below.

31.3 Goal

You will conduct a statistical analysis of the English grain aphid and BYVD data. The analysis has two goals. The first goal is to build a statistical model that makes accurate predictions of the abundance of English grain aphids at any date and location within the state of Kansas. The second goal is to build a spatio-temporal statistical model that enables inference about whether land cover type influences the probability that an individual English grain aphid is infected with BYVD. Note that the data you have record BYVD presence or absence for groups of English grain aphids, not for individual aphids.

31.4 Problems

  1. For the data on the abundance of English grain aphids, propose two different statistical models that are capable of predicting the number of English grain aphids at any location within the state of Kansas at any time within the period 2014-2015. Make sure to write out the two statistical models using formal notation and fully describe each component using words.

  2. Fit the two statistical models you proposed in question #1 to the English grain aphid abundance data.

  3. For the two models you fit in question #2, measure the accuracy of predictions. Which model makes the more accurate predictions? Make sure you fully describe how you compare the predictive accuracy. Note that you are not allowed to use information criteria (e.g., AIC) to answer this question.

  4. For the BYVD data, propose and fit a statistical model that enables predictions of the probability that a group of English grain aphids is positive for BYVD. Please use the percentage of grassland within 5000 m of the sample location as a predictor variable.

  5. For the BYVD data, propose and fit a statistical model that enables predictions of the probability that an individual English grain aphid is positive for BYVD. Please use the percentage of grassland within 5000 m of the sample site as a predictor variable. Vansteelandt et al. (2000) show that when the predictor variables are the same for all individuals within a group that is tested for a disease, you can use a complementary log-log (cloglog) link function to obtain individual-level inference. The complementary log-log link function was used in the supporting material of Enders et al. (2018) on pg. 4.

  6. Does the percentage of grassland within 5000 m of the sample site increase the probability of BYVD infection in the English grain aphid? How does the group-level inference obtained in question #4 compare to the individual-level inference obtained in question #5?
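
As background for problems 4-6, the relationship between group-level and individual-level infection probabilities can be sketched as follows. This is a minimal derivation under assumptions that are not stated above: the $n_i$ aphids in tested group $i$ are infected independently with a common probability $p_i$, and the test is taken to be error-free, so a group tests positive exactly when at least one of its members is infected. The symbols $\pi_i$, $p_i$, $n_i$, $x_i$ (percentage of grassland at the sample site), $\beta_0$, and $\beta_1$ are notation introduced here for illustration, and the linear predictor is shown without any spatial or temporal terms.

```latex
\begin{align*}
\pi_i &= \Pr(\text{group } i \text{ tests positive}) = 1 - (1 - p_i)^{n_i}, \\
\log\{-\log(1 - \pi_i)\} &= \log\{-n_i \log(1 - p_i)\} \\
  &= \log(n_i) + \log\{-\log(1 - p_i)\}, \\
\text{so if } \log\{-\log(1 - p_i)\} &= \beta_0 + \beta_1 x_i, \\
\text{then } \log\{-\log(1 - \pi_i)\} &= \log(n_i) + \beta_0 + \beta_1 x_i .
\end{align*}
```

Under these assumptions, a model for the group-level presence/absence data that uses a cloglog link and includes $\log(n_i)$ as an offset has coefficients that can be interpreted on the individual-aphid scale. In R, one possible way to express this is a binomial family with `link = "cloglog"` together with an `offset()` term in the model formula; whether that matches the model you propose depends on the spatial and temporal structure you choose to include.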