R Exercise Week 4

Task: Building on your previous exercises, plot the sites on a background map downloaded from the internet. Then explore the relationships among Hexp (expected heterozygosity), census population size and percent forest cover within 500 m of each site (forest may act as a barrier for grassland plants).

Hints:

  1. Load packages: You may want to load the packages dplyr and tmap. Alternatively, call functions directly with the package::function() syntax (e.g., dplyr::left_join) without loading the package.

  2. Import your datasets from the Week 2 and Week 3 R exercises. Adapt the following example as needed to import the R objects “Pulsatilla.longlat.rds” (sf object, Week 2) and “H.pop.rds” (Week 3) that you saved previously:

       Pulsatilla.longlat <- readRDS(here::here("output/Pulsatilla.longlat.rds"))

  3. Plot sites on a map from the internet: Adapt the code from section 3.d to plot the sampling locations on a background map downloaded from the internet. Then modify that code to add labels for all sites.

  4. Combine data: Use the function dplyr::left_join to add the variables from the dataset H.pop to Pulsatilla.longlat. Notes:

    • Joining by an ID variable matters because the order of populations may not be the same in the two datasets.
    • First check the structure of both datasets (variable names and types, e.g., with str()) so that you know which ID variables you can use to match sites.
    • If the two ID variables are not of the same type (e.g., one is a factor, the other is character), it is best to convert one of them (e.g., with as.character) before doing the left join.

  5. Scatterplot with regression line: Create a scatterplot of Hexp (y axis) against nIndiv (x axis). Add a regression line and, if possible, label the points. You may modify code from section 3.b or use base R functions.

  6. Regression analysis: Adapt the code from section 3.c to regress Hexp (response variable) on nIndiv (predictor variable). Create residual plots and inspect them. What is the main issue here?
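A minimal sketch of steps 4 to 6 follows, using dplyr plus base R graphics on a small made-up toy dataset rather than the course data. The ID column names Population (in the site table) and pop (in H.pop) are hypothetical placeholders: check the real names and types with str() and substitute them. The toy site table is a plain data frame; with the real Pulsatilla.longlat sf object, loading the sf package before the join should let left_join() keep the geometry column.

```r
# Sketch of steps 4-6 on MADE-UP toy data (not the course data).
# Hypothetical ID column names: "Population" (site table) and "pop" (H.pop).
# Hexp and nIndiv are the variable names used in the exercise.

# Toy stand-ins for Pulsatilla.longlat (here a plain data frame) and H.pop
sites <- data.frame(
  Population = factor(c("A", "B", "C", "D", "E")),
  x = c(10.1, 10.3, 10.2, 10.5, 10.4),
  y = c(50.0, 50.2, 50.4, 50.1, 50.3)
)
hpop <- data.frame(
  pop    = c("E", "D", "C", "B", "A"),  # deliberately in a different order
  Hexp   = c(0.62, 0.55, 0.48, 0.51, 0.40),
  nIndiv = c(900, 350, 120, 200, 30)
)

# Step 4: inspect structure, make the ID types match, then join by ID
str(sites)
str(hpop)
sites$Population <- as.character(sites$Population)  # factor -> character
# With the real sf object, library(sf) should be loaded first so that the
# sf method of left_join() is used and the geometry is kept.
combined <- dplyr::left_join(sites, hpop, by = c("Population" = "pop"))

# Step 5: scatterplot with regression line and point labels (base R)
plot(Hexp ~ nIndiv, data = combined, xlab = "nIndiv", ylab = "Hexp")
fit <- lm(Hexp ~ nIndiv, data = combined)
abline(fit)
text(combined$nIndiv, combined$Hexp, labels = combined$Population, pos = 3)

# Step 6: regression summary and the four standard residual plots
print(summary(fit))
par(mfrow = c(2, 2))
plot(fit)
par(mfrow = c(1, 1))
```

In the by argument, the name on the left refers to the column in the first (left) table and the value on the right to the column in the second table; left_join() keeps every row of the site table in its original order and simply looks up the matching H.pop values, which is why the order of populations in H.pop does not matter.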

Questions: There is one influential point in the regression analysis:

  • Which site is it?
  • Where is it located (north / east / south / west)?
  • What makes it an influential point (large residual, leverage, or both)?
  • What would happen to the regression line and the R-squared if this point were omitted?
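To work through these questions, the diagnostics below may help: hatvalues() measures leverage, rstudent() gives studentized residuals, and cooks.distance() combines both into a single influence measure (all three are in base R's stats package). This is a sketch on made-up toy data in which one site (F) lies far out on the x axis; your own data may behave differently. The cut-offs 2p/n for leverage and 4/n for Cook's distance are common rules of thumb, not strict thresholds, and the ID column name Population is a hypothetical placeholder.

```r
# Sketch: diagnosing an influential point in a simple linear regression.
# MADE-UP toy values, not the course data. With the real data, use the joined
# dataset from step 4 (columns Hexp, nIndiv and a site ID; "Population" is a
# hypothetical placeholder for the ID name).
combined <- data.frame(
  Population = c("A", "B", "C", "D", "E", "F"),
  nIndiv     = c(30, 120, 200, 350, 400, 5000),
  Hexp       = c(0.40, 0.48, 0.51, 0.55, 0.53, 0.45)
)
fit <- lm(Hexp ~ nIndiv, data = combined)

p <- length(coef(fit))  # number of estimated coefficients (intercept + slope)
n <- nrow(combined)     # number of sites

# Leverage: how far a point lies from the others along the x axis
lev <- hatvalues(fit)
# Studentized residuals: vertical distance from the line, in SD units
stud <- rstudent(fit)
# Cook's distance: overall influence, combining residual size and leverage
cook <- cooks.distance(fit)

diagnostics <- data.frame(
  Population    = combined$Population,
  leverage      = round(lev, 3),
  stud_resid    = round(stud, 2),
  cooks_d       = round(cook, 3),
  high_leverage = lev > 2 * p / n,  # rule of thumb, not a strict cut-off
  high_cook     = cook > 4 / n      # rule of thumb, not a strict cut-off
)
print(diagnostics)

# Site with the largest Cook's distance
most_infl <- which.max(cook)
print(combined$Population[most_infl])

# Refit without that site and compare coefficients and R-squared
fit_drop <- lm(Hexp ~ nIndiv, data = combined[-most_infl, ])
comparison <- rbind(
  all_sites = c(coef(fit), r_squared = summary(fit)$r.squared),
  without   = c(coef(fit_drop), r_squared = summary(fit_drop)$r.squared)
)
print(comparison)
```

Reading the leverage and studentized-residual columns side by side shows whether a flagged point is influential because of its position on the x axis, its distance from the line, or both; the two rows of the comparison table then show how the intercept, slope and R-squared change when it is dropped.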