7.4 R Exercise Week 4
Task: Build on your previous exercises and plot the sites on a map downloaded from the internet. Explore the relationships between Hexp, census population size and percent forest cover within 500 m of the site (forest may act as a barrier for grassland plants).
Hints:
Load packages: You may want to load the packages dplyr and tmap. Alternatively, you can use :: to call functions from packages.
Import your datasets from Weeks 2 & 3 R Exercises. Here’s an example for your code, adapt it as needed to import the R objects “Pulsatilla.longlat.rds” (sf object, Week 2) and “H.pop.rds” (Week 3) that you saved previously:
Pulsatilla.longlat <- readRDS(here::here("output/Pulsatilla.longlat.rds"))
Plot sites on map from internet: adapt the code from section 3.d to plot the sampling locations on a background map from the internet. Next, modify code from section 3.d to add labels for all sites.
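The mapping step could be sketched as follows. Section 3.d is not reproduced here, so this is not necessarily the code used there; it is a minimal sketch that assumes the packages sf and tmap are installed. The data frame, the column name SiteID and all coordinates are made up for illustration and are not the Pulsatilla sites.

```r
library(sf)
library(tmap)

# Hypothetical stand-in for Pulsatilla.longlat: made-up site IDs and coordinates
sites <- data.frame(
  SiteID    = c("S1", "S2", "S3"),
  Longitude = c(11.00, 11.05, 11.10),
  Latitude  = c(49.00, 49.03, 48.98)
)

# Convert to an sf object with longitude/latitude coordinates (EPSG:4326)
sites.sf <- sf::st_as_sf(sites, coords = c("Longitude", "Latitude"), crs = 4326)

# Interactive mode draws the layers on top of a web basemap
tmap::tmap_mode("view")

site.map <- tmap::tm_basemap("OpenStreetMap") +
  tmap::tm_shape(sites.sf) +
  tmap::tm_dots(size = 0.5) +   # sampling locations
  tmap::tm_text("SiteID")       # label each site with its (assumed) ID column

# Display the map only in an interactive session (needs internet for the tiles)
if (interactive()) print(site.map)
```

With your own data, you would replace sites.sf with Pulsatilla.longlat and SiteID with the name of its site ID column.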
Combine data: Use the function dplyr::left_join to add the variables from the dataset H.pop to Pulsatilla.longlat. Notes:
- Using a join is important, as the order of populations may not be the same in the two datasets.
- Remember to check the structure of the datasets (variable names and types) first so that you know which are the ID variables that you can use to match sites.
- If the two ID variables are not of the same type (e.g., one is a factor, the other is character), it is best to change the format of one (e.g., with as.character) before doing the left-join.
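The notes above can be illustrated with a small sketch. Both data frames and the ID column name SiteID are hypothetical stand-ins for Pulsatilla.longlat and H.pop; check the actual variable names in your own objects with str() before adapting it.

```r
library(dplyr)

# Hypothetical stand-in for Pulsatilla.longlat: ID stored as a factor
sites <- data.frame(
  SiteID = factor(c("S1", "S2", "S3")),
  forest = c(10, 45, 30)            # made-up values
)

# Hypothetical stand-in for H.pop: ID stored as character, rows in a different order
H.toy <- data.frame(
  SiteID = c("S3", "S1", "S2"),
  Hexp   = c(0.60, 0.70, 0.65),     # made-up values
  nIndiv = c(40, 300, 120)
)

# Inspect variable names and types to find the ID variables
str(sites)
str(H.toy)

# Make the ID types match before joining
sites$SiteID <- as.character(sites$SiteID)

# Rows are matched by ID, not by position
joined <- dplyr::left_join(sites, H.toy, by = "SiteID")
joined
```

Because the rows are matched on the ID column, each site receives its own Hexp and nIndiv values even though the two tables list the sites in different orders.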
Scatterplot with regression line: Create a scatterplot of Hexp (y axis) plotted against nIndiv (x axis). Add a regression line and, if possible, label points. You may modify code from section 3.b or use base R functions.
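One way to do this with base R functions is sketched below. The data frame toy and its values are made up for illustration, and SiteID is an assumed name for the ID column; with your own data you would use the joined dataset instead.

```r
# Hypothetical toy data (made-up values, not the Pulsatilla results)
toy <- data.frame(
  SiteID = c("S1", "S2", "S3", "S4", "S5", "S6"),
  nIndiv = c(20, 35, 50, 60, 80, 900),
  Hexp   = c(0.62, 0.66, 0.60, 0.68, 0.64, 0.75)
)

# Scatterplot: response on the y axis, predictor on the x axis
plot(Hexp ~ nIndiv, data = toy,
     xlab = "Census population size (nIndiv)",
     ylab = "Expected heterozygosity (Hexp)")

# Fit a simple linear regression and add the fitted line
mod.toy <- lm(Hexp ~ nIndiv, data = toy)
abline(mod.toy)

# Label each point with its site ID, placed just above the point
text(toy$nIndiv, toy$Hexp, labels = toy$SiteID, pos = 3, cex = 0.8)
```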
Regression analysis: Adapt code from section 3.c to perform a regression of Hexp (response variable) on the predictor nIndiv. Create residual plots and inspect them. What is the main issue here?
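A minimal sketch of this step with base R is shown here. It may differ from the code in section 3.c, and the toy data are made up, so the resulting plots say nothing about the actual Pulsatilla dataset.

```r
# Hypothetical toy data (made-up values)
toy <- data.frame(
  nIndiv = c(20, 35, 50, 60, 80, 900),
  Hexp   = c(0.62, 0.66, 0.60, 0.68, 0.64, 0.75)
)

# Regression of the response Hexp on the predictor nIndiv
mod.diff <- lm(Hexp ~ nIndiv, data = toy)
summary(mod.diff)

# The four default diagnostic plots for an lm object:
# residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage
par(mfrow = c(2, 2))
plot(mod.diff)
par(mfrow = c(1, 1))   # reset the plotting layout
```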
Questions: There is one influential point in the regression analysis:
- Which site is it?
- Where is it located (north / east / south / west)?
- What makes it an influential point (large residual, leverage, or both)?
- What would happen to the regression line and the R-squared if this point were omitted?
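The questions above can be explored numerically with base R influence measures. The following is a sketch on made-up toy data in which the last site was deliberately given a very large nIndiv; the site names and values are hypothetical and do not reveal the answer for the real dataset.

```r
# Hypothetical toy data: site S6 has a much larger nIndiv than the others
toy <- data.frame(
  SiteID = c("S1", "S2", "S3", "S4", "S5", "S6"),
  nIndiv = c(20, 35, 50, 60, 80, 900),
  Hexp   = c(0.62, 0.66, 0.60, 0.68, 0.64, 0.75)
)

mod.all <- lm(Hexp ~ nIndiv, data = toy)

# Leverage, standardized residuals and Cook's distance per site
infl <- data.frame(
  SiteID   = toy$SiteID,
  leverage = hatvalues(mod.all),
  std.res  = rstandard(mod.all),
  cooks.d  = cooks.distance(mod.all)
)
infl

# Site with the largest Cook's distance
worst <- which.max(infl$cooks.d)
toy$SiteID[worst]

# Refit without that site and compare slope and R-squared
mod.drop <- lm(Hexp ~ nIndiv, data = toy[-worst, ])
rbind(
  all.sites = c(slope = unname(coef(mod.all)["nIndiv"]),
                r.squared = summary(mod.all)$r.squared),
  dropped   = c(slope = unname(coef(mod.drop)["nIndiv"]),
                r.squared = summary(mod.drop)$r.squared)
)
```

Comparing the leverage and standardized residual columns helps to judge whether a point is influential because of an unusual predictor value, a large residual, or both.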