R Exercise Week 2

Task: Create a bubble plot of the number of genotyped individuals in the dataset pulsatilla_genotypes.csv, using Latitude/Longitude coordinates.


  1. Load libraries: Load libraries gstudio, dplyr, tibble and sf.
  2. Import data: Re-use your code from the Week 1 exercise to import the dataset pulsatilla_genotypes.csv with gstudio. Recall that the resulting object is a data.frame. Check the variables with the function str. Which variables contain the sites and the spatial coordinates?
  3. Summarize by site: Use the function group_by from library dplyr to group individuals (rows) by site (using pipe notation: %>%), and add the function summarize to count the number of genotyped individuals per population (i.e., sampling site). Recall that this can be done by nesting the function n within summarize:
    summarize(nIndiv = n())
    Write the result into a new object Pulsatilla.
  4. Add mean coordinates: You can nest multiple functions within summarize and separate them with a comma. E.g., to calculate both sample size and the mean of a variable myVar, you could write:
    summarize(nIndiv = n(), myMean = mean(myVar))
    Modify your code to calculate the number of genotyped individuals for each site and their mean X and Y coordinates. In addition to the grouping column that identifies the site, your object ‘Pulsatilla’ should now have three columns, one with the number of individuals and two with the mean coordinates. Display the dataset with as_tibble to check.
  5. Convert to sf object: Modify code from section 2.a to convert your data frame Pulsatilla to an sf object. Make sure to adjust the variable names for the coordinates (i.e., use the variable names that you assigned in the previous step for the mean X and Y coordinates).
  6. Specify known projection: The correct EPSG number for this dataset is: 31468. You can specify the CRS with: st_crs(Pulsatilla) <- 31468.
  7. Transform projection: Adapt code from section 2.c to transform the projection to the “longlat” coordinate system, and write it into an object Pulsatilla.longlat.
  8. Create bubble plot: Adapt code from section 4.d to create a bubble plot of the number of individuals per population. Note: you may drop the argument key.entries as it has a default.
  9. Save data as R object: Save the object Pulsatilla.longlat as an R object using the following code:
    saveRDS(Pulsatilla.longlat, file = here::here("output/Pulsatilla.longlat.rds"))
    We will need it for a later R exercise.
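
Hint: If the pattern in steps 3 to 7 is unclear, the following minimal sketch walks through it on a small made-up data frame. It is not the solution for the Pulsatilla data. The object names (toy, toy.sites, toy.sf, toy.longlat), the column names (Site, X, Y, meanX, meanY) and the coordinate values are hypothetical, and "longlat" is assumed here to mean WGS 84 (EPSG 4326). Check the variable names in your own data with str before adapting it.

```r
library(dplyr)
library(sf)

# Toy data (hypothetical): three individuals at site A, two at site B,
# with projected coordinates in metres. The column names are assumptions.
toy <- data.frame(
  Site = c("A", "A", "A", "B", "B"),
  X = c(4427000, 4427010, 4427020, 4430000, 4430020),
  Y = c(5427000, 5427020, 5427040, 5431000, 5431010)
)

# Steps 3-4: one row per site, with sample size and mean coordinates.
toy.sites <- toy %>%
  group_by(Site) %>%
  summarize(nIndiv = n(), meanX = mean(X), meanY = mean(Y))

# Step 5: convert to an sf object, using the names of the mean coordinates.
toy.sf <- st_as_sf(toy.sites, coords = c("meanX", "meanY"))

# Step 6: declare the known projection (EPSG code from the exercise).
st_crs(toy.sf) <- 31468

# Step 7: transform to longitude/latitude (assumed here: WGS 84, EPSG 4326).
toy.longlat <- st_transform(toy.sf, crs = 4326)

# Check the summary table: one row per site.
as_tibble(toy.sites)
```

Note that st_crs(x) <- value only declares which coordinate system the existing numbers are in, whereas st_transform actually recalculates the coordinates. This is why step 6 has to come before step 7.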

Question: Where on Earth are the sites in the Pulsatilla dataset located?