4.4 R exercise Week 1
R Notebook: Create a new R Notebook for each weekly exercise. Watch the course video “Week 0: Intro to R Notebooks” as needed. At the end, “knit” it to an html file and view it in your browser.
Good news: if you can knit the file, the code can stand by itself (it does not depend on what you did in your R session before) and runs without errors. This is a good check. If there are error messages, check the ‘R Markdown’ tab for the code line number and try to fix it.
Task: Import the data set pulsatilla_genotypes, which we’ll use in a later lab (Week 14), into gstudio and convert it to a genind object.
Data: This file contains microsatellite data for adults and seeds of the herb Pulsatilla vulgaris sampled at seven sites. Reference: DiLeo et al. (2018), Journal of Ecology 106:2242-2255. https://besjournals.onlinelibrary.wiley.com/doi/10.1111/1365-2745.12992
The following code copies the file into the ‘downloads’ folder in your R project folder:
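# Copy the example file shipped with 'LandGenCourse' into the project's 'downloads' folder
# (requires the 'here' package to be loaded; an existing copy is not overwritten)
file.copy(system.file("extdata", "pulsatilla_genotypes.csv", package = "LandGenCourse"),
          paste0(here(), "/downloads/pulsatilla_genotypes.csv"), overwrite = FALSE)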
## [1] FALSE
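A return value of FALSE, as printed above, does not necessarily signal a problem: with overwrite=FALSE, file.copy returns FALSE when the destination file already exists, for example from an earlier run. The following minimal sketch illustrates this behaviour. It works in a temporary folder rather than your project folder, and all file and folder names in it are hypothetical.

```r
# Sketch only: demonstrates file.copy() return values in a temporary folder.
# All names below (file_copy_demo, source.csv, copy.csv) are made up for illustration.
tmp <- file.path(tempdir(), "file_copy_demo")
unlink(tmp, recursive = TRUE)            # start from a clean state
dir.create(tmp)

src <- file.path(tmp, "source.csv")
writeLines("ID,Population", src)         # create a small file to copy

dest_dir <- file.path(tmp, "downloads")
if (!dir.exists(dest_dir)) dir.create(dest_dir)   # make sure the target folder exists
dest <- file.path(dest_dir, "copy.csv")

first  <- file.copy(src, dest, overwrite = FALSE)  # destination does not exist yet
second <- file.copy(src, dest, overwrite = FALSE)  # destination now exists, nothing is copied
print(c(first, second))
print(file.exists(dest))
```

If you are unsure whether the copy worked in your own project, checking the target path with file.exists is a quick test. A missing ‘downloads’ folder is another possible cause worth ruling out.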
Variables:
- ID: Family ID (i.e., mother and her offspring have the same ID)
- OffID: ‘0’ for adults, seeds from the same mom are numbered 1, 2, etc.
- Population: site ID
- Coordinates: X and Y coordinates (Projection info: EPSG Projection 31468)
- Loci: seven diploid microsatellites, each with two columns (1 allele per column)
Hints:
- Load packages: Make sure the following packages are loaded: gstudio, here, tibble and adegenet.
- View data file: Adapt the code from section 2.c to import the raw data set. The file has a header row. View it. How are the genetic data coded?
- Import data into gstudio: Adapt the code from section 5.b to import the genetic data with ‘gstudio’. The loci are in columns 6 - 19. What setting for type is appropriate for this data set? Check the help file for read_population.
- Check imported data: Use str or as_tibble to check the imported data. Does each variable have the correct data type? Note: there should be 7 variables of type locus.
- Check variable types: Create a bulleted list, like the one above, with the variables. For each variable, list its R data type (e.g., numeric, integer, character, logical, factor, locus). Check the cheatsheet “R markdown language” as needed.
- Convert to genind object: Modify the code from section 5.e to convert the data from gstudio to a genind object. Ignore the warning about duplicate labels. Print a summary of the genind object.
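As a warm-up for the hints on viewing the data and checking variable types, here is a minimal base R sketch on a made-up miniature table. The column names follow the variable list above, but the locus name (loc1), the site labels and all values are invented for illustration and are not taken from the Pulsatilla file. The sketch uses only base R, so it does not show the gstudio import or the locus data type.

```r
# Sketch only: a hypothetical three-row table with one locus in two columns.
csv_text <- "ID,OffID,Population,X,Y,loc1_1,loc1_2
1,0,A21,4422100,5425300,100,102
1,1,A21,4422100,5425300,100,100
2,0,A25,4423500,5427800,102,104"

# The first row is a header row, as in the exercise file
toy <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

str(toy)                          # compact overview: one line per variable

col_types <- sapply(toy, class)   # named character vector of data types
print(col_types)
```

Here each allele is imported as a plain integer column, which is how a two-column-per-locus file looks before gstudio combines each pair of columns into a single locus variable.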
Question: What is the range of the number of genotyped individuals per population in this dataset?
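One possible way to approach this kind of question is to count the rows per population and then take the range of those counts. The sketch below does this in base R on a made-up data frame. The site labels and counts are hypothetical, so the numbers it prints say nothing about the Pulsatilla data.

```r
# Sketch only: hypothetical data with three sites of different sample sizes.
toy <- data.frame(
  ID = c(1, 1, 1, 2, 2, 3),
  Population = c("S1", "S1", "S1", "S2", "S2", "S3"),
  stringsAsFactors = FALSE
)

n_per_pop <- table(toy$Population)   # number of individuals per site
print(n_per_pop)

size_range <- range(n_per_pop)       # smallest and largest count
print(size_range)
```

For the exercise itself, the summary of the genind object should already list the group sizes, so you may be able to read the range directly from that output.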