Overview

The job is to map the position of trees within a quadrat. Later, the task will be to iterate over each quadrat in a data set. The example data set is a subset of a public data set of trees censuses on Barro Colorado Island, Panama (source: https://repository.si.edu/handle/10088/20925).

Setup

I’l use dplyr and ggplot2 for data wrangling and visualization. Also, I’ll use ggrepel to avoid overlaping on the maps; try which contains an example dataset; and assertive, to assert some important aspects of the data.

# Ensure random operations are reproducible
set.seed(1234)

library(try)
library(assertive)
library(ggrepel)
library(ggplot2)
library(dplyr)

Script

This section develops a script that prototypes one way to do this job. The code style I use is based on The tidyverse style guide; I aim not to minmimize lines of code but to maximize readablilty. I need to re read this code later and understand it quickly. I start by glimpsing the example data set.

# Example data set
# example_data <- sinharaja::sinh_vft
example_data <- try::bci12vft_mini


glimpse(example_data)
#> Observations: 4,374
#> Variables: 28
#> $ MeasureID        <int> 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 1...
#> $ PlotID           <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
#> $ Plot             <chr> "bci", "bci", "bci", "bci", "bci", "bci", "bc...
#> $ Family           <chr> "Lecythidaceae", "Myristicaceae", "Malvaceae"...
#> $ GenusSpecies     <chr> "Gustavia superba", "Virola surinamensis", "Q...
#> $ Genus            <chr> "Gustavia", "Virola", "Quararibea", "Protium"...
#> $ SpeciesName      <chr> "superba", "surinamensis", "asterolepis", "te...
#> $ SubSpeciesName   <chr> "NULL", "NULL", "NULL", "NULL", "NULL", "NULL...
#> $ SpeciesID        <int> 460, 1075, 871, 828, 108, 828, 871, 871, 1144...
#> $ Mnemonic         <chr> "gustsu", "virosu", "quaras", "protte", "bros...
#> $ QuadratID        <int> 1250, 1250, 1250, 1249, 1249, 1248, 1248, 124...
#> $ QuadratName      <chr> "4924", "4924", "4924", "4923", "4923", "4922...
#> $ x                <dbl> 14.1, 10.5, 13.5, 12.7, 1.9, 3.7, 3.0, 3.0, 1...
#> $ y                <dbl> 8.3, 8.9, 18.3, 9.3, 13.5, 16.5, 12.6, 12.6, ...
#> $ gx               <dbl> 994, 990, 994, 993, 982, 984, 983, 983, 996, ...
#> $ gy               <dbl> 488, 489, 498, 469, 474, 456, 453, 453, 447, ...
#> $ TreeID           <int> 19, 21, 23, 24, 25, 29, 30, 30, 32, 33, 34, 3...
#> $ Tag              <chr> "000002", "000004", "000006", "000007", "0000...
#> $ StemID           <int> 1, 1, 7, 1, 1, 6, 2, 3, 2, 1, 1, 1, 6, 6, 1, ...
#> $ StemTag          <chr> NA, NA, "NULL", NA, NA, "NULL", "NULL", "NULL...
#> $ PrimaryStem      <chr> "main", "main", "main", "main", "main", "main...
#> $ CensusID         <int> 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
#> $ PlotCensusNumber <int> 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
#> $ DBH              <int> 304, 357, 313, 456, 1290, NA, 492, 33, NA, 42...
#> $ HOM              <dbl> 3.00, 1.30, 3.00, 1.30, 5.20, NA, 3.20, 1.30,...
#> $ ExactDate        <date> 2005-10-14, 2005-10-11, 2005-10-14, 2005-10-...
#> $ ListOfTSM        <chr> "B,cylY", "NULL", "B,cylY", "NULL", "B,cylN",...
#> $ Status           <chr> "alive", "alive", "alive", "alive", "alive", ...

Check input

Check single plot; reject if not multiple plots are detected

All quadrats should come from a single research plot. I’ll check if this true.

censuses_n <- unique(example_data$PlotID)
is_of_length(censuses_n, 1)
#> [1] TRUE

Should the data set contain multiple plots, I would reject the data and ask the user to filter a single plot. The user should then do something like this:

this_plotid <- 1
filtered_plotid <- filter(example_data, PlotID == this_plotid)

Check tree diameter; warn if trees smaller than a common limit are detected

A detail of the taks is that users will likely want to map trees which diameter (variable DBH) is greater than 10 cm. But that particular value of diameter may vary. The code will be inclusive; it’ll accept trees of all diameters and will only warn if it detects trees smaller or equal than 10 cm.

# Expecting to reuse this code
check_dbh <- function(x, limit) {
  if (any(x <= limit)) {
    warning("The diameter of some trees is smaller or equal to ", limit, " cm")
  } else{
    invisible()
  }
}
check_dbh(filtered_plotid$DBH, 10)
#> Warning in check_dbh(filtered_plotid$DBH, 10): The diameter of some trees
#> is smaller or equal to 10 cm

Excluding trees smaller than the desired limit will be up to the user, either via a “prepare_data()” function, or directly with something like this:

limit <- 10
over_limit <- filter(filtered_plotid, DBH > limit)
# Silent means that filtering was successful
check_dbh(over_limit$DBH, limit)

Check for multiple censuses; warn if multiple censuses are detected.

I believe the map should inform only the latest Status. That is, if multiple censuses where done, only the last one will be used to inform that the Status of each stem was.

if (length(unique(over_limit$CensusID)) > 1) {
  warning(
    "Multiple censuses were detected\n",
    "  * Filtering only the greatest `CensusID`"
  )
  last_census <- max(unique(over_limit$CensusID))
  is_last_census <- over_limit$CensusID == last_census
  over_limit <- over_limit[is_last_census, ]
}
#> Warning: Multiple censuses were detected
#>   * Filtering only the greatest `CensusID`

Map

Requirements

For now I’ll focus on a single quadrat. (I’ll work later on the code to iterate over multiple quadrats.)

# Choosing one quadrat at random
this_quadrat <- "4919"
one_quadrat <- filter(over_limit, QuadratName == this_quadrat)

The requirements are quoted below; they come from a private email (via @)

    1. Status: I’ll map the Status of each stem to a the ending of each label. This is a better alternative than mapping to, say point shape, because each tree Tag has multiple stems associated to them, and they will reppel each other. If the Status was mapped to shape, multiple points will plot on top of each other and the overlapping will obscure the information.

It would be nice to include the status. I used to put a “D” (dead) or an “R”(stem lost or broken) after the tag number to differentiate them from the live trees. For example, 8334D or 8145R.

    1. I won’t label subquadrats inside eadh quadrat.

I don’t need the subquadrat labels, since they are the same for every quadrat.

    1. I’ll fix the quadrat size to 20x20m. I’ll keep in mind that this may need to be flexible in the future.
  1. [In] most sites, the quadrats are 20x20 meters.
    1. I’ll map the shape of the point to an empty circle, and will map its size to the size of each tree, standarized by the number of trees the quadrat.

It would be nice to have a larger circle for the larger trees so that the field workers can look for a large tree. However, for very crowded quadrats, we can ignore this or make the circles smaller than if there are less trees. In my script, I count the number of trees in that quadrat and calculate the size depending on the number.

    1. Shape and position:
    • I’ll map live trees to a circular shape.
    • I’ll map the position of each tree to a point, using the variables x and y.
    • Also, I’ll map the position to the Tag name of each tree – that way field workers will know what tree they expect to find where. To avoid overlap, I’ll implement repulsive labels with ggrepel::geom_text_repel().
  1. For the live trees, yes, please use circles (tree trunks tend to be circular). But you are welcome to use other symbols for the “other” categories.
    1. I’ll remove all text from the axes;

The reason I don’t have the tick legends is to give more space to the map on the page.

    1. I’ll set the direction of the axes ticks to inward;

Same reason as above, to make the map as large as possible on the page. However, I wouldn’t mind if they are outwards.

    1. I’ll set the count of axes ticks to one meter along the range of x and y values;

Yes, so that the field workers can orient themselves on the maps and more carefully draw the location of the trees.

    1. I’ll set one grid line every 5 meters along the range of x and y values;

Yes, each grid represents a 5x5 m subquadrat. This is how the plots are physically gridded in the field.

    1. I’ll set he type of the grid lines to dashed.

Dashed seems to me to be better than solid.

    1. I’ll add a one-line title; I’ll to this at the top of the code to help visualize where on the plot each code chunk is doing its job.

… would be great if I can manually choose the title.

The plotname in ViewFullTable carries the name of the plot.

    1. I’ll add a multi-line subtitile, wich I’ll do in advance to avoid cluttering the code of the plot. Also, placing this code under the title will help to visualize where on the plot this code chunk is doing its job.

I think each site would like to be able to type in their subheaders.

  • I’ll set the background colour to white;

(Eventually I’ll let the user adjust these options – but I’ll do that later; now I focus on getting the maps close to the most common requirement.)

Helpers

theme <- theme(
  panel.background = element_rect(fill = "white"),
  panel.grid.minor = element_blank(),
  panel.grid.major = element_blank(),
  legend.position = "none",
  plot.title = element_text(size = 20),
  plot.subtitle = element_text(size = 12),
  axis.text = element_blank(),
  panel.border = element_rect(colour = "black", fill = NA),
  axis.ticks.length = unit(-0.1, "cm")
)

# Add the ending ".d" to the `Tag` of dead stems.
tag_dead <- function(x, y) {
  is_dead <- y == "dead"
  x[is_dead] <- paste0(x[is_dead], ".", sub("^(.).*$", "\\1", y[is_dead]))
  x
}

# See if this works
tag_dead(x = c("tag1", "tag2"), y = c("dead", "alive"))
#> [1] "tag1.d" "tag2"

Set paramenters

# Parameters that will likely become arguments of a function
lim_min <- 0
lim_max <- 20
subquadrat_side <- 5
size_label <- 2
# Adjust a margin around the plot area that ggplot2 inserts by default
offset <- 1
title <- paste("My Title", unique(one_quadrat$QuadratName), sep = ", ")
header <- paste0(
  "Nombres y fecha:           Revisado por:                Entrado en PC por:\n",
  "                                                                          \n",
  "__________________  __________________  __________________\n",
  "__________________  __________________  __________________\n",
  "__________________  __________________  __________________\n"
)

Prepare

one_quadrat <- mutate(
  one_quadrat, 
  # Tad dead trees
  tagged_tag = tag_dead(Tag, Status),
  # Standarize the size of the points by the total number of trees
  dbh_standarized = DBH / length(DBH)
)

dead <- filter(one_quadrat, Status == "dead")

Plot

ggplot(data = one_quadrat, aes(x = x, y = y)) +
  geom_point(aes(size = dbh_standarized), shape = 1) +
  geom_text_repel(aes(label = tagged_tag), size = size_label) +
  # /* Repeat for the dead trees, which are lost because dbh is NA
  geom_text_repel(data = dead, aes(label = tagged_tag), size = size_label) +
  # */
  labs(title = title, subtitle = header, x = NULL, y = NULL) +
  geom_vline(
    xintercept = seq(lim_min, lim_max, by = subquadrat_side), 
    linetype = "dashed"
  ) +
  geom_hline(
    yintercept = seq(lim_min, lim_max, by = subquadrat_side), 
    linetype = "dashed"
  ) +
  coord_fixed(
    xlim = c(lim_min + offset, lim_max - offset), 
    ylim = c(lim_min + offset, lim_max - offset)
  ) +
  scale_x_continuous(breaks = lim_min:lim_max, sec.axis = dup_axis()) +
  scale_y_continuous(breaks = lim_min:lim_max, sec.axis = dup_axis()) +
  theme

Observe potential problem

I’m suprised that all trees are alive, and I wonder if this is an error.

unique(one_quadrat$Status)
#> [1] "alive"

I’ll check if the plot works if I introduce dead trees.

with_dead <- one_quadrat
with_dead$Status[1:30] <- "dead"
with_dead$DBH[1:30] <- NA

Next I’ll repeate the exact same preparation and plotting code.

Prepare

with_dead <- mutate(
  with_dead, 
  # Tad dead trees
  tagged_tag = tag_dead(Tag, Status),
  # Standarize the size of the points by the total number of trees
  dbh_standarized = DBH / length(DBH)
)

dead <- filter(with_dead, Status == "dead")

Plot

ggplot(data = with_dead, aes(x = x, y = y)) +
  geom_point(aes(size = dbh_standarized), shape = 1) +
  geom_text_repel(aes(label = tagged_tag), size = size_label) +
  # /* Repeat for the dead trees, which are lost because dbh is NA
  geom_text_repel(data = dead, aes(label = tagged_tag), size = size_label) +
  # */
  labs(title = title, subtitle = header, x = NULL, y = NULL) +
  geom_vline(
    xintercept = seq(lim_min, lim_max, by = subquadrat_side), 
    linetype = "dashed"
  ) +
  geom_hline(
    yintercept = seq(lim_min, lim_max, by = subquadrat_side), 
    linetype = "dashed"
  ) +
  coord_fixed(
    xlim = c(lim_min + offset, lim_max - offset), 
    ylim = c(lim_min + offset, lim_max - offset)
  ) +
  scale_x_continuous(breaks = lim_min:lim_max, sec.axis = dup_axis()) +
  scale_y_continuous(breaks = lim_min:lim_max, sec.axis = dup_axis()) +
  theme
#> Warning: Removed 30 rows containing missing values (geom_point).

Explain why dead trees are plotted independently

This is why the tags are plotted twice – once for the entire dataset and once of dead treed only. Deead trees have DBH = NA, which I get removed automatically by ggplot2 when DBH is passed to geom_point(). Below

select(with_dead, DBH, Status, Tag, tagged_tag)
ggplot(data = with_dead, aes(x = x, y = y)) +
  geom_point(aes(size = dbh_standarized), shape = 1) +
  geom_text_repel(aes(label = tagged_tag), size = size_label) +
  # /* Repeat for the dead trees, which are lost because dbh is NA
  # geom_text_repel(data = dead, aes(label = tagged_tag), size = size_label) +
  # */
  labs(title = title, subtitle = header, x = NULL, y = NULL) +
  geom_vline(
    xintercept = seq(lim_min, lim_max, by = subquadrat_side), 
    linetype = "dashed"
  ) +
  geom_hline(
    yintercept = seq(lim_min, lim_max, by = subquadrat_side), 
    linetype = "dashed"
  ) +
  coord_fixed(
    xlim = c(lim_min + offset, lim_max - offset), 
    ylim = c(lim_min + offset, lim_max - offset)
  ) +
  scale_x_continuous(breaks = lim_min:lim_max, sec.axis = dup_axis()) +
  scale_y_continuous(breaks = lim_min:lim_max, sec.axis = dup_axis()) +
  theme
#> Warning: Removed 30 rows containing missing values (geom_point).

My conclusion about dead trees not appearing on the example data

The last few sectios confirmed that the plot is able to map dead trees. The reason why the example data isn’t showing dead trees is indeed because no alive tree is left after the filtering process I went through here. The issue remains to find out the filtering I’m doing is resonable or if I am inadvertedly removing dead trees. One way to check is to use real data – not the minimal example data I used here – and confirm that the plot shows trees that in the last census had Status = "dead".