The job is to map the position of trees within a quadrat. Later, the task will be to iterate over each quadrat in a data set. The example data set is a subset of a public data set of trees censuses on Barro Colorado Island, Panama (source: https://repository.si.edu/handle/10088/20925).
I’l use dplyr and ggplot2 for data wrangling and visualization. Also, I’ll use ggrepel to avoid overlaping on the maps; try which contains an example dataset; and assertive, to assert some important aspects of the data.
# Ensure random operations are reproducible
set.seed(1234)
library(try)
library(assertive)
library(ggrepel)
library(ggplot2)
library(dplyr)
This section develops a script that prototypes one way to do this job. The code style I use is based on The tidyverse style guide; I aim not to minmimize lines of code but to maximize readablilty. I need to re read this code later and understand it quickly. I start by glimpsing the example data set.
# Example data set
# example_data <- sinharaja::sinh_vft
example_data <- try::bci12vft_mini
glimpse(example_data)
#> Observations: 4,374
#> Variables: 28
#> $ MeasureID <int> 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 1...
#> $ PlotID <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
#> $ Plot <chr> "bci", "bci", "bci", "bci", "bci", "bci", "bc...
#> $ Family <chr> "Lecythidaceae", "Myristicaceae", "Malvaceae"...
#> $ GenusSpecies <chr> "Gustavia superba", "Virola surinamensis", "Q...
#> $ Genus <chr> "Gustavia", "Virola", "Quararibea", "Protium"...
#> $ SpeciesName <chr> "superba", "surinamensis", "asterolepis", "te...
#> $ SubSpeciesName <chr> "NULL", "NULL", "NULL", "NULL", "NULL", "NULL...
#> $ SpeciesID <int> 460, 1075, 871, 828, 108, 828, 871, 871, 1144...
#> $ Mnemonic <chr> "gustsu", "virosu", "quaras", "protte", "bros...
#> $ QuadratID <int> 1250, 1250, 1250, 1249, 1249, 1248, 1248, 124...
#> $ QuadratName <chr> "4924", "4924", "4924", "4923", "4923", "4922...
#> $ x <dbl> 14.1, 10.5, 13.5, 12.7, 1.9, 3.7, 3.0, 3.0, 1...
#> $ y <dbl> 8.3, 8.9, 18.3, 9.3, 13.5, 16.5, 12.6, 12.6, ...
#> $ gx <dbl> 994, 990, 994, 993, 982, 984, 983, 983, 996, ...
#> $ gy <dbl> 488, 489, 498, 469, 474, 456, 453, 453, 447, ...
#> $ TreeID <int> 19, 21, 23, 24, 25, 29, 30, 30, 32, 33, 34, 3...
#> $ Tag <chr> "000002", "000004", "000006", "000007", "0000...
#> $ StemID <int> 1, 1, 7, 1, 1, 6, 2, 3, 2, 1, 1, 1, 6, 6, 1, ...
#> $ StemTag <chr> NA, NA, "NULL", NA, NA, "NULL", "NULL", "NULL...
#> $ PrimaryStem <chr> "main", "main", "main", "main", "main", "main...
#> $ CensusID <int> 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
#> $ PlotCensusNumber <int> 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
#> $ DBH <int> 304, 357, 313, 456, 1290, NA, 492, 33, NA, 42...
#> $ HOM <dbl> 3.00, 1.30, 3.00, 1.30, 5.20, NA, 3.20, 1.30,...
#> $ ExactDate <date> 2005-10-14, 2005-10-11, 2005-10-14, 2005-10-...
#> $ ListOfTSM <chr> "B,cylY", "NULL", "B,cylY", "NULL", "B,cylN",...
#> $ Status <chr> "alive", "alive", "alive", "alive", "alive", ...
All quadrats should come from a single research plot. I’ll check if this true.
censuses_n <- unique(example_data$PlotID)
is_of_length(censuses_n, 1)
#> [1] TRUE
Should the data set contain multiple plots, I would reject the data and ask the user to filter a single plot. The user should then do something like this:
this_plotid <- 1
filtered_plotid <- filter(example_data, PlotID == this_plotid)
A detail of the taks is that users will likely want to map trees which diameter (variable DBH
) is greater than 10 cm. But that particular value of diameter may vary. The code will be inclusive; it’ll accept trees of all diameters and will only warn if it detects trees smaller or equal than 10 cm.
# Expecting to reuse this code
check_dbh <- function(x, limit) {
if (any(x <= limit)) {
warning("The diameter of some trees is smaller or equal to ", limit, " cm")
} else{
invisible()
}
}
check_dbh(filtered_plotid$DBH, 10)
#> Warning in check_dbh(filtered_plotid$DBH, 10): The diameter of some trees
#> is smaller or equal to 10 cm
Excluding trees smaller than the desired limit will be up to the user, either via a “prepare_data()
” function, or directly with something like this:
limit <- 10
over_limit <- filter(filtered_plotid, DBH > limit)
# Silent means that filtering was successful
check_dbh(over_limit$DBH, limit)
I believe the map should inform only the latest Status
. That is, if multiple censuses where done, only the last one will be used to inform that the Status
of each stem was.
if (length(unique(over_limit$CensusID)) > 1) {
warning(
"Multiple censuses were detected\n",
" * Filtering only the greatest `CensusID`"
)
last_census <- max(unique(over_limit$CensusID))
is_last_census <- over_limit$CensusID == last_census
over_limit <- over_limit[is_last_census, ]
}
#> Warning: Multiple censuses were detected
#> * Filtering only the greatest `CensusID`
For now I’ll focus on a single quadrat. (I’ll work later on the code to iterate over multiple quadrats.)
# Choosing one quadrat at random
this_quadrat <- "4919"
one_quadrat <- filter(over_limit, QuadratName == this_quadrat)
The requirements are quoted below; they come from a private email (via @)
Status
of each stem to a the ending of each label. This is a better alternative than mapping to, say point shape, because each tree Tag
has multiple stems associated to them, and they will reppel each other. If the Status
was mapped to shape, multiple points will plot on top of each other and the overlapping will obscure the information.It would be nice to include the status. I used to put a “D” (dead) or an “R”(stem lost or broken) after the tag number to differentiate them from the live trees. For example, 8334D or 8145R.
I don’t need the subquadrat labels, since they are the same for every quadrat.
- [In] most sites, the quadrats are 20x20 meters.
It would be nice to have a larger circle for the larger trees so that the field workers can look for a large tree. However, for very crowded quadrats, we can ignore this or make the circles smaller than if there are less trees. In my script, I count the number of trees in that quadrat and calculate the size depending on the number.
x
and y
.Tag
name of each tree – that way field workers will know what tree they expect to find where. To avoid overlap, I’ll implement repulsive labels with ggrepel::geom_text_repel()
.
- For the live trees, yes, please use circles (tree trunks tend to be circular). But you are welcome to use other symbols for the “other” categories.
The reason I don’t have the tick legends is to give more space to the map on the page.
Same reason as above, to make the map as large as possible on the page. However, I wouldn’t mind if they are outwards.
x
and y
values;Yes, so that the field workers can orient themselves on the maps and more carefully draw the location of the trees.
x
and y
values;Yes, each grid represents a 5x5 m subquadrat. This is how the plots are physically gridded in the field.
Dashed seems to me to be better than solid.
… would be great if I can manually choose the title.
The plotname in ViewFullTable carries the name of the plot.
I think each site would like to be able to type in their subheaders.
(Eventually I’ll let the user adjust these options – but I’ll do that later; now I focus on getting the maps close to the most common requirement.)
theme <- theme(
panel.background = element_rect(fill = "white"),
panel.grid.minor = element_blank(),
panel.grid.major = element_blank(),
legend.position = "none",
plot.title = element_text(size = 20),
plot.subtitle = element_text(size = 12),
axis.text = element_blank(),
panel.border = element_rect(colour = "black", fill = NA),
axis.ticks.length = unit(-0.1, "cm")
)
# Add the ending ".d" to the `Tag` of dead stems.
tag_dead <- function(x, y) {
is_dead <- y == "dead"
x[is_dead] <- paste0(x[is_dead], ".", sub("^(.).*$", "\\1", y[is_dead]))
x
}
# See if this works
tag_dead(x = c("tag1", "tag2"), y = c("dead", "alive"))
#> [1] "tag1.d" "tag2"
# Parameters that will likely become arguments of a function
lim_min <- 0
lim_max <- 20
subquadrat_side <- 5
size_label <- 2
# Adjust a margin around the plot area that ggplot2 inserts by default
offset <- 1
title <- paste("My Title", unique(one_quadrat$QuadratName), sep = ", ")
header <- paste0(
"Nombres y fecha: Revisado por: Entrado en PC por:\n",
" \n",
"__________________ __________________ __________________\n",
"__________________ __________________ __________________\n",
"__________________ __________________ __________________\n"
)
one_quadrat <- mutate(
one_quadrat,
# Tad dead trees
tagged_tag = tag_dead(Tag, Status),
# Standarize the size of the points by the total number of trees
dbh_standarized = DBH / length(DBH)
)
dead <- filter(one_quadrat, Status == "dead")
ggplot(data = one_quadrat, aes(x = x, y = y)) +
geom_point(aes(size = dbh_standarized), shape = 1) +
geom_text_repel(aes(label = tagged_tag), size = size_label) +
# /* Repeat for the dead trees, which are lost because dbh is NA
geom_text_repel(data = dead, aes(label = tagged_tag), size = size_label) +
# */
labs(title = title, subtitle = header, x = NULL, y = NULL) +
geom_vline(
xintercept = seq(lim_min, lim_max, by = subquadrat_side),
linetype = "dashed"
) +
geom_hline(
yintercept = seq(lim_min, lim_max, by = subquadrat_side),
linetype = "dashed"
) +
coord_fixed(
xlim = c(lim_min + offset, lim_max - offset),
ylim = c(lim_min + offset, lim_max - offset)
) +
scale_x_continuous(breaks = lim_min:lim_max, sec.axis = dup_axis()) +
scale_y_continuous(breaks = lim_min:lim_max, sec.axis = dup_axis()) +
theme
I’m suprised that all trees are alive, and I wonder if this is an error.
unique(one_quadrat$Status)
#> [1] "alive"
I’ll check if the plot works if I introduce dead trees.
with_dead <- one_quadrat
with_dead$Status[1:30] <- "dead"
with_dead$DBH[1:30] <- NA
Next I’ll repeate the exact same preparation and plotting code.
with_dead <- mutate(
with_dead,
# Tad dead trees
tagged_tag = tag_dead(Tag, Status),
# Standarize the size of the points by the total number of trees
dbh_standarized = DBH / length(DBH)
)
dead <- filter(with_dead, Status == "dead")
ggplot(data = with_dead, aes(x = x, y = y)) +
geom_point(aes(size = dbh_standarized), shape = 1) +
geom_text_repel(aes(label = tagged_tag), size = size_label) +
# /* Repeat for the dead trees, which are lost because dbh is NA
geom_text_repel(data = dead, aes(label = tagged_tag), size = size_label) +
# */
labs(title = title, subtitle = header, x = NULL, y = NULL) +
geom_vline(
xintercept = seq(lim_min, lim_max, by = subquadrat_side),
linetype = "dashed"
) +
geom_hline(
yintercept = seq(lim_min, lim_max, by = subquadrat_side),
linetype = "dashed"
) +
coord_fixed(
xlim = c(lim_min + offset, lim_max - offset),
ylim = c(lim_min + offset, lim_max - offset)
) +
scale_x_continuous(breaks = lim_min:lim_max, sec.axis = dup_axis()) +
scale_y_continuous(breaks = lim_min:lim_max, sec.axis = dup_axis()) +
theme
#> Warning: Removed 30 rows containing missing values (geom_point).
This is why the tags are plotted twice – once for the entire dataset and once of dead treed only. Deead trees have DBH = NA
, which I get removed automatically by ggplot2 when DBH
is passed to geom_point()
. Below
select(with_dead, DBH, Status, Tag, tagged_tag)
ggplot(data = with_dead, aes(x = x, y = y)) +
geom_point(aes(size = dbh_standarized), shape = 1) +
geom_text_repel(aes(label = tagged_tag), size = size_label) +
# /* Repeat for the dead trees, which are lost because dbh is NA
# geom_text_repel(data = dead, aes(label = tagged_tag), size = size_label) +
# */
labs(title = title, subtitle = header, x = NULL, y = NULL) +
geom_vline(
xintercept = seq(lim_min, lim_max, by = subquadrat_side),
linetype = "dashed"
) +
geom_hline(
yintercept = seq(lim_min, lim_max, by = subquadrat_side),
linetype = "dashed"
) +
coord_fixed(
xlim = c(lim_min + offset, lim_max - offset),
ylim = c(lim_min + offset, lim_max - offset)
) +
scale_x_continuous(breaks = lim_min:lim_max, sec.axis = dup_axis()) +
scale_y_continuous(breaks = lim_min:lim_max, sec.axis = dup_axis()) +
theme
#> Warning: Removed 30 rows containing missing values (geom_point).
The last few sectios confirmed that the plot is able to map dead trees. The reason why the example data isn’t showing dead trees is indeed because no alive tree is left after the filtering process I went through here. The issue remains to find out the filtering I’m doing is resonable or if I am inadvertedly removing dead trees. One way to check is to use real data – not the minimal example data I used here – and confirm that the plot shows trees that in the last census had Status = "dead"
.