16.1 Worked Example
Melanie A. Murphy & Jeffrey S. Evans
1. Overview of Worked Example
a. Background
There are many ways graphs can be implemented to understand population structure and relate that structure to landscape characteristics (see Dyer and Nason 2004). In this exercise, we will calculate various graph metrics and apply graphs to fit a gravity model.
Gravity models are a type of inferential model that exploits graph characteristics. They are fit on a graph of nodes and edges, and they incorporate both at-site (node) and between-site (edge) landscape data, so that landscape characteristics are associated with each graph element.
2. Setup
Add required packages
This code checks if all required packages are installed and, if so, loads them. If any are missing, it will return a message identifying the packages that need to be installed. If that happens, install the packages and run the code again. The code then checks whether your version of the GeNetIt package is up-to-date (at least 0.1-5) and, if not, installs the latest version from GitHub.
p <- c("raster", "igraph", "sp", "GeNetIt", "spatialEco", "leaflet",
       "sf", "terra", "sfnetworks", "spdep", "dplyr", "tmap", "devtools")
if(any(!unlist(lapply(p, requireNamespace, quietly = TRUE)))) {
  m <- which(!unlist(lapply(p, requireNamespace, quietly = TRUE)))
  suppressMessages(invisible(lapply(p[-m], require, character.only = TRUE)))
  stop("Missing library, please install ", paste(p[m], collapse = " "))
} else {
  if(packageVersion("GeNetIt") < "0.1-5") {
    remotes::install_github("jeffreyevans/GeNetIt")
  }
  suppressMessages(invisible(lapply(p, require, character.only = TRUE)))
}
3. Wetland complex data preparation
In this section, we will read in all wetland locations in the study area and calculate a few graph-based metrics to assign to the wetland sites where data were collected. This puts our samples into the context of the larger wetland system, thus accounting for proximity and juxtaposition.
a. Read in wetlands data
Import file “Wetlands.csv”.
wetlands <- read.csv(system.file("extdata/Wetlands.csv", package="LandGenCourse"),
header = TRUE)
head(wetlands)
## ID X Y RALU SiteName
## 1 1 688835 5002939 y AirplaneLake
## 2 2 687460 4994400 n AlpineInletCreek
## 3 3 687507 4994314 n AlpineInletMeadow
## 4 4 687637 4994117 n AlpineLake
## 5 5 688850 4997750 n AxeHandleMeadow
## 6 6 688500 4998900 y BachelorMeadow
Make it a spatial object.
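The conversion code is not shown above; here is a minimal sketch, assuming the coordinates are UTM zone 11N (EPSG:32611), consistent with the CRS printed below:
wetlands <- st_as_sf(wetlands, coords = c("X", "Y"), crs = 32611,
                     agr = "constant")  # X/Y columns -> POINT geometry
str(wetlands)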
## Classes 'sf' and 'data.frame': 121 obs. of 4 variables:
## $ ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ RALU : chr "y" "n" "n" "n" ...
## $ SiteName: chr "AirplaneLake" "AlpineInletCreek" "AlpineInletMeadow" "AlpineLake" ...
## $ geometry:sfc_POINT of length 121; first list element: 'XY' num 688835 5002939
## - attr(*, "sf_column")= chr "geometry"
## - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: 1 1 1
## ..- attr(*, "names")= chr [1:3] "ID" "RALU" "SiteName"
b. Create wetlands graph
Create Gabriel graph from the wetlands to represent a “realization” of connectivity and spatial arrangement.
Derive Gabriel graph
gg <- graph2nb(gabrielneigh(st_coordinates(wetlands)),sym=TRUE)
plot(gg, coords=st_coordinates(wetlands))
Questions:
- This graph may or may not be the best representation of wetland connectivity. What other types of graphs could you build?
- What wetlands are connected to each other based on the graph?
c. Graph metrics
To calculate graph metrics, we need to do a few steps:
- Coerce the neighbor object to an sf line object (will be used to create the igraph object; see the sketch below)
- Coerce to a sfnetwork, which is an igraph object:
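The coercion to sf lines is not shown above; a minimal sketch, assuming spdep::nb2lines():
gg <- nb2lines(gg, coords = st_coordinates(wetlands),
               as_sf = TRUE)      # neighbor list -> sf LINESTRING edges
st_crs(gg) <- st_crs(wetlands)    # carry over the CRS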
wg <- as_sfnetwork(gg, edges=gg, nodes=wetlands, directed = FALSE,
node_key = "SiteName", length_as_weight = TRUE,
edges_as_lines = TRUE)
Calculate weights
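The weight vector w used in the metric calculations below is not defined above; a minimal sketch, pulling the edge lengths from the sfnetwork (assumes the weight attribute created by length_as_weight = TRUE):
w <- wg %>% activate("edges") %>%
  pull(weight) %>% as.numeric()   # edge lengths as numeric weights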
Calculate the graph metrics betweenness and closeness (using the edge weights) and degree. We'll add these as attributes (columns) to the wetlands sf object.
- degree - the number of connections a node has
- betweenness - the number of shortest paths going through a node
- closeness - the inverse of the average shortest-path length from the node to every other node in the network
wetlands$betweenness <- igraph::betweenness(wg, directed=FALSE, weights=w)
wetlands$degree <- igraph::degree(wg)
wetlands$closeness <- igraph::closeness(wg, weights=w)
wetlands
## Simple feature collection with 121 features and 6 fields
## Attribute-geometry relationships: constant (3), NA's (3)
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 686244 ymin: 4993077 xmax: 695699 ymax: 5004317
## Projected CRS: WGS 84 / UTM zone 11N
## First 10 features:
## ID RALU SiteName geometry betweenness degree
## 1 1 y AirplaneLake POINT (688835 5002939) 288 8
## 2 2 n AlpineInletCreek POINT (687460 4994400) 443 6
## 3 3 n AlpineInletMeadow POINT (687507 4994314) 332 4
## 4 4 n AlpineLake POINT (687637 4994117) 220 6
## 5 5 n AxeHandleMeadow POINT (688850 4997750) 1690 6
## 6 6 y BachelorMeadow POINT (688500 4998900) 195 4
## 7 7 y BarkingFoxLake POINT (687944 5000006) 1323 6
## 8 8 n BarkingFoxWetland POINT (687872 5000041) 1291 10
## 9 9 n BartPond POINT (687150 4995850) 0 6
## 10 10 n BigClearLake POINT (690888 5004126) 0 4
## closeness
## 1 0.3538381
## 2 0.3407043
## 3 0.3352415
## 4 0.3226762
## 5 0.5426490
## 6 0.5334932
## 7 0.5094841
## 8 0.5066511
## 9 0.3783181
## 10 0.2706522
d. Plot graph metrics
Plot results using the graph edges and wetlands points with the attributes “betweenness”, “closeness” and “degree”.
Plot betweenness
plot(st_geometry(gg), col="grey")
plot(wetlands["betweenness"], pch=19,
cex=0.75, add=TRUE)
box()
title("Wetlands Gabriel graph betweenness")
Plot closeness
plot(st_geometry(gg), col="grey")
plot(wetlands["closeness"], pch=19,
cex=0.75, add=TRUE)
box()
title("Wetlands Gabriel graph closeness")
Plot degree
plot(st_geometry(gg), col="grey")
plot(wetlands["degree"], pch=19,
cex=0.75, add=TRUE)
box()
title("Wetlands Gabriel graph degree")
Questions: Consider the three figures above and how the graph was constructed.
- In what way(s) is the resulting graph potentially ecologically meaningful?
- How might it not be ecologically or biologically meaningful?
4. Wetland field-data preparation
In this section we will read the field data and add the node metrics we just calculated.
Using RALU_Site.csv, we read in the data, add the node data (betweenness, degree, and closeness), and create a spatial object that includes the node data.
Read in site data
sites <- read.csv(system.file("extdata/RALU_Site.csv", package="LandGenCourse"),
header = TRUE)
sites$SiteID <- as.character(sites$SiteID)
Add the node data.
Note: matching on names is dangerous, as small changes in names can result in non-matches. In this case, the ID fields are not consistent (the data were originally collected at different times for different purposes). However, the names are standardized in a drop-down list of a database, so they can serve as the matching field. My recommendation is to do this type of operation on a numeric field.
nodestats <- st_drop_geometry(wetlands[,c(3,5:7)])
nodestats <- nodestats[which(nodestats$SiteName %in% sites$SiteName),]
sites <- merge(nodestats, sites, by="SiteName")
Convert the data frame to an sf object.
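The conversion is not shown above; a minimal sketch, assuming coordinate columns X and Y in RALU_Site.csv and the same UTM zone 11N CRS (EPSG:32611):
sites <- st_as_sf(sites, coords = c("X", "Y"), crs = 32611,
                  agr = "constant")
head(sites)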
## Simple feature collection with 6 features and 14 fields
## Attribute-geometry relationships: constant (14)
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 687944 ymin: 4996458 xmax: 690127 ymax: 5002939
## Projected CRS: WGS 84 / UTM zone 11N
## SiteName betweenness degree closeness SiteID Elev Length Area
## 1 AirplaneLake 288 8 0.3538381 27 2564.381 390 62582.2
## 2 BachelorMeadow 195 4 0.5334932 15 2591.781 0 225.0
## 3 BarkingFoxLake 1323 6 0.5094841 19 2545.275 160 12000.0
## 4 BobLake 116 4 0.4652124 16 2649.125 143 4600.0
## 5 CacheLake 673 4 0.5189996 10 2475.829 75 2268.8
## 6 DoeLake 758 6 0.4710215 8 2463.006 170 13034.9
## Perim Depth pH Dforest Drock Dshrub geometry
## 1 1142.8 21.64 6.5 0.398 0.051 0.00 POINT (688835 5002939)
## 2 60.0 0.40 6.1 0.000 0.000 0.20 POINT (688500 4998900)
## 3 435.0 5.00 6.5 0.400 0.250 0.05 POINT (687944 5000006)
## 4 321.4 2.00 7.0 0.550 0.000 0.05 POINT (690127 4999150)
## 5 192.0 1.86 6.5 0.508 0.000 0.00 POINT (688777 4997264)
## 6 463.2 6.03 7.6 0.254 0.000 0.00 POINT (688968 4996458)
Question:
- What are the fields here? What data are included in sites?
5. Saturated Graph
a. Create graph from site locations
To assess connectivity using a gravity model, we need to build a graph from the occupied frog sites. This could be any type of graph, but I generally use a saturated graph (essentially a full distance matrix) or one pruned by some maximum distance.
dist.graph <- knn.graph(sites, row.names = sites$SiteName)
dist.graph <- merge(dist.graph, st_drop_geometry(sites),
by.y="SiteName", by.x="from_ID")
dist.graph <- dist.graph[,-c(11:19)] ## drop extra columns
Note: you can create a distance-constrained graph with the max.dist argument (not run).
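For example, a hedged sketch (not run; the 5000 m threshold is illustrative):
# dist.graph <- knn.graph(sites, row.names = sites$SiteName, max.dist = 5000)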
b. Merge the graph with genetic distance
Read in the genetic distance data and make it a matrix, gdist; then unfold the data.
gdist <- read.csv(system.file("extdata/RALU_Dps.csv", package="LandGenCourse"),
                  header = TRUE)
rownames(gdist) <- t(names(gdist))
gdist <- dmatrix.df(as.matrix(gdist))        # unfold the matrix into a long data frame
names(gdist) <- c("FROM", "TO", "GDIST")
gdist <- gdist[!gdist$FROM == gdist$TO, ]    # drop the diagonal (self) pairs
gdist[,1] <- sub("X", "", gdist[,1])         # strip the "X" prefix from site IDs
gdist[,2] <- sub("X", "", gdist[,2])
gdist <- cbind(from.to = paste(gdist[,1], gdist[,2], sep="."), gdist)
Transform genetic distance to genetic flow (1-distance)
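The transformation code is not shown above; a minimal sketch, assuming GeNetIt's flow() function, which converts a distance into a flow:
gdist$GDIST <- flow(gdist$GDIST)   # genetic flow = 1 - genetic distance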
Merge the graph with the genetic distances, based on the from-node/to-node combination.
dist.graph$from.to <- paste(dist.graph$i, dist.graph$j, sep=".")
dist.graph <- merge(dist.graph, gdist, by = "from.to")
Question:
- What is in the resulting object?
6. Spatial model data preparation
b. Reclassify wetlands
NLCD is land cover data. The wetland classes are 11 and 90-95 (for this system and this vintage of NLCD data).
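The reclassification code is not shown. A minimal sketch, assuming xvars is a terra SpatRaster stack of landscape covariates (loaded earlier, not shown here) that includes an nlcd layer; the reclassification breaks below are illustrative:
m <- rbind(c(0, 10.5, 0),
           c(10.5, 11.5, 1),   # class 11 -> wetland (1)
           c(11.5, 89.5, 0),
           c(89.5, 95.5, 1),   # classes 90-95 -> wetland (1)
           c(95.5, 256, 0))
wetlnd <- classify(xvars[["nlcd"]], rcl = m)  # binary wetland raster
names(wetlnd) <- "wetlnd"
xvars <- c(xvars, wetlnd)                     # append as a new layer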
c. Calculate the proportion of the landscape around sites
Assign the proportion of land cover that is wetland to sites as pwetland. You could create a binary raster for any cover type of interest and calculate this parameter.
Question:
- You want to know if areas of dense wetlands produce more frogs. What buffer distance will you use?
Create function to extract the proportion of wetland.
## method 1 (can result in Inf if all zero)
# prop.land <- function(x) {
# length(x[x==1]) / length(x)
# }
## method 2 (no divide by zero error)
prop.land <- function(x) {
prop.table(table(factor(x, levels=c(0,1))))[2]
}
Apply the function to extract the proportion of wetland within a buffer (here: 300 m).
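The extraction code is not shown; a minimal sketch, assuming terra::extract() on the wetlnd layer within 300 m buffers:
b <- st_buffer(sites, dist = 300)                 # 300 m buffer around each site
e <- terra::extract(xvars[["wetlnd"]], vect(b))   # cell values within each buffer
pwetland <- tapply(e[, "wetlnd"], e$ID, prop.land)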
## Warning: [extract] transforming vector data to the CRS of the raster
Add the proportion of wetland back to the dataframe.
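A minimal sketch of attaching the values back to sites (assumed; the original code is not shown):
sites$pwetland <- as.numeric(pwetland)
head(sites$pwetland)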
## [1] 0.205047319 0.018987342 0.047923323 0.000000000 0.003174603 0.115755627
Challenge:
- What happens if you change this radius?
- What radius do you think makes the most sense ecologically?
Alternatively, you can use the landscapemetrics package (see Week 2) to calculate a broader set of landscape metrics. Here is an example for calculating pland.
Note: as per the help file, the argument y of the function sample_lsm can accept an sf points object (e.g., sites). This returned an error, however, so we instead pass the point coordinates, extracted with the function sf::st_coordinates.
nlcd_sampled <- landscapemetrics::sample_lsm(landscape = xvars[["wetlnd"]],
what = "lsm_c_pland",
shape = "circle",
y = sf::st_coordinates(sites),
size = 300,
return_raster = FALSE,
plot_id=sites$SiteID)
pwetland <- dplyr::select(dplyr::filter(nlcd_sampled, class == 1,
metric == "pland"), plot_id, value)
names(pwetland) <- c("SiteID", "pwetland")
pwetland$pwetland <- pwetland$pwetland/100
head(pwetland)
Note: these values are sorted by SiteID and are not in the same order as the sf object sites, which is why the first six values may not be the same.
d. Add values of rasters to sample sites
This adds potential "at-site" variables while keeping sites as an sf POINT class object. We remove raster 6 because knowing which NLCD cover class intersects a point is not very useful information.
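The extraction is not shown; a minimal sketch, assuming the raw nlcd layer is the sixth layer of xvars:
rs <- terra::extract(xvars[[-6]], vect(sites))    # drop layer 6 (raw NLCD classes)
sites <- st_sf(data.frame(st_drop_geometry(sites), rs[, -1]),
               geometry = st_geometry(sites))     # keep the sf POINT class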
## Warning: [extract] transforming vector data to the CRS of the raster
e. Add raster covariates to graph edges (lines)
Remove the nlcd and wetlnd rasters before calculating statistics, as calculating statistical moments on categorical data is nonsensical.
Calculating stats - this will take a while!
idx <- which(names(xvars) %in% c("nlcd", "wetlnd"))
suppressWarnings(
  stats <- graph.statistics(dist.graph, r = xvars[[-idx]], buffer = NULL,
                            stats = c("min", "mean", "max", "var", "median")))
Add statistics to graph.
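A minimal sketch of joining the statistics to the edges (assumed; the original code is not shown):
dist.graph <- st_sf(data.frame(st_drop_geometry(dist.graph), stats),
                    geometry = st_geometry(dist.graph))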
f. What about categorical variables?
Statistical moments (mean, variance, etc.) are nonsensical for categorical variables. Here we create a function for returning the percent wetland between sites. Then we use it to calculate an additional statistic and add the result to the graph.
Question:
- Are there other categorical variables that you think may be ecologically important?
Function to calculate percent wetland between pairs of sites:
wet.pct <- function(x) {
x <- ifelse(x == 11 | x == 12 | x == 90 | x == 95, 1, 0)
prop.table(table(factor(x, levels=c(0,1))))[2]
}
Calculate statistic and add it to graph.
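A minimal sketch, assuming graph.statistics() accepts the name of a user-defined function such as wet.pct defined above:
wet.stats <- graph.statistics(dist.graph, r = xvars[["nlcd"]],
                              buffer = NULL, stats = "wet.pct")
dist.graph$wet.pct.nlcd <- as.numeric(wet.stats[,1])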
g. Evaluate node and edge correlations
We need to evaluate correlations in the data to avoid multicollinearity in our models (remember the lab from Week 12, where we calculated the variance inflation factor, VIF).
Note: we are not actually going to remove the correlated variables, but just go through a few methods of evaluating them. The code to remove collinear variables is commented out for reference. We do have to log-transform the data so as to evaluate the actual model structure.
Create data frame with the node variables to be evaluated.
node.var <- c("degree", "betweenness", "Elev", "Length", "Area", "Perim",
"Depth", "pH","Dforest","Drock", "Dshrub", "pwetland", "cti",
"dd5", "ffp","gsp","pratio","hli","rough27","srr")
s <- st_drop_geometry(sites) %>%
  select("degree", "betweenness", "Elev", "Length", "Area", "Perim",
         "Depth", "pH", "Dforest", "Drock", "Dshrub", "pwetland",
         "cti", "ffp", "gsp")
Log-transform the values, setting values that are zero or negative (where the log is not defined) to 0.00001.
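The transformation code is not shown; a minimal sketch, mirroring the transformation applied to gdata later in this section:
for(i in 1:ncol(s)) {
  s[,i] <- ifelse(s[,i] <= 0, 0.00001, log(s[,i]))
}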
Site correlations:
p = 0.8 # set the upper limit for collinearity
site.cor <- cor(s, y = NULL,
                use = "complete.obs",
                method = "pearson")
diag(site.cor) <- 0
cor.idx <- which(site.cor > p | site.cor < -p, arr.ind = TRUE)
cor.names <- vector()
cor.p <- vector()
for(i in 1:nrow(cor.idx)) {
  cor.p[i] <- site.cor[cor.idx[i,][1], cor.idx[i,][2]]
  cor.names[i] <- paste(rownames(site.cor)[cor.idx[i,][1]],
                        colnames(site.cor)[cor.idx[i,][2]], sep="_")
}
data.frame(parm = cor.names, p = cor.p)
## parm p
## 1 ffp_Elev -0.9016287
## 2 gsp_Elev 0.8934291
## 3 Area_Length 0.8715535
## 4 Perim_Length 0.8392532
## 5 Length_Area 0.8715535
## 6 Perim_Area 0.9912114
## 7 Length_Perim 0.8392532
## 8 Area_Perim 0.9912114
## 9 Elev_ffp -0.9016287
## 10 gsp_ffp -0.9656032
## 11 Elev_gsp 0.8934291
## 12 ffp_gsp -0.9656032
This returns a list of pairwise correlations whose absolute value is greater than the threshold p. Instead of doing this manually, we can use the function collinear to check this. It will make a suggestion of which variable to drop.
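A minimal sketch of the call (assuming spatialEco::collinear() with the same 0.8 threshold):
node.cor <- collinear(s, p = 0.8)   # prints diagnostics, returns variables to drop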
## Collinearity between Area and Perim correlation = 0.9912
## Correlation means: 0.446 vs 0.276
## recommend dropping Area
## Collinearity between Area and Length correlation = 0.8716
## Correlation means: 0.446 vs 0.278
## recommend dropping Area
## Collinearity between Perim and Length correlation = 0.8393
## Correlation means: 0.441 vs 0.278
## recommend dropping Perim
## Collinearity between ffp and Elev correlation = 0.9016
## Correlation means: 0.341 vs 0.283
## recommend dropping ffp
## Collinearity between ffp and gsp correlation = 0.9656
## Correlation means: 0.341 vs 0.284
## recommend dropping ffp
## Collinearity between Elev and gsp correlation = 0.8934
## Correlation means: 0.336 vs 0.284
## recommend dropping Elev
It also returns a vector of correlated variables:
## [1] "Elev" "Area" "Perim" "ffp"
h. Add node data
Build and add node (at-site) level data to the graph, then merge the node (at-site) and edge (between-site) data. A sketch of the node-building step follows.
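A minimal sketch, assuming GeNetIt::build.node.data() and the node.var vector defined above:
node <- build.node.data(st_drop_geometry(sites), group.ids = "SiteID",
                        from.parms = node.var)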
Merge node and edges for model data.frame
gdata <- merge(st_drop_geometry(dist.graph)[c(1,2,5,11,14,7)], node,
               by.x="SiteID", by.y="SiteID")
gdata <- merge(gdata, st_drop_geometry(dist.graph)[c(11, 8:10, 15:40)],
               by.x="SiteID", by.y="SiteID")
# log-transform the matrix
for(i in 5:ncol(gdata)) {
  gdata[,i] <- ifelse(gdata[,i] <= 0, 0.00001, log(gdata[,i]))
}
7. Gravity model
a. Develop hypothesis
What type of gravity model do you wish to run (production- or attraction-constrained)? Think about what hypotheses you want to test. Write out model statements that group parameters into hypotheses and run the models. Remember to run a NULL model that is just distance.
At-site (node) potential parameters are listed below. These are all at-site variables. Remember that we extracted all of the raster variables; we want to think critically about our hypotheses and not use all of these parameters.
- degree - graph degree
- betweenness - graph betweenness
- Elev - elevation (see comments below)
- Length - geographic distance
- Area - wetland area (field)
- Perim - wetland perimeter (field)
- Depth - wetland depth (field); highly correlated with predatory fish presence/abundance
- pH - wetland pH (field)
- Dforest - distance to forest (field)
- Drock - distance to rock (field)
- Dshrub - distance to shrub (field)
- pwetland - proportion of wetland in X buffer (calculated above)
- cti - compound topographic wetness index: a steady-state measure of wetness based on topography (raster data)
- dd5 - degree days > 5 C (sum of temperature) (raster data)
- ffp - frost-free period (raster data)
- gsp - growing season precipitation (raster data)
- pratio - ratio of growing-season precipitation to annual precipitation (raster data); can indicate the amount of snow relative to rain
- hli - heat load index: a topographic measure of exposure, related to productivity (ice-off and primary productivity) in this system (raster data)
- rough27 - unscaled topographic variation at a 27 x 27 cell window size (raster data)
- ssr - a measure of topographic variation at a 27 x 27 cell window size; for this system, pulling out ridgelines (raster data)
NOTE: we are adding elevation here as a covariate. HOWEVER - elevation does not represent ecological processes in and of itself. I strongly encourage using the components (temp, moisture, rainfall, vegetation, accessibility, etc.) directly and not elevation as a surrogate parameter.
Between-site (edge) potential parameters include:
- cti, dd5, ffp, gsp, pratio, hli, rough27, ssr (min, mean, max, var, and median for each)
- wet.pct.nlcd - percent of cells that are wetland class
# null model (under Maximum Likelihood)
( null <- gravity(y = "GDIST", x = c("length"), d = "length", group = "from_ID",
data = gdata, fit.method = "ML") )
## [1] "Running singly-constrained gravity model"
## Gravity model
##
## Linear mixed-effects model fit by maximum likelihood
## Data: gdata
## AIC BIC logLik
## 20554.37 20585.62 -10273.18
##
## Random effects:
## Formula: GDIST ~ 1 | from_ID
## (Intercept) Residual
## StdDev: 0.2009152 0.4232432
##
## Fixed effects: stats::as.formula(paste(paste(y, "~", sep = ""), paste(x, collapse = "+")))
## Value Std.Error DF t-value p-value
## (Intercept) -1.2019984 0.07467247 18224 -16.096941 0
## length 0.1418157 0.03059252 18224 4.635633 0
## Correlation:
## (Intr)
## length -0.854
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.2285925 -0.5168847 0.1184083 0.6182159 2.4848654
##
## Number of Observations: 18252
## Number of Groups: 27
# Fish hypothesis (under Maximum Likelihood)
( depth <- gravity(y = "GDIST", x = c("length","from.Depth"), d = "length",
group = "from_ID", data = gdata, fit.method = "ML", ln = FALSE) )
## [1] "Running singly-constrained gravity model"
## Gravity model
##
## Linear mixed-effects model fit by maximum likelihood
## Data: gdata
## AIC BIC logLik
## -13067.22 -13028.16 6538.61
##
## Random effects:
## Formula: GDIST ~ 1 | from_ID
## (Intercept) Residual
## StdDev: 0.08102556 0.168483
##
## Fixed effects: stats::as.formula(paste(paste(y, "~", sep = ""), paste(x, collapse = "+")))
## Value Std.Error DF t-value p-value
## (Intercept) -0.4101360 0.028357541 18224 -14.463032 0.0000
## length -0.0002202 0.001624886 18224 -0.135530 0.8922
## from.Depth -0.0200540 0.014527936 25 -1.380377 0.1797
## Correlation:
## (Intr) length
## length -0.462
## from.Depth -0.693 -0.002
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -5.07683320 -0.58959052 0.02162005 0.65913788 3.21875035
##
## Number of Observations: 18252
## Number of Groups: 27
# Productivity hypothesis (under Maximum Likelihood)
#( production <- gravity(y = "GDIST", x = c("length", "from.ffp", "from.hli"),
# d = "length", group = "from_ID", data = gdata,
# fit.method = "ML", ln = FALSE) )
# Climate hypothesis (under Maximum Likelihood)
#( climate <- gravity(y = "GDIST", x = c("length", "median.ffp", "median.pratio"),
# d = "length", group = "from_ID", data = gdata,
# fit.method = "ML", ln = FALSE) )
# Wetlands hypothesis (under Maximum Likelihood)
( wetlands <- gravity(y = "GDIST", x = c("length", "from.degree", "from.betweenness", "from.pwetland"),
d = "length", group = "from_ID", data = gdata, fit.method = "ML",
ln = FALSE) )
## [1] "Running singly-constrained gravity model"
## Gravity model
##
## Linear mixed-effects model fit by maximum likelihood
## Data: gdata
## AIC BIC logLik
## -13061.9 -13007.22 6537.952
##
## Random effects:
## Formula: GDIST ~ 1 | from_ID
## (Intercept) Residual
## StdDev: 0.08303671 0.168483
##
## Fixed effects: stats::as.formula(paste(paste(y, "~", sep = ""), paste(x, collapse = "+")))
## Value Std.Error DF t-value p-value
## (Intercept) -0.3844698 0.10679706 18224 -3.600004 0.0003
## length -0.0002183 0.00162517 18224 -0.134338 0.8931
## from.degree 0.0035277 0.01270448 23 0.277677 0.7837
## from.betweenness -0.0318164 0.05937320 23 -0.535872 0.5972
## from.pwetland 0.0069227 0.01136835 23 0.608941 0.5485
## Correlation:
## (Intr) length frm.dg frm.bt
## length -0.126
## from.degree -0.163 0.014
## from.betweenness -0.782 -0.007 -0.429
## from.pwetland 0.218 0.004 0.443 -0.286
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -5.07691616 -0.58884894 0.02146133 0.65939630 3.21854105
##
## Number of Observations: 18252
## Number of Groups: 27
# Topography hypothesis (under Maximum Likelihood)
#( topo <- gravity(y = "GDIST", x = c("length", "median.srr", "median.rough27"), d = "length",
# group = "from_ID", data = gdata, fit.method = "ML",
# ln = FALSE) )
# Habitat hypothesis (under Maximum Likelihood)
( habitat <- gravity(y = "GDIST", x = c("length", "wet.pct.nlcd", "median.gsp"),
                     d = "length", group = "from_ID", data = gdata,
                     fit.method = "ML", ln = FALSE) )
## [1] "Running singly-constrained gravity model"
## Gravity model
##
## Linear mixed-effects model fit by maximum likelihood
## Data: gdata
## AIC BIC logLik
## -13063.38 -13016.51 6537.689
##
## Random effects:
## Formula: GDIST ~ 1 | from_ID
## (Intercept) Residual
## StdDev: 0.083851 0.168483
##
## Fixed effects: stats::as.formula(paste(paste(y, "~", sep = ""), paste(x, collapse = "+")))
## Value Std.Error DF t-value p-value
## (Intercept) -0.4356243 0.3357628 18222 -1.2974167 0.1945
## length -0.0002220 0.0016250 18222 -0.1365924 0.8914
## wet.pct.nlcd 0.0000254 0.0008624 18222 0.0294094 0.9765
## median.gsp -0.0002840 0.0592239 18222 -0.0047961 0.9962
## Correlation:
## (Intr) length wt.pc.
## length -0.039
## wet.pct.nlcd -0.129 0.000
## median.gsp -0.998 0.000 0.135
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -5.07738165 -0.58923728 0.02085918 0.65955597 3.21896871
##
## Number of Observations: 18252
## Number of Groups: 27
# Global model (under Maximum Likelihood)
#( global <- gravity(y = "GDIST", x = c("length", "wet.pct.nlcd", "median.gsp",
# "from.Depth", "from.ffp", "from.hli", "from.pratio", "from.degree",
# "from.betweenness", "from.pwetland", "median.srr", "median.rough27"),
# d = "length", group = "from_ID", data = gdata, fit.method = "ML",
# ln = FALSE) )
( global <- gravity(y = "GDIST", x = c("length", "wet.pct.nlcd", "median.gsp",
"from.Depth", "from.ffp",
"from.betweenness", "from.pwetland"),
d = "length", group = "from_ID", data = gdata, fit.method = "ML",
ln = FALSE) )
## [1] "Running singly-constrained gravity model"
## Gravity model
##
## Linear mixed-effects model fit by maximum likelihood
## Data: gdata
## AIC BIC logLik
## -13058.71 -12980.59 6539.355
##
## Random effects:
## Formula: GDIST ~ 1 | from_ID
## (Intercept) Residual
## StdDev: 0.0788026 0.168483
##
## Fixed effects: stats::as.formula(paste(paste(y, "~", sep = ""), paste(x, collapse = "+")))
## Value Std.Error DF t-value p-value
## (Intercept) -0.5297034 0.3719807 18222 -1.4240077 0.1545
## length -0.0002215 0.0016251 18222 -0.1363155 0.8916
## wet.pct.nlcd 0.0000281 0.0008625 18222 0.0325393 0.9740
## median.gsp 0.0013770 0.0592685 18222 0.0232332 0.9815
## from.Depth -0.0307357 0.0179440 22 -1.7128674 0.1008
## from.ffp 0.0442308 0.0548190 22 0.8068514 0.4284
## from.betweenness 0.0291035 0.0606375 22 0.4799595 0.6360
## from.pwetland 0.0071061 0.0099232 22 0.7161115 0.4815
## Correlation:
## (Intr) length wt.pc. mdn.gs frm.Dp frm.ff frm.bt
## length -0.036
## wet.pct.nlcd -0.117 0.000
## median.gsp -0.909 0.000 0.135
## from.Depth 0.176 -0.002 0.001 -0.004
## from.ffp -0.340 0.004 0.003 0.038 -0.288
## from.betweenness -0.273 0.000 0.001 -0.012 -0.536 0.068
## from.pwetland 0.112 -0.004 0.001 0.001 -0.128 -0.137 -0.012
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -5.07768386 -0.58928064 0.02237781 0.65817829 3.21958216
##
## Number of Observations: 18252
## Number of Groups: 27
b. Compare competing models
Should you use ML or REML? Create diagnostic plots.
Can you directly compare ML and REML? Why not?
#compare.models(null, depth, production, climate, wetlands,
# topo, habitat, global)
compare.models(null, depth, wetlands,
habitat, global)
## model AIC BIC log.likelihood RMSE nparms fit.method
## 1 null 20554.37 20585.62 -10273.185 0.2894 2 ML
## 2 depth -13067.22 -13028.16 6538.610 0.1213 3 ML
## 3 wetlands -13061.90 -13007.22 6537.952 0.1217 5 ML
## 4 habitat -13063.38 -13016.51 6537.689 0.1220 4 ML
## 5 global -13058.71 -12980.59 6539.355 0.1207 8 ML
## deltaAIC deltaBIC
## 1 33621.588595 33613.77657
## 2 0.000000 0.00000
## 3 5.315719 20.93978
## 4 3.840564 11.65259
## 5 8.509063 47.56921
c. Fit final model(s) and calculate effect size
To calculate effect size, we refit the models with REML.
# productivity fit (under REML)
#h <- c("length", "from.ffp", "from.hli")
#production_fit <- gravity(y = "GDIST", x = h, d = "length", group = "from_ID",
# data = gdata, ln=FALSE)
## global fit (under REML)
#g <- c("length", "wet.pct.nlcd", "median.gsp", "from.Depth", "from.ffp",
# "from.hli", "from.pratio", "from.degree", "from.betweenness",
# "from.pwetland", "median.srr", "median.rough27")
g <- c("length", "wet.pct.nlcd", "median.gsp", "from.Depth", "from.ffp",
"from.betweenness", "from.pwetland")
global_fit <- gravity(y = "GDIST", x = g, d = "length",
group = "from_ID", data = gdata, ln=FALSE)
## [1] "Running singly-constrained gravity model"
Effect size calculation: global model
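The call itself is not shown; a minimal sketch, assuming GeNetIt::gravity.es() for the Cohen's d effect sizes printed below:
gravity.es(global_fit)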
## t.value df cohen.d p.value low.ci
## length -0.13233104 18222 -0.0019606212 0.894724 -0.02249547
## wet.pct.nlcd 0.02652092 18222 0.0003929348 0.978842 -0.02014191
## median.gsp 0.01893690 18222 0.0002805698 0.984892 -0.02025427
## from.Depth -1.54640025 22 -0.6593872813 0.136273 -1.29966538
## from.ffp 0.72841095 22 0.3105954724 0.474045 -0.31611957
## from.betweenness 0.43335537 22 0.1847833506 0.668976 -0.43940100
## from.pwetland 0.64649121 22 0.2756647800 0.524649 -0.35021986
## up.ci
## length 0.01857423
## wet.pct.nlcd 0.02092778
## median.gsp 0.02081541
## from.Depth -0.01910919
## from.ffp 0.93731051
## from.betweenness 0.80896770
## from.pwetland 0.90154942
Production model: not run here, since the production fit above is commented out.
d. Back predict global_fit model
Feel free to add other top models.
gd <- back.transform(gdata$GDIST)
# Make individual-level (group) predictions (per slope) and show RMSE
global.p <- predict(global_fit, y = "GDIST", x = g,
newdata=gdata, groups = gdata$from_ID,
back.transform = "simple")
## Making individual-level (per-slope group)
## constrained predictions
## Back-transforming exp(y-hat)*0.5*variance(y-hat),
## assumes normally distributed errors
#production.p <- predict(production_fit, y = "GDIST", x = h,
# newdata=gdata, groups = gdata$from_ID,
# back.transform = "simple")
cat("RMSE of global", rmse(global.p, gd), "\n")
## RMSE of global 0.1104107
e. Aggregate estimates and plot
We can aggregate estimates back to the edges and nodes. An interactive map can be created using the tmap package (see Week 2 Bonus vignette).
Aggregate estimates to graph and node.
global.p <- data.frame(EID = gdata$from.to,
NID = gdata$from_ID,
p=global.p)
edge.p <- tapply(global.p$p, global.p$EID, mean)
dist.graph$global.flow <- edge.p
node.p <- tapply(global.p$p, global.p$NID, mean)
node.var <- tapply(global.p$p, global.p$NID, var)
idx <- which(sites$SiteName %in% names(node.p))
sites$global.flow[idx] <- node.p
sites$global.var[idx] <- node.var
Define the map and store it in object Map.
pal <- colorRampPalette(rev(c("red","orange","blue")), bias=0.15)
Map <-
tm_shape(dist.graph) +
tm_lines("global.flow", palette=pal(10), title.col="Edges: global.flow") +
tm_shape(sites) +
tm_symbols(col = "global.flow", size = "global.var",
shape = 20, scale = 0.75, palette=pal(10),
title.col="Nodes: global.flow", title.size="Nodes: global.var")
Plot static map: here we tell R to plot the legend outside of the map.
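A minimal sketch of the plotting call, assuming tmap's static "plot" mode and the legend.outside layout option:
tmap_mode("plot")
Map + tm_layout(legend.outside = TRUE)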
## tmap mode set to plotting
## Legend labels were too wide. Therefore, legend.text.size has been set to 0.49. Increase legend.width (argument of tm_layout) to make the legend wider and therefore the labels larger.
Plot interactive map: this may not render, e.g., in Bookdown or when you knit your notebook to output formats other than .html.
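A minimal sketch, assuming tmap's interactive "view" mode:
tmap_mode("view")
Map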