20 Day 20 (April 3)

20.1 Announcements

  • Remember that next lecture on Tuesday is a work day.

  • April check-in is worth 5% of your grade. Please send email to both Aidan and I to plan your April check in.

  • Selected questions/clarifications from journals

    • Where is the line between “statistical models” and “machine learning models”?
      • Review expoential growth model from last lecture and why this is statistics
      • Machine learning applied to whooping crane data (Download here)
      • MS defense in Room 302 Dickens Hall Tomorrow
    • Project related question. Build my own software for model fitting, use a specialized R package (like TGP), or use something like JAGs?

20.2 Data fusion

  • See Ch. 25 in BBM2L
  • Formative story (math and people working together) and publication
  • Many different names
    • Integrated modeling
    • Data reconciliation
    • Data fusion
  • Why use Bayesian statistics for data fusion?
    • The probability someone has crafted the model you need (and criticism of my own work)?
    • Use of the hierarchical modeling framework
    • Recursive use of Bayes theorem (see here)
  • Example where it is time consuming to get precise data
    • Age vs. height
    • Abundance vs. presence/absence data
    • Disease status of plants (see pgs 407-424 in BBM2L)
    • Percent cover data
  • Ad-hoc approaches
    • Simple pooling
    • Transform high quality into low quality data and then pool
  • Example: Konza percent grass cover
    • Exact cover data
    url <- "https://www.dropbox.com/s/8ohtahx99jox9a5/konza_grass_exact.csv?dl=1"
    df.grass.exact <- read.csv(url)
    head(df.grass.exact)
    ##   percgrass    elev
    ## 1        13 420.860
    ## 2        45 399.306
    ## 3        72 412.792
    ## 4         9 422.606
    ## 5        21 381.152
    ## 6        37 397.789
    plot(df.grass.exact$elev,df.grass.exact$percgrass,xlab="Elevation",ylab="Percent grass")
    • Cheap cover data
    url <- "https://www.dropbox.com/s/eef5hy8geyi73ke/konza_grass_cheap.csv?dl=1"
    df.grass.cheap <- read.csv(url)
    head(df.grass.cheap)
    ##   percgrass    elev
    ## 1      <50% 430.380
    ## 2      <50% 430.380
    ## 3      <50% 430.380
    ## 4      <50% 430.380
    ## 5      <50% 425.015
    ## 6      <50% 425.015
    • Model formulation (go over on white board)
    • Model implementation (go over on white board)
    • Live example (Download here)