5 Data Transformation

Figure 5.1: Visualization made possible by pivoted data
The goal of this chapter is to continue where we started in the earlier chapter on data abstraction with dplyr, and look at the more transformational functions dplyr applies to data frames and databases, while tidyr adds other tools such as pivot tables.
dplyr tools:

- joins: left_join, right_join, inner_join, full_join, semi_join, anti_join
- set operations: intersect, union, setdiff
- binding rows and columns: bind_cols, bind_rows

tidyr tools:

- pivot tables: pivot_longer, pivot_wider
The term “data wrangling” has been used for what we’re doing with these tools, and the relevant cheat sheet is actually called “Data Wrangling” https://rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf
5.1 Data joins
Joins bring in variables from another data frame based on a common join field. There are multiple types of joins, but probably the most common is left_join, since it starts from the data frame (or sf) you want to continue working with and brings in data from an additional source. You retain all records of the first data set; for any non-matches, NA is assigned.
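To see how non-matches work, here's a minimal sketch with two made-up tibbles (sites and counts are hypothetical names, purely for illustration):

library(tidyverse)
sites <- tibble(id = c("A", "B", "C"), elev = c(100, 250, 400))
counts <- tibble(id = c("A", "B", "D"), n = c(12, 7, 3))
sites %>% left_join(counts, by = "id")
# "C" is retained with n = NA; "D" only exists in counts, so it's dropped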
library(tidyverse)
library(igisci)
library(sf)
income <- read_csv(ex("CA/CA_MdInc.csv")) %>%
  dplyr::select(trID, HHinc2016) %>%
  mutate(HHinc2016 = as.numeric(HHinc2016),
         joinid = str_c("0", trID)) %>%
  dplyr::select(joinid, HHinc2016)
census <- BayAreaTracts %>%
  left_join(income, by = c("FIPS" = "joinid")) %>%
  dplyr::select(FIPS, POP12_SQMI, POP2012, HHinc2016)
head(census %>% st_set_geometry(NULL))
## FIPS POP12_SQMI POP2012 HHinc2016
## 1 06001400100 1118.797 2976 177417
## 2 06001400200 9130.435 2100 153125
## 3 06001400300 11440.476 4805 85313
## 4 06001400400 14573.077 3789 99539
## 5 06001400500 15582.609 3584 83650
## 6 06001400600 13516.667 1622 61597
Other joins are:

- right_join, where you end up retaining all the rows of the second data set, and NA is assigned to non-matches
- inner_join, where you only retain records for matches
- full_join, where records are retained for both sides, and NAs are assigned to non-matches
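Continuing the little sketch above, these joins differ only in which rows survive:

sites %>% right_join(counts, by = "id")  # keeps A, B, D; elev is NA for D
sites %>% inner_join(counts, by = "id")  # keeps only the matches A and B
sites %>% full_join(counts, by = "id")   # keeps A, B, C, D, with NAs filled in both directions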
Right join example
We need to join NCDC monthly climate data for all California weather stations to a selection of 82 stations that are in the Sierra.
- The monthly data has 12 rows (one per month) for each station
- The right_join gets all months for all stations, so we weed out the non-Sierra stations by removing NAs from a field that only has Sierra station data
sierra <- right_join(sierraStations, CA_ClimateNormals, by="STATION") %>%
  filter(!is.na(STATION_NA)) %>% dplyr::select(-STATION_NA)
head(sierra %>% filter(DATE == "01") %>% dplyr::select(NAME, ELEVATION, `MLY-TAVG-NORMAL`), n=10)
## # A tibble: 10 x 3
## NAME ELEVATION `MLY-TAVG-NORMAL`
## <chr> <dbl> <dbl>
## 1 GROVELAND 2, CA US 853. 5.6
## 2 CANYON DAM, CA US 1390. 0.2
## 3 KERN RIVER PH 3, CA US 824. 7.6
## 4 DONNER MEMORIAL ST PARK, CA US 1810. -1.9
## 5 BOWMAN DAM, CA US 1641. 3
## 6 BRUSH CREEK RANGER STATION, CA US 1085. NA
## 7 GRANT GROVE, CA US 2012. 1.9
## 8 LEE VINING, CA US 2072. -1.2
## 9 OROVILLE MUNICIPAL AIRPORT, CA US 57.9 7.7
## 10 LEMON COVE, CA US 156. 8.6
The exact same thing, however, could be accomplished with an inner_join, and it doesn't require removing the NAs:

sierraAlso <- inner_join(sierraStations, CA_ClimateNormals, by="STATION") %>%
  dplyr::select(-STATION_NA)
5.2 Set Operations
Set operations compare two data frames (or vectors) to handle observations or rows that are the same for each, or not the same. The three set methods are:
- dplyr::intersect(x,y) retains rows that appear in both x and y
- dplyr::union(x,y) retains rows that appear in either or both of x and y
- dplyr::setdiff(x,y) retains rows that appear in x but not in y
squares <- (1:10)^2
evens <- seq(0,100,2)
squares
## [1] 1 4 9 16 25 36 49 64 81 100
evens
## [1] 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36
## [20] 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74
## [39] 76 78 80 82 84 86 88 90 92 94 96 98 100
intersect(squares,evens)
## [1] 4 16 36 64 100
sort(union(squares,evens))
## [1] 0 1 2 4 6 8 9 10 12 14 16 18 20 22 24 25 26 28 30
## [20] 32 34 36 38 40 42 44 46 48 49 50 52 54 56 58 60 62 64 66
## [39] 68 70 72 74 76 78 80 81 82 84 86 88 90 92 94 96 98 100
sort(setdiff(squares,evens))
## [1] 1 9 25 49 81
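The same verbs also work row-wise on data frames that share the same variables. A quick sketch, wrapping the vectors above in made-up one-column tibbles (assuming tidyverse is loaded as earlier):

sq <- tibble(x = squares)
ev <- tibble(x = evens)
dplyr::intersect(sq, ev)  # rows common to both
dplyr::setdiff(sq, ev)    # rows in sq but not in ev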
5.3 Binding Rows and Columns
These dplyr functions are similar to cbind and rbind in base R, but always create tibbles. For instance, cbind usually creates matrices and coerces all vectors to the same class. Note that with bind_cols, the order of data in rows must be the same, since values are matched by position rather than by a join field.
states <- bind_cols(abb=state.abb,
                    name=state.name,
                    region=state.region,
                    state.x77)
head(states)
## # A tibble: 6 x 11
## abb name region Population Income Illiteracy `Life Exp` Murder `HS Grad`
## <chr> <chr> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AL Alabama South 3615 3624 2.1 69.0 15.1 41.3
## 2 AK Alaska West 365 6315 1.5 69.3 11.3 66.7
## 3 AZ Arizona West 2212 4530 1.8 70.6 7.8 58.1
## 4 AR Arkansas South 2110 3378 1.9 70.7 10.1 39.9
## 5 CA Califor~ West 21198 5114 1.1 71.7 10.3 62.6
## 6 CO Colorado West 2541 4884 0.7 72.1 6.8 63.9
## # ... with 2 more variables: Frost <dbl>, Area <dbl>
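Because bind_cols matches purely by position, anything that reorders or filters one input silently misaligns the rows; there's no key to check, unlike a join. A made-up illustration of the hazard:

ids <- c("A", "B", "C")
vals <- c(10, 20, 30)
bind_cols(id = sort(ids, decreasing = TRUE), val = vals)
# "C" is now paired with 10 -- no error raised, just wrong rows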
To compare, note that cbind converts numeric fields to character when any other field is character, and converts factors (like state.region and state.division) to their integer codes, which also end up as character; using them as categories would require manipulating them back into factors:

states <- as_tibble(cbind(abb=state.abb,
                          name=state.name,
                          region=state.region,
                          division=state.division,
                          state.x77))
head(states)
## # A tibble: 6 x 12
## abb name region division Population Income Illiteracy `Life Exp` Murder
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 AL Alabama 2 4 3615 3624 2.1 69.05 15.1
## 2 AK Alaska 4 9 365 6315 1.5 69.31 11.3
## 3 AZ Arizona 4 8 2212 4530 1.8 70.55 7.8
## 4 AR Arkansas 2 5 2110 3378 1.9 70.66 10.1
## 5 CA Californ~ 4 9 21198 5114 1.1 71.71 10.3
## 6 CO Colorado 4 8 2541 4884 0.7 72.06 6.8
## # ... with 3 more variables: HS Grad <chr>, Frost <chr>, Area <chr>
5.4 Pivoting data frames
Pivot tables are a popular tool in Excel, allowing you to transform your data to be more useful in a particular analysis. A common need to pivot arises when two or more variables hold the same kind of data, and the variable names should really be values of a factor. tidyr has pivot_wider and pivot_longer:

- pivot_wider pivots rows into variables
- pivot_longer pivots variables into rows, creating factors
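As a minimal sketch of the round trip on a made-up tibble (wideTbl and its fields are purely illustrative) before we get to real data:

wideTbl <- tibble(site = c("s1", "s2"),
                  NDVIgrowing = c(0.6, 0.5),
                  NDVIsenescent = c(0.3, 0.2))
longTbl <- wideTbl %>%
  pivot_longer(cols = starts_with("NDVI"),
               names_to = "phenology", values_to = "NDVI")
longTbl %>%
  pivot_wider(names_from = phenology, values_from = NDVI)
# returns the original wide form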
In our meadows study cross-section (J. D. Davis et al. 2020) created by intersecting normalized difference vegetation index (NDVI) values from multispectral drone imagery with surveyed elevation and vegetation types (xeric, mesic, and hydric), we have fields NDVIgrowing
from a July 2019 growing season and NDVIsenescent
from a September 2020 dry season, but would like ‘growing’ and ‘senescent’ to be factors with a single NDVI
variable. This is how we used pivot_longer
to accomplish this, using data from the igisci
data package:
XSptsPheno <- XSptsNDVI %>%
  filter(vegetation != "pine") %>%   # trees removed
  pivot_longer(cols = starts_with("NDVI"),
               names_to = "phenology", values_to = "NDVI") %>%
  mutate(phenology = str_sub(phenology, 5, str_length(phenology)))
Then to do the opposite, use pivot_wider:

XSptsPheno %>%
  pivot_wider(names_from = phenology, names_prefix = "NDVI",
              values_from = NDVI)
## # A tibble: 19 x 6
## DistNtoS elevation vegetation geometry NDVIgrowing NDVIsenescent
## <dbl> <dbl> <chr> <chr> <dbl> <dbl>
## 1 0 1510. Artemisia c(718649.456, 4~ 0.326 0.140
## 2 16.7 1510. mixed graminoid c(718649.4309, ~ 0.627 0.259
## 3 28.6 1510. mixed graminoid c(718649.413, 4~ 0.686 0.329
## 4 30.5 1510. mixed graminoid c(718649.4101, ~ 0.668 0.282
## 5 31.1 1510. mixed graminoid c(718649.4092, ~ 0.655 0.266
## 6 33.4 1510. mixed graminoid c(718649.4058, ~ 0.617 0.274
## 7 35.6 1510. mixed graminoid c(718649.4025, ~ 0.623 0.275
## 8 37 1510. mixed graminoid c(718649.4004, ~ 0.589 0.242
## 9 74 1510. mixed graminoid c(718649.3448, ~ 0.641 0.325
## 10 101 1510. mixed graminoid c(718649.3042, ~ 0.558 0.312
## 11 126. 1511. Artemisia c(718649.2672, ~ 0.391 0.163
## 12 137 1510. mixed graminoid c(718649.25, 43~ 0.583 0.272
## 13 149. 1510. Carex c(718649.2317, ~ 0.736 0.392
## 14 154. 1510. Carex c(718649.224, 4~ 0.744 0.424
## 15 175 1511. mixed graminoid c(718649.1929, ~ 0.455 0.204
## 16 195. 1511. mixed graminoid c(718649.1633, ~ 0.512 0.237
## 17 197. 1510. mixed graminoid c(718649.1597, ~ 0.489 0.197
## 18 201. 1511. mixed graminoid c(718649.1544, ~ 0.533 0.275
## 19 259. 1511. mixed graminoid c(718649.0663, ~ 0.505 0.193
XSptsPheno
## # A tibble: 38 x 6
## DistNtoS elevation vegetation geometry phenology NDVI
## <dbl> <dbl> <chr> <chr> <chr> <dbl>
## 1 0 1510. Artemisia c(718649.456, 4397466.714) growing 0.326
## 2 0 1510. Artemisia c(718649.456, 4397466.714) senescent 0.140
## 3 16.7 1510. mixed graminoid c(718649.4309, 4397450.07~ growing 0.627
## 4 16.7 1510. mixed graminoid c(718649.4309, 4397450.07~ senescent 0.259
## 5 28.6 1510. mixed graminoid c(718649.413, 4397438.222) growing 0.686
## 6 28.6 1510. mixed graminoid c(718649.413, 4397438.222) senescent 0.329
## 7 30.5 1510. mixed graminoid c(718649.4101, 4397436.33) growing 0.668
## 8 30.5 1510. mixed graminoid c(718649.4101, 4397436.33) senescent 0.282
## 9 31.1 1510. mixed graminoid c(718649.4092, 4397435.73~ growing 0.655
## 10 31.1 1510. mixed graminoid c(718649.4092, 4397435.73~ senescent 0.266
## # ... with 28 more rows
XSptsPheno %>%
  ggplot() +
  geom_point(aes(elevation, NDVI, shape=vegetation,
                 color = phenology), size = 5) +
  geom_smooth(aes(elevation, NDVI,
                  color = phenology), method="lm")

Figure 5.2: Color classified by phenology, created by pivot_longer
Pivots turn out to be commonly useful. Runoff graphing from the Eucalyptus/Oak study (Thompson, Davis, and Oliphant 2016) also benefited from a pivot_longer:
eucoakrainfallrunoffTDR %>%
  pivot_longer(cols = starts_with("runoffL"),
               names_to = "tree", values_to = "runoffL") %>%
  mutate(tree = str_sub(tree, str_length(tree)-2, str_length(tree))) %>%
  ggplot() + geom_boxplot(aes(site, runoffL)) +
  facet_grid(tree ~ .)

Figure 5.3: Euc vs oak graphs created using pivot_longer
Combining a pivot with bind_rows, we can create a runoff/rainfall scatterplot colored by tree:
runoffPivot <- eucoakrainfallrunoffTDR %>%
  pivot_longer(cols = starts_with("runoffL"),
               names_to = "tree", values_to = "runoffL") %>%
  mutate(tree = str_sub(tree, str_length(tree)-2, str_length(tree)),
         Date = as.Date(date, "%m/%d/%Y"))
euc <- runoffPivot %>%
  filter(tree == "euc") %>%
  mutate(rain_subcanopy = rain_euc,
         slope = slope_euc, aspect = aspect_euc,
         surface_tension = surface_tension_euc,
         runoff_rainfall_ratio = runoff_rainfall_ratio_euc) %>%
  dplyr::select(site, `site #`, tree, Date, month, rain_mm,
                rain_subcanopy, slope, aspect, runoffL,
                surface_tension, runoff_rainfall_ratio)
oak <- runoffPivot %>%
  filter(tree == "oak") %>%
  mutate(rain_subcanopy = rain_oak,
         slope = slope_oak, aspect = aspect_oak,
         surface_tension = surface_tension_oak,
         runoff_rainfall_ratio = runoff_rainfall_ratio_oak) %>%
  dplyr::select(site, `site #`, tree, Date, month, rain_mm,
                rain_subcanopy, slope, aspect, runoffL,
                surface_tension, runoff_rainfall_ratio)
bind_rows(euc, oak) %>%
  ggplot() +
  geom_point(mapping = aes(x = rain_mm, y = runoffL, color = tree)) +
  geom_smooth(mapping = aes(x = rain_mm, y = runoffL, color = tree),
              method = "lm") +
  scale_color_manual(values = c("seagreen4", "orange3"))

Figure 5.4: Runoff/rainfall scatterplot colored by tree, created by pivot and bind_rows
5.4.1 pivot_wider
The opposite of pivot_longer, pivot_wider is less commonly used for tidying data, but can be useful for creating tables in a desired format. An environmental application of pivot_wider can be found in vignette("pivot") (modified below) for studying fish detected by automatic monitors, with fish_encounters data contributed by Johnston and Rudis (n.d.). This pivot makes it easier to see fish encounters by station. See Johnston and Rudis (n.d.) and vignette("pivot") for more information on the dataset and how the pivot provides this view.
Note: The file "fishdata.csv" is already included in the igisci package's extdata folder, so it will already exist; I wrote the download code below just in case the folks who provided the data don't keep the resource available online. Otherwise, all of the references to ex("fishdata.csv") could just be "fishdata.csv". I could have skipped the code that accesses the data online, but it's useful to see an example of downloading data this way.
library(tidyverse)
if (!file.exists(ex("fishdata.csv"))) {
download.file(
url = 'https://github.com/Myfanwy/ReproducibleExamples/raw/master/encounterhistories/fishdata.csv',
destfile = ex("fishdata.csv")
)
}
fish_encounters <- read_csv(ex("fishdata.csv"))
fish_encounters
## # A tibble: 209 x 3
## TagID Station value
## <dbl> <chr> <dbl>
## 1 4842 Release 1
## 2 4843 Release 1
## 3 4844 Release 1
## 4 4845 Release 1
## 5 4847 Release 1
## 6 4848 Release 1
## 7 4849 Release 1
## 8 4850 Release 1
## 9 4851 Release 1
## 10 4854 Release 1
## # ... with 199 more rows
fishEncountersWide <- fish_encounters %>%
  pivot_wider(names_from = Station, values_from = value, values_fill = 0)
fishEncountersWide
## # A tibble: 19 x 12
## TagID Release I80_1 Lisbon Rstr Base_TD BCE BCW BCE2 BCW2 MAE MAW
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 4842 1 1 1 1 1 1 1 1 1 1 1
## 2 4843 1 1 1 1 1 1 1 1 1 1 1
## 3 4844 1 1 1 1 1 1 1 1 1 1 1
## 4 4845 1 1 1 1 1 0 0 0 0 0 0
## 5 4847 1 1 1 0 0 0 0 0 0 0 0
## 6 4848 1 1 1 1 0 0 0 0 0 0 0
## 7 4849 1 1 0 0 0 0 0 0 0 0 0
## 8 4850 1 1 0 1 1 1 1 0 0 0 0
## 9 4851 1 1 0 0 0 0 0 0 0 0 0
## 10 4854 1 1 0 0 0 0 0 0 0 0 0
## 11 4855 1 1 1 1 1 0 0 0 0 0 0
## 12 4857 1 1 1 1 1 1 1 1 1 0 0
## 13 4858 1 1 1 1 1 1 1 1 1 1 1
## 14 4859 1 1 1 1 1 0 0 0 0 0 0
## 15 4861 1 1 1 1 1 1 1 1 1 1 1
## 16 4862 1 1 1 1 1 1 1 1 1 0 0
## 17 4863 1 1 0 0 0 0 0 0 0 0 0
## 18 4864 1 1 0 0 0 0 0 0 0 0 0
## 19 4865 1 1 1 0 0 0 0 0 0 0 0
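If we wanted to reverse this, a pivot_longer sketch like the following should recover the long form (though the filled-in zeroes would now appear as explicit rows):

fishEncountersWide %>%
  pivot_longer(-TagID, names_to = "Station", values_to = "value")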
5.5 A faceted graph using a pivot
Creating parallel multi-parameter graphs over a time series can be challenging. We need to link the graphs with a common x axis, but may need to vary the scaling on the y axis, unlike the faceted graph of grouped data we looked at in the visualization chapter. We’ll look at time series in a later chapter, but this is a good time to explore the use of a pivot_longer to set up the data for a graph like this, and at the same time to expand upon our visualization toolkit.
Flux tower data
We'll look at this data set in more depth in the time series chapter, but here's a quick introduction. To look at micrometeorological changes related to vegetation phenology as seasonal hydrology shifts from snowmelt through summer drying, we captured an array of variables at a flux tower in Loney Meadow, in the South Yuba River watershed of the northern Sierra, during the 2016 season.

Figure 5.5: Flux tower installed at Loney Meadow, 2016. Photo credit: Darren Blackburn
A spreadsheet of 30-minute summaries from 17 May to 6 September can be found in the igisci extdata folder as “meadows/LoneyMeadow_30minCO2fluxes_Geog604.xls,” and among other parameters includes data on CO2 flux, net radiation (Qnet), air temperature (Tair), and relative humidity (RH). There’s clearly a lot more we can do with these data (see Blackburn, Oliphant, and Davis (2021)), but we’ll look at this selection of parameters to see how we can use a pivot to create a multi-parameter graph.
First we’ll read in the data and simplify some of the variables. Since the second row of data contains measurement units, we needed to wrangle the data a bit to capture the variable names then add those back after removing the first two rows:
library(readxl); library(tidyverse); library(lubridate)
vnames <- read_xls(ex("meadows/LoneyMeadow_30minCO2fluxes_Geog604.xls"),
                   n_max=0) %>% names()
vunits <- read_xls(ex("meadows/LoneyMeadow_30minCO2fluxes_Geog604.xls"),
                   n_max=0, skip=1) %>% names()
Loney <- read_xls(ex("meadows/LoneyMeadow_30minCO2fluxes_Geog604.xls"),
                  skip=2, col_names=vnames) %>%
  rename(YDay = `Day of Year`, CO2flux = `CO2 Flux`)
The time unit we'll want for the time series is days, so a group_by summarization by day will give us a generalized picture of changes over the collection period, reflecting phenological changes from first exposure after snowmelt, through the maximum growth period, to the major senescence period of late summer. We'll create the data to graph with a group_by and summarize that build a daily picture of four micrometeorological parameters:
LoneyDaily <- Loney %>%
  group_by(YDay) %>%
  summarize(CO2flux = mean(CO2flux),
            Qnet = mean(Qnet),
            Tair = mean(Tair),
            RH = mean(RH))
Then from this daily data frame, we pivot_longer to store all of the parameter names in a parameter
variable and all parameter values in a value
variable:
LoneyDailyLong <- LoneyDaily %>%
  pivot_longer(cols = CO2flux:RH,
               names_to="parameter",
               values_to="value") %>%
  filter(parameter %in% c("CO2flux", "Qnet", "Tair", "RH"))
Now we have what we need to create a multi-parameter graph using facet_grid – note the scales = “free_y” setting to allow each variable’s y axis to correspond to its value range:
p <- ggplot(data = LoneyDailyLong, aes(x=YDay, y=value)) +
  geom_line()
p + facet_grid(parameter ~ ., scales = "free_y")

Figure 5.6: facet_grid with free_y setting, supported by pivot
The need to create a similar graph from soil CO2 data inspired me years ago to learn how to program Excel. It was otherwise impossible to get Excel to make all the x scales the same, so I learned how to force it with code…
5.6 Data wrangling exercise
by Josh von Nonn (2021)
The impetus behind this exercise came from the movie Dark Waters (https://www.youtube.com/watch?v=RvAOuhyunhY), inspired by a true story of negligent chemical waste disposal by Dupont.
First create a new RStudio project named GAMA and save this .Rmd file there; then create a folder GAMA_water_data in the project folder. The path to the data can then be specified as "GAMA_water_data/gama_pfas_statewide_v2.txt", assuming that the latter name matches what is unzipped from your download; change it to match the actual name if it differs.
Then download the data from the California Water Boards GAMA groundwater website: https://gamagroundwater.waterboards.ca.gov/gama/datadownload
Select "Statewide Downloads", then "Statewide PFOS Data", and extract the zip file into the GAMA_water_data folder. This is a large txt file, so if you open it with Notepad it may take some time to load; notice that it is a space-delimited file.
Note that this data structure is similar to what was discussed in the introductory chapter about using RStudio projects to organize your data, allowing relative paths to your data, such as "GAMA_water_data/gama_pfas_statewide_v2.txt", which will work wherever you copy your project folder. An absolute path to somewhere on your computer, in contrast, won't work for anyone else trying to run your code; absolute paths should only be used for servers that other users have access to, and for URLs on the web.
Required packages:
library(tidyverse)
library(lubridate)
1. Read in the downloaded file "gama_pfas_statewide_v2.txt", call it cal_pfas, and have a look at the data set. You can select it from the Environment pane or use view(cal_pfas).
2. Before we clean up this data, let's preserve the locations of all the wells. Create a new data frame, cal_pfas_loc: select GM_WELL_ID, GM_LATITUDE, and GM_LONGITUDE, and remove all the duplicate wells (hint: the dplyr cheat sheet provides a function to do this).
3. Now to trim down the data. Create a new data frame, cal_pfas_trim: add a new column, DATE, using the associated lubridate function on GM_SAMP_COLLECTION_DATE (this will allow ggplot to recognize it as a date and not a character string); select GM_WELL_ID, GM_CHEMICAL_VVL, GM_RESULT, and the newly created DATE; then sort (arrange) the data by GM_WELL_ID.
4. Use pivot_wider to create new columns from the chemical names and values from the result column, and store the new data frame as cal_pfas_wide. Notice the warnings: some of the wells have multiple samples on the same day, so these get put into a vector (e.g. c(6.8, 9, 4.3, ...)). Rerun the pivot but include the argument values_fn = mean, which will return only the average of all the samples. Once the pivot is working correctly, keep the columns GM_WELL_ID, DATE, PFOS, and PFOA, and pipe a mutate to create a new column, SUMPFS, that is the sum of PFOS and PFOA.
The US EPA has established a lifetime Health Advisory Level (HAL) for PFOA and PFOS of 70 ng/L. When both PFOA and PFOS are found in drinking water, the combined concentrations of PFOA and PFOS should be compared with the 70 ng/L HAL.
from the GROUNDWATER INFORMATION SHEET for PFOA (website: https://www.waterboards.ca.gov/water_issues/programs/gama/factsheets.html)
5. For the sake of creating an interesting time series plot, let's filter the data for wells that have a SUMPFS greater than 70 and that have more than 10 sampling dates. Start by creating a new data frame, cal_pfas_index, from the pivoted data frame. Hint: one way to do this is to use group_by, filter, count, and filter again. The resulting data frame should have 11 observations and 2 variables.
6. Create a new data frame, cal_pfas_max, to locate the well with the most sampling dates (n). The data wrangling function from base R, subset, can do this using the max function as an argument.
7. Now let's pull all the data on that well by joining the max indexed well with cal_pfas_wide, and call the new data frame cal_pfas_join. Remove the "n" column using the select function.
8. Create a new data frame, cal_pfs_long, and pivot_longer the cal_pfas_join data, creating new columns "chemical" and "ngl".
9. Plot the well using the wide data from cal_pfas_join with ggplot: put DATE on the x axis and plot the three variables (PFOS, PFOA, SUMPFS) with different colors of your choice. Add a horizontal reference line (geom_hline(yintercept = 70)) for the HAL limit at 70.

Figure 5.7: Goal for Question 9
10. Plot the well using the long data from cal_pfs_long, with DATE on the x axis and ngl on the y axis. Add the horizontal reference at 70.

Figure 5.8: Goal for Question 10