13 Reshaping/ restructuring repeated measures/ clustered datasets

Data Wrangling Recipes in R: Hilary Watt

“Fabulous… practical guidance. Makes the hard work of restructuring and preparing data significantly easier … I haven’t seen this focused content presented in one place in such an easy to access form before.” Dr Thomas Woodcock, senior research fellow, School of Public Health.

13.1 Long and wide data formats and basic commands

Both formats are really helpful. For any given analysis/ table/ graph, you will prefer/ require a specific format. For an analysis project with clustered data, you will probably require both formats. Switching between the two is relatively straight-forward when you know how. It takes some effort initially to figure it out.

One key purpose of this introductory section is to illustrate what the command can do. The other key purpose is to illustrate the command, with a simple example. If your example isn’t so simple, you need to pay attention to later sections.

These require library(tidyverse) – more specifically library(tidyr).

People who don’t know these commands can take hours/ days/ weeks doing tasks which can be done quickly. Without some prior knowledge/ knowing command names, googling can be challenging. These commands are useful when you have repeated measurements on the same individuals, or perhaps different measurements taken on different members of the household, or other types of cluster. They will move between the two formats illustrated here:

Here we have cat litters, with kittens weights by birth order (for 1st, 2nd, 3rd, 4th born kittens).

Long format of data (several lines per cluster – here cluster = cat litter):

print(kits)

Show entries

Search:

	catlitterno	catmum_name	kit_no	weight
1	1	Ginger	1	97
2	1	Ginger	2	98
3	1	Ginger	3	104
4	1	Ginger	4	107
5	2	Snowy	1	84
6	2	Snowy	2	87
7	2	Snowy	3	101
8	3	Emily	1	110
9	3	Emily	2	113
10	3	Emily	3	132

Showing 1 to 10 of 10 entries

Previous1Next

Restructuring from long to wider format:
The following command says: take data-frame kits –save result to kitten data-frame/ tibble

takes data from variable named kit_no to become (part of) variable names in new wider version of dataset
prefix the variables names by kit
Take values for new variable kit from values in weight variable

# take dataframe kits and pivot wider
kitten <- pivot_wider(kits, # name data-frame to be restructured - save result to kitten
            names_from = kit_no, # take data from kitno to become part of new variable names
            names_prefix = "kit", # new column names start with kit
            values_from = weight) # data from weight goes into newly created columns
# The names_prefix option is required since "kit_no" does not contain valid 
# variable names

This data is now in wide format (one version of variable(s) for each member of the cluster): Now, we have the same data, but with one line of data per litter/ cat family, with 4 weight of kitten variables, representing weight of kitten1, kitten2, kitten3, kitten4 (if any). The cat’s mum’s name is also given.

Wide format

print(kitten)

Show entries

Search:

	catlitterno	catmum_name	kit1	kit2	kit3	kit4
1	1	Ginger	97	98	104	107
2	2	Snowy	84	87	101
3	3	Emily	110	113	132

Showing 1 to 3 of 3 entries

Previous1Next

Restructuring from wide to longer format:
Now, we are translating kitten back to long format (& saving to data-frame/ tibblekits). The variables starting with kit will be restructured in this way. The names of thesekit variables (excluding theirkit prefix, leaving numbers 1, 2, 3, 4) become data in the variable namedkit_no. The values/ data from within thesekit variables ends up in variable namedweight.

The last optional line dropped any rows where weight=NA.

# restructure "kittens" to longer version, save as "kits"
kits <-   pivot_longer(
    kittens, # name the data-frame to be restructured to long format
    cols = starts_with("kit"), # name the cols to be restructured
    names_to = "kit_no",     # data from variable names starting kit ends up here 
                        # (this then contains numbers on end of names, stripping out prefix below)
    names_prefix = "kit", # gives prefix for var names to be restructured 
                         # this implies "kit" part of variable names is not included in "kit_no" data
    values_to = "weight", # name the variable that will be created to contain 
                          # data from above named columns
        values_drop_na = TRUE # OPTIONALLY drop new rows with NA for newly created 
                          # weight variable 
  )

Here is the long format again:

print(kits)

Show entries

Search:

	catlitterno	catmum_name	kit_no	weight
1	1	Ginger	1	97
2	1	Ginger	2	98
3	1	Ginger	3	104
4	1	Ginger	4	107
5	2	Snowy	1	84
6	2	Snowy	2	87
7	2	Snowy	3	101
8	3	Emily	1	110
9	3	Emily	2	113
10	3	Emily	3	132

Showing 1 to 10 of 10 entries

Previous1Next

It is possible to restructure several variables to long format at the same time; an example of this is shown in the next section. Similarly, restructuring wide can cope with 2 or more variables to restructure.

The following version of pivot_wider shows just how easy it is to extend the syntax to cope with restructuring several variables simultaneously (all those named invalues_from option).

head(phq3)
phq4 <- pivot_wider(phq3, # name dataframe to be restructured
                    values_from=c("phq", "time_since_baseline",  # name set of cols to be restructured
                                  "phq_change_from_baseline",
                                  "phq_change_from_previous_time",
                                  "time_from_previous", "datev"), 
                    names_from="nnn") # numbers from this column become latter part of variable names
head(phq4)

However, pivot_longer is slightly more complex when more than one set of columns is restructured, as shown in the next section.

13.2 Requirements for `pivot_longer`

We need a cluster ID variable (perhaps family ID/ region ID/ person ID for repeated measures data). This needs to be unique, so different on each row (with data in wide format).

The above format of the command requires variable names which all start the same but have a different ending (usually different numbers). In the above example, the names were kit1, kit2, kit3, kit4.

When we have 2 or more sets of variables to restructure, we need a way to separate these variable names (for these sets of variables) into a component that become variable names in the “long” version of the dataset, and a component that forms data in a new column. The following form of the command works, for variables named birthweight1, birthweight2, birthweight3, birthweight4 and then name1, name2, name3, name4, name5. Any other set of variables could be added, with variable names of the forms ###1, ###2, ###3, ###4, ###5 where ### is one or more letters.

print(pups, font_size = 7)

Show entries

Search:

	doglitterno	birthweight1	birthweight2	birthweight3	birthweight4	birthweight5	name1	name2	name3	name4	name5
1	1	210	190	203	184		Fred	Gulliver	Brian	Sandy
2	2	150	143	199			Patch	Spotty	Lightning
3	3	340	360	370	296	312	Fido	Greg	Archie	Seamus	Milo

Showing 1 to 3 of 3 entries

Previous1Next

The syntax names the columns to be restructured, with the cols=starts_with option: cols=starts_with( c(“birthweight”, “name”) )

The above names 2 sets of variables to restructure. More names can be added to restructure more variables.

Whilst the names_to & names_sep are both required, it should be sufficient to copy them from that shown below, provided each set of variable names to restructure is of the form ###1, ###2, ###3, ###4,… where ### is any letter(s).

# Restructuring to long format with 2 groups of variables to restructure
# where variable names have format "name_stub" then number (where number
# denotes member of group)
pups2 <- pivot_longer(
    pups,
    cols=starts_with(c("birthweight", "name")), # define variables to be restructured
    names_to = c(".value", "pupno"), # 2nd element of var names become contents of var "pupno"
    #".value" above implies that column names comes from info in first part of var names as defined below:
           names_sep = "(?<=[A-Za-z])(?=[0-9])" # separate var names at first point a letter is followed by a number
    # names_sep is applied to variable names from cols= element
    # because we specified ".value" in names_to command we don't need to specify "values_to"
)

print(pups2)

Show entries

Search:

	doglitterno	pupno	birthweight	name
1	1	1	210	Fred
2	1	2	190	Gulliver
3	1	3	203	Brian
4	1	4	184	Sandy
5	1	5
6	2	1	150	Patch
7	2	2	143	Spotty
8	2	3	199	Lightning
9	2	4
10	2	5

Showing 1 to 10 of 15 entries

Previous1 2Next

For instance, with an additional set of columns named wt1, wt2, wt3, wt4, wt5 to restructure, the syntax is identical, except for the addition of wt in the cols=starts_with option. Therefore, with names of this format, it is easy to extend the syntax to cope with very many sets of variables to restructure.

print(pups, font_size=5)

Show entries

Search:

	doglitterno	birthweight1	birthweight2	birthweight3	birthweight4	birthweight5	name1	name2	name3	name4	name5	wt1	wt2	wt3	wt4	wt5
1	1	210	190	203	184		Fred	Gulliver	Brian	Sandy		32	43	1	1	1
2	2	150	143	199			Patch	Spotty	Lightning			34	23	2	2	2
3	3	340	360	370	296	312	Fido	Greg	Archie	Seamus	Milo	23	12	3	3	3

Showing 1 to 3 of 3 entries

Previous1Next

# The command is the same as the previous block, except all columns that also start "wt" are transformed from wide to long format
pups2 <- pivot_longer(
    pups,
    cols=starts_with(c("birthweight", "name", "wt")),
    names_to = c(".value", "pupno"),
    names_sep = "(?<=[A-Za-z])(?=[0-9])" )

print(pups2)

Show entries

Search:

	doglitterno	pupno	birthweight	name	wt
1	1	1	210	Fred	32
2	1	2	190	Gulliver	43
3	1	3	203	Brian	1
4	1	4	184	Sandy	1
5	1	5			1
6	2	1	150	Patch	34
7	2	2	143	Spotty	23
8	2	3	199	Lightning	2
9	2	4			2
10	2	5			2

Showing 1 to 10 of 15 entries

Previous1 2Next

Just in case it is useful, this is an example of output that is not fully restructured, so 5 sets of the wt variable remain. This is unlikely to be useful:

# The command is the same as the previous block, except all columns that also start "wt" are transformed from wide to long format
pups2 <- pivot_longer(
    pups,
    cols=starts_with(c("birthweight", "name")),
    names_to = c(".value", "pupno"),
    names_sep = "(?<=[A-Za-z])(?=[0-9])" )

print(pups2)

Show entries

Search:

	doglitterno	wt1	wt2	wt3	wt4	wt5	pupno	birthweight	name
1	1	32	43	1	1	1	1	210	Fred
2	1	32	43	1	1	1	2	190	Gulliver
3	1	32	43	1	1	1	3	203	Brian
4	1	32	43	1	1	1	4	184	Sandy
5	1	32	43	1	1	1	5
6	2	34	23	2	2	2	1	150	Patch
7	2	34	23	2	2	2	2	143	Spotty
8	2	34	23	2	2	2	3	199	Lightning
9	2	34	23	2	2	2	4
10	2	34	23	2	2	2	5

Showing 1 to 10 of 15 entries

Previous1 2Next

13.3 Requirements for `pivot_wider`

You need a variable which uniquely defines the clusters, which could be family, region, hospital or (for repeated measures data) personid. The above example had kittens clustered within cat family/ litter. Either of the variables catlitterno or catmum_name or both could be selected, although catlitterno is the obvious choice (as used above).
We need another variable (herekitno), which counts people/ items/ repeat visits within clusters. It usually takes usually a small number of integer values, although we could theoretically have 50 items within one cluster (requiring items numbers 1, 2, 3….50). The number of separate values it takes (1, 2, 3, 4 = 4 values in the kittens example above) corresponds to the number of new variables created (per variable to be restructured). Because there is one restructured variable (weight) in the kitten example, there are 4 new variables created (one for each value of kitno, called kit1, kit2, kit3, kit4). You might need to create this – see next section.
All variables that vary within cluster (catlitterno here; patientid for repeated measures data) need to be either:
- Dropped prior to pivot_wider
- OR specified in the pivot_wider command

EXAMPLE OF pivot_wider NOT WORKING violating 3. above. Resulting dataset is wider but is just as long. This is rarely of any use. For example, if the above kit dataset contained values for the length as well as weight of each kitten, the identical pivot_wider command to above results in this useless format:

print(kitten)

Show entries

Search:

	catlitterno	catmum_name	length	kit1	kit2	kit3	kit4
1	1	Ginger	17	97
2	1	Ginger	19		98		107
3	1	Ginger	20			104
4	2	Snowy	16	84
5	2	Snowy	15		87
6	2	Snowy	20			101
7	3	Emily	21	110	113
8	3	Emily	22			132

Showing 1 to 8 of 8 entries

Previous1Next

If pivot_wider does not result in a shorter dataset, we need to either drop variable(s) prior to pivot_wider (such as length variable here): kits$length <- NULL or else specify all variables that vary by cluster as needing restructuring; the command becomes a little more complex.

13.3.1 Creating a count within cluster to prepare for `pivot_wider`

Suppose you have patientid, then a date variable to represent different visits, but no visit number variables. You need to initially create a visitno variable, before you can successfully pivot_wider. Otherwise, you end up with multiple versions of each restructured variable, one for each different date (from your date variable – all dates might possibly be different.

The code below usesarrange from tidyverse dplyr to sort the data. In the example, it is sorted by hserial (=household ID number, a form a cluster ID), then by pserial (=person ID number, where there are a few lines for each person within clusters/households).

Then the data is grouped by the cluster variable, using dpylr’s group_by( ) command. Use library(tidyverse) or library(dplyr). Then a new command is created, and since this is relying on dpylr’s grouping, we need to use dpylr’s mutate to create the new variable. The code below specifies the namepersonno for this new variable, which is a sequence from 1, going up in steps of 1 (by hserial, the group_by variable), until it reaches the total number (in that hserial cluster).

ihd_pain3 <- read.csv("ihd_pain3.csv")

ihd_pain3 # data on people (id pserial) within households (id hserial)

# The above does not have a variable numbering data within groups
ihd_pain4 <- arrange(ihd_pain3, hserial, pserial) # sort by hserial then pserial

ihd_pain4 <- group_by(ihd_pain4, hserial)  # group by hserial

# Note text "Group: hserial[8]" denoting 8 groups

# NOTE: mutate is Tidyverse command to create new var - must be used for n() to work
# create new var personno var that counts people within household
ihd_pain4 <- mutate(ihd_pain4, personno=seq(from=1,by=1, to=n())) 

print(ihd_pain4)

To help you understand the code above:

seq(from=1, by=1, to=9) gives a sequence from 1 to 9 in steps of 1, hence it results in: 1,2,3,4,5,6,7,8,9.

seq(from=1, by=1, to=n()) results in 1,2,3,4,5,6,7,…… n() where n() is the total in the cluster/ group_by variable.

If you are not familiar with “pipe %>%” format of commands, the above code is useful. Many people would write the above codein the following pipe format. It can be trickier to debug pipes, since you might not be certain where the error lies. To debug pipes, you can add in one line at a time to make sure each works (or remove lines one at a time, until you find code that runs).

ihd_pain4 <- ihd_pain3 %>% 
    arrange(hserial, pserial) %>% 
    group_by(hserial) %>% 
    mutate(personno=seq(from=1, by=1, to=n()))

The above syntax will define new variable visitno to equal 1, 2, 3, 4, 5…, starting at 1 again for each new pserial (personid).

For this specific example, several clusters contain only one person, so personno (which counts people within clusters) takes the value 1. There is just one hserial (household) which has 3 people in it (3 pserial’s) in this example 1022021 – with personno 1, 2 and 3 against it.

print(ihd_pain4)

Show entries

Search:

	X	hserial	pserial	male	date	personno
1	1	1001011	100101101	1	2020-06-12	1
2	2	1001031	100103101	1	2018-06-04	1
3	3	1001041	100104101	0		1
4	4	1001041	100104102	1	2019-06-12	2
5	5	1001051	100105101	1	2010-04-29	1
6	6	1001061	100106101	1	2002-06-30	1
7	7	1001151	100115102	1	2009-03-12	1
8	8	1002021	100202101	1	1996-06-04	1
9	9	1002021	100202102	0	1997-05-14	2
10	10	1002021	100202103	0	1996-11-23	3

Showing 1 to 10 of 12 entries

Previous1 2Next

13.4 Creating change variables for repeated measures data

Here is an example of creating changes from baseline and changes from previous measurement. This works on data in long format. This uses syntax from tidyverse’s dplyr so requires library(dplyr) or library(tidyverse).

# this enables dmy function to create var that R recognises as a date
require(lubridate)

# Creating variables that give differences from baseline and from last
# measurement for both dates and numeric data. This is easiest when data is in
# long format

# The following commands convert the string "date_visit" variable into date format
# then group by personid, and sort by personid and date
# Then creates a sequence that counts 1, 2, 3... within each personid

# keep only rows with non-missing personid
phq3 <- subset(phq, is.na(personid)==FALSE) 

# create date var from string "date" var "date_visit", with date in order date,
# month, year
phq3$datev <- dmy(phq3$date_visit)

# sort dataframe phq by personid and then datev variables
phq3 <- arrange(phq3, personid, datev)

# group by personid
phq3 <- group_by(phq3, personid) 

# DROP this variable - note negative - sign before its name
phq3 <- select(phq3, -date_visit) 

phq3<- mutate(phq3,
              # count visitno within group_by variable
              nnn=seq(from=1,by=1, to=n()),
              # first is first value (within group_by var)
              baseline_phq=first(phq), 
              # find minimum date (within group_by var)
              baseline_date=min((datev), na.rm=TRUE), 
              # find diff from baseline date
              time_since_baseline=(datev - baseline_date),
              # find diff in phq from baseline
              phq_change_from_baseline=phq-baseline_phq, 
              # lag(phq) gives the value of phq on row above (provided it has 
              # the same personid = same group_by variable) it results in 
              # missing value for the first row within each personid 
              # (our group_by variable)
              # ifelse fills in zero diff at baseline rather than missing
              phq_change_from_previous_time=ifelse(nnn==1,0,phq - lag(phq)),
              # similarly for date
              time_from_previous=ifelse(nnn==1, 0, datev - lag(datev)) 
              ) 

print(phq3, font_size=5)

Show entries

Search:

	personid	age_baseline	phq	ethnic	datev	nnn	baseline_phq	baseline_date	time_since_baseline	phq_change_from_baseline	phq_change_from_previous_time	time_from_previous
1	44	40	12	Other	2016-01-19	1	12	2016-01-19	0	0	0	0
2	44	40	15	Other	2016-07-06	2	12	2016-01-19	169	3	3	169
3	44	40	5	Other	2016-12-06	3	12	2016-01-19	322	-7	-10	153
4	44	40	12	Other	2017-07-12	4	12	2016-01-19	540	0	7	218
5	108	30	2	Asian	2015-09-10	1	2	2015-09-10	0	0	0	0
6	108	30	5	Asian	2016-01-27	2	2	2015-09-10	139	3	3	139
7	108	30	12	Asian	2016-08-02	3	2	2015-09-10	327	10	7	188
8	108	30	15	Asian	2017-01-16	4	2	2015-09-10	494	13	3	167
9	424	39	4	White	2015-06-10	1	4	2015-06-10	0	0	0	0
10	424	39	5	White	2015-11-05	2	4	2015-06-10	148	1	1	148

Showing 1 to 10 of 14 entries

Previous1 2Next

# Restructuring to wide format with several groups of variables to restructure
# Note that it is important to specify all variables that vary by cluster so 
# that the following code does not produce data with lots of NAs with more lines
# than desired alternatively, variables that are not needed can be dropped 
# before this transformation
phq4 <- pivot_wider(phq3, 
                    values_from=c("phq", "time_since_baseline",
                                  "phq_change_from_baseline",
                                  "phq_change_from_previous_time",
                                  "time_from_previous", "datev"  ),
                    names_from="nnn"
  )

print(phq4, font_size=3)

Show entries

Search:

	personid	age_baseline	ethnic	baseline_phq	baseline_date	phq_1	phq_2	phq_3	phq_4	time_since_baseline_1	time_since_baseline_2	time_since_baseline_3	time_since_baseline_4	phq_change_from_baseline_1	phq_change_from_baseline_2	phq_change_from_baseline_3	phq_change_from_baseline_4	phq_change_from_previous_time_1	phq_change_from_previous_time_2	phq_change_from_previous_time_3	phq_change_from_previous_time_4	time_from_previous_1	time_from_previous_2	time_from_previous_3	time_from_previous_4	datev_1	datev_2	datev_3	datev_4

	personid	age_baseline	ethnic	baseline_phq	baseline_date	phq_1	phq_2	phq_3	phq_4	time_since_baseline_2	time_since_baseline_3	time_since_baseline_4	phq_change_from_baseline_2	phq_change_from_baseline_3	phq_change_from_baseline_4	phq_change_from_previous_time_2	phq_change_from_previous_time_3	phq_change_from_previous_time_4	time_from_previous_2	time_from_previous_3	time_from_previous_4	datev_1	datev_2	datev_3	datev_4
1	44	40	Other	12	2016-01-19	12	15	5	12	169	322	540	3	-7	0	3	-10	7	169	153	218	2016-01-19	2016-07-06	2016-12-06	2017-07-12
2	108	30	Asian	2	2015-09-10	2	5	12	15	139	327	494	3	10	13	3	7	3	139	188	167	2015-09-10	2016-01-27	2016-08-02	2017-01-16
3	424	39	White	4	2015-06-10	4	5			148			1			1			148			2015-06-10	2015-11-05
4	521	33	Mixed	6	2015-05-06	6	2			136			-4			-4			136			2015-05-06	2015-09-19
5	555	34	Other	5	2014-12-22	5	17			193			12			12			193			2014-12-22	2015-07-03

Showing 1 to 5 of 5 entries

Previous1Next

This can be converted back to long format as follows:

# Restructuring back to long format
phq5 <- pivot_longer(phq4, 
              cols=starts_with(c("phq", "time_since_baseline",
                                 "phq_change_from_baseline",
                                 "phq_change_from_previous",
                                 "time_from_previous", "datev")),
               # take latter part of names and put into var="visitno"
               # ".value" above implies that column names comes from info in
               names_to = c(".value", "visitno"), 
               # first part of var names as defined below:
               names_sep = "(?<=[A-Za-z])_(?=[0-9])" # separate var names 
               # separate where there is a letter [A-Za-z] followed by "_" 
               # followed by a number [0-9] 
               # because we specified ".value" in names_to command we don't need
               # to specify "values_to"
  )

print(phq5, font_size=5)

Show entries

Search:

	personid	age_baseline	ethnic	baseline_phq	baseline_date	visitno	phq	phq_change_from_baseline	phq_change_from_previous_time	time_since_baseline	time_from_previous	datev
1	44	40	Other	12	2016-01-19	1	12	0	0	0	0	2016-01-19
2	44	40	Other	12	2016-01-19	2	15	3	3	169	169	2016-07-06
3	44	40	Other	12	2016-01-19	3	5	-7	-10	322	153	2016-12-06
4	44	40	Other	12	2016-01-19	4	12	0	7	540	218	2017-07-12
5	108	30	Asian	2	2015-09-10	1	2	0	0	0	0	2015-09-10
6	108	30	Asian	2	2015-09-10	2	5	3	3	139	139	2016-01-27
7	108	30	Asian	2	2015-09-10	3	12	10	7	327	188	2016-08-02
8	108	30	Asian	2	2015-09-10	4	15	13	3	494	167	2017-01-16
9	424	39	White	4	2015-06-10	1	4	0	0	0	0	2015-06-10
10	424	39	White	4	2015-06-10	2	5	1	1	148	148	2015-11-05

Showing 1 to 10 of 20 entries

Previous1 2Next

13.5 Exercises

Open puppies.csv, drop the names. Now reshape so that each puppy has its own line of data showing its birth-weight.
Reopen the dataset puppies.csv, and this time keep the names. Now reshape so that each puppy has its own line of data, showing its birth-weight and name.
Reshape kittens.csv, so that each kitten has its own line of data with its own birth-weight. Hint: you need to rename some variables first. Now use the reshape command again (changing “long” to “wide” in the command) to get back to the original data structure. Notice how the coding does and does not relate to variable names, in each version of the command.
Reshape kittens_puppies.csv, so that each baby (kitten or puppy) has its birth-weight on its own line.
Reshape husband_anx_deprn.csv so that the baseline and follow-up PHQ scores both appear on the same line.
Reshape wife_deprn.csv so that the baseline and follow-up PHQ scores both appear on the same line. Hint: you will need to firstly create a variable that contains the visit number, using _n.
Reshape couple_deprn_anxiety2.csv dataset, so that the baseline and the follow-up visits occur on separate lines. Reshape again, so that data for husbands and wives also occur on separate lines. (wife/baseline, wife/follow-up, husband/baseline and husband/follow-up, all on separate rows for each family).
Reshape the ihddata.csv into wide format.

Datasets are available here: https://github.com/hcwatt/data_wrangling_open.

Data Wrangling Recipes in R: Hilary Watt. PCPH, Imperial College London.

13 Reshaping/ restructuring repeated measures/ clustered datasets

13.1 Long and wide data formats and basic commands

13.2 Requirements for pivot_longer

13.3 Requirements for pivot_wider

13.3.1 Creating a count within cluster to prepare for pivot_wider

13.4 Creating change variables for repeated measures data

13.5 Exercises

13.2 Requirements for `pivot_longer`

13.3 Requirements for `pivot_wider`

13.3.1 Creating a count within cluster to prepare for `pivot_wider`