3.3 Data Manipulation

Now that we have the meta-analysis data in RStudio, let us do a few manipulations with the data. These functions might come in handy when we are conducting analyses later on.

Going back to the output of the str() function, we see that this also gives us details on the type of data we have stored in each column of our dataset. There are different abbreviations signifying different types of data.

Abbreviation | Type | Description
num | Numerical | This is all data stored as numbers (e.g. 1.02).
chr | Character | This is all data stored as words.
logi | Logical | These are variables which are binary, meaning that they signify that a condition is either TRUE or FALSE.
factor | Factor | Factors are stored as numbers, with each number signifying a different level of a variable. For example, a factor might be coded as 1 = low, 2 = medium, 3 = high.
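
To see these abbreviations side by side, here is a minimal sketch that builds a small toy data frame and inspects it. The object toy, its column names and its values are invented for illustration and are not part of madata.

```r
# Hypothetical toy data: one column per data type (all values are made up)
toy <- data.frame(
  TE     = c(0.35, 0.42, 0.54),                  # numeric
  Author = c("Study A", "Study B", "Study C"),   # character
  online = c(TRUE, TRUE, FALSE),                 # logical
  ROB    = factor(c("low", "high", "low")),      # factor
  stringsAsFactors = FALSE                       # keep Author as character
)

# str() prints one line per column, starting with the type abbreviation
str(toy)

# class() returns the full type name of each column
toy.types <- sapply(toy, class)
toy.types
```

The str() output uses the short abbreviations from the table, while class() spells the type names out in full.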

3.3.1 Converting to factors

Let’s say we have the subgroup Risk of Bias (in which the Risk of Bias rating is coded), and want it to be a factor with two different levels: “low” and “high”.

To do this, we need the variable ROB to be a factor. However, this variable is currently stored as a character (chr). We can have a look at this variable by typing the name of our dataset, then adding the selector $, and then adding the variable we want to have a look at.

madata$ROB
##  [1] "high" "low"  "high" "low"  "low"  "low"  "high" "low"  "low"  "low" 
## [11] "high" "low"  "low"  "low"  "high" "high" "high" "low"
str(madata$ROB)
##  chr [1:18] "high" "low" "high" "low" "low" "low" "high" "low" "low" ...

We can see now that ROB is indeed a character type variable, which contains only two words: “low” and “high”. We want to convert this to a factor variable now, which has only two levels, low and high. To do this, we use the factor() function.

madata$ROB <- factor(madata$ROB)
madata$ROB
##  [1] high low  high low  low  low  high low  low  low  high low  low  low 
## [15] high high high low 
## Levels: high low
str(madata$ROB)
##  Factor w/ 2 levels "high","low": 1 2 1 2 2 2 1 2 2 2 ...

We now see that the variable has been converted to a factor with the levels “high” and “low”.
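
By default, factor() sorts the levels alphabetically, which is why "high" comes before "low" in the output. If a specific order is preferred, it can be set through the levels argument. The following is a minimal sketch on a short, made-up vector called rob (an illustrative name, not a column of madata).

```r
# Hypothetical example vector (not the real ROB column)
rob <- c("high", "low", "high", "low")

# Default: levels are sorted alphabetically ("high" first)
rob.default <- factor(rob)
levels(rob.default)

# Explicit order: "low" becomes level 1, "high" becomes level 2
rob.ordered <- factor(rob, levels = c("low", "high"))
levels(rob.ordered)

# The underlying integer codes follow the chosen level order
as.integer(rob.ordered)
```

The first level is typically treated as the reference category by modelling functions, so setting the order explicitly can make later output easier to read.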

3.3.2 Converting to logicals

Now let us have a look at the intervention type subgroup variable. This column is currently stored as a character (chr) variable too.

madata$`intervention type`
##  [1] "mindfulness" "mindfulness" "ACT"         "mindfulness" "PCI"        
##  [6] "ACT"         "mindfulness" "mindfulness" "PCI"         "mindfulness"
## [11] "mindfulness" "mindfulness" "mindfulness" "ACT"         "mindfulness"
## [16] "mindfulness" "mindfulness" "mindfulness"
str(madata$`intervention type`)
##  chr [1:18] "mindfulness" "mindfulness" "ACT" "mindfulness" "PCI" ...

Let us say we want a variable that only records whether a study is a mindfulness intervention or not. A logical is very well suited for this. To convert the data to a logical, we use the as.logical() function. We will store this information in a new variable called intervention.type.logical. To tell R what to count as TRUE and what as FALSE, we have to specify the intervention type using the == operator.

intervention.type.logical<-as.logical(madata$`intervention type`=="mindfulness")

We see that R has converted the character information into trues and falses for us. To check if this was done correctly, let us compare the original and the new variable.

n <- data.frame(intervention.type.logical,madata$`intervention type`)
names <- c("New", "Original")
colnames(n) <- names
New | Original
TRUE | mindfulness
TRUE | mindfulness
FALSE | ACT
TRUE | mindfulness
FALSE | PCI
FALSE | ACT
TRUE | mindfulness
TRUE | mindfulness
FALSE | PCI
TRUE | mindfulness
TRUE | mindfulness
TRUE | mindfulness
TRUE | mindfulness
FALSE | ACT
TRUE | mindfulness
TRUE | mindfulness
TRUE | mindfulness
TRUE | mindfulness
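
A more compact way to run the same check is to cross-tabulate the original and the new variable with table(). The sketch below retypes the 18 intervention types shown above into a standalone vector so that it runs on its own; the object names intervention.type, is.mindfulness and check.tab are chosen for illustration. It also shows that the == comparison already returns a logical, so wrapping it in as.logical() is harmless but not strictly needed.

```r
# The 18 intervention types from the output above, retyped as a vector
intervention.type <- c("mindfulness", "mindfulness", "ACT", "mindfulness", "PCI",
                       "ACT", "mindfulness", "mindfulness", "PCI", "mindfulness",
                       "mindfulness", "mindfulness", "mindfulness", "ACT",
                       "mindfulness", "mindfulness", "mindfulness", "mindfulness")

# The comparison itself already yields TRUE/FALSE values
is.mindfulness <- intervention.type == "mindfulness"
class(is.mindfulness)

# Cross-tabulate: every "mindfulness" row should fall under TRUE,
# every other intervention type under FALSE
check.tab <- table(Original = intervention.type, New = is.mindfulness)
check.tab
```

If the conversion worked, each row of the cross-table has all of its studies in a single column.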

3.3.3 Selecting specific studies

It is often useful to select certain studies for further analyses, or to exclude some studies from them (e.g., if they are outliers). To do this, we can use the filter() function in the dplyr package, which is part of the tidyverse we installed before.

So, let us load the package first.

library(dplyr)

Let us say we want to do a meta-analysis with only three studies in our dataset. To do this, we need to create a new dataset containing only these studies using the dplyr::filter() function. The dplyr:: prefix is necessary because there is more than one filter function in R (for example, stats::filter()), and we want to use the one from the dplyr package. Let us say we want to have the studies by Cavanagh et al., Frazier et al. and Phang et al. stored in another dataset, so we can conduct analyses only for these studies.

The R code to store these three studies in a new dataset called madata.new looks like this:

madata.new <- dplyr::filter(madata, Author %in% c("Cavanagh et al.",
                                                  "Frazier et al.",
                                                  "Phang et al."))

Note that the %in% operator tells the filter function to keep exactly those rows whose value in the variable Author matches one of the three names we defined.
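
To illustrate why %in% is used here rather than ==, consider the following minimal sketch in base R. The vectors authors and wanted are made up for this example, and "Study D" is a placeholder name.

```r
# Hypothetical author column and a set of names we want to keep
authors <- c("Cavanagh et al.", "Frazier et al.", "Phang et al.", "Study D")
wanted  <- c("Cavanagh et al.", "Phang et al.")

# %in% asks, for each author, "is this name anywhere in wanted?"
in.result <- authors %in% wanted
in.result

# == compares position by position and silently recycles the shorter
# vector, so "Phang et al." (position 3) is compared with
# "Cavanagh et al." and is missed
eq.result <- authors == wanted
eq.result
```

Here %in% flags the first and third author, whereas == only flags the first one. For matching against a set of values, %in% is therefore usually the safer choice.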

Now, let us have a look at the new data madata.new we just created.

Author | TE | seTE | RoB | Control | intervention duration | intervention type | population | type of students | prevention type | gender | mode of delivery | ROB streng | ROB superstreng | compensation | instruments | guidance | ROB
Cavanagh et al. | 0.3548641 | 0.1963624 | low | WLC | short | mindfulness | students | general | universal | mixed | online | low | high | none | PSS | self-guided | low
Frazier et al. | 0.4218509 | 0.1448128 | low | information only | short | PCI | students | psychology | universal | mixed | online | low | low | credit | PSS | reminders | low
Phang et al. | 0.5407398 | 0.2443133 | low | no intervention | short | mindfulness | students | medical studens | selective | mixed | group | low | low | none | PSS | f2f | low

Note that the function can also be used for any other type of data and variable. We can also use it to, for example, only select studies which were coded as being a mindfulness study.

madata.new.mf <- dplyr::filter(madata,`intervention type` %in% c("mindfulness"))

We can also use the dplyr::filter() function to exclude studies from our dataset. To do this, we only have to add ! in front of the filtering condition, which negates it.

madata.new.excl <- dplyr::filter(madata,!Author %in% c("Cavanagh et al.",
                                                     "Frazier et al.",
                                                     "Phang et al."))

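The selection and the exclusion are complements of each other, which gives a quick sanity check: together they should contain every row of the original data exactly once. The sketch below shows this with base R logical indexing on a made-up data frame. The object toy, its effect sizes and the name "Study D" are invented for illustration, and the bracket syntax is used here as a base R counterpart to the dplyr::filter() calls above.

```r
# Hypothetical toy data (effect sizes and "Study D" are made up)
toy <- data.frame(
  Author = c("Cavanagh et al.", "Frazier et al.", "Phang et al.", "Study D"),
  TE     = c(0.35, 0.42, 0.54, 0.10),
  stringsAsFactors = FALSE
)

selected <- c("Cavanagh et al.", "Frazier et al.", "Phang et al.")

# Logical vector: TRUE for rows whose author is in the selection
keep <- toy$Author %in% selected

toy.incl <- toy[keep, ]    # rows to include
toy.excl <- toy[!keep, ]   # ! negates the condition: rows to exclude

# Both parts together should cover all rows of the original data
nrow(toy.incl) + nrow(toy.excl) == nrow(toy)
```

If the final comparison returns FALSE on real data, a likely cause is missing values or unintended spaces in the filtering variable.
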
3.3.4 Changing cell values

Even if you prepared your data carefully in Excel, you might sometimes want to change values in RStudio after you have imported your data. To do this, we have to select a cell in our data frame in RStudio. This can be done by adding [x,y] to our dataset name, where x signifies the number of the row we want to select, and y signifies the number of the column.

To see how this works, let us select a cell using this command first:

madata[6,1]

We now see the 6th study in our dataframe, and the value of this study for Column 1 (the author name) is displayed. Let us say we had a typo in this name and want to have it changed. In this case, we have to give this exact cell a new value.

madata[6,1] <- "Frogelli et al."

Let us check if the name has changed.

madata[6,1]

You can also use this approach to change any other type of data, including numerical and logical values. Only character values have to be put in quotation marks ("").
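
Counting rows and columns by hand becomes error-prone in larger datasets, so it can help to address a cell by a condition and a column name instead. The following is a minimal sketch on made-up data. The object toy and all of its values are hypothetical.

```r
# Hypothetical toy data with a typo in the second author name
toy <- data.frame(
  Author = c("Study A", "Stuudy B", "Study C"),
  TE     = c(0.35, 0.42, 0.54),
  stringsAsFactors = FALSE
)

# By position: row 2, column 1 (character value, so it needs quotes)
toy[2, 1] <- "Study B"

# By condition and column name: numeric value, so no quotes
toy[toy$Author == "Study C", "TE"] <- 0.60

toy
```

Note that putting a number in quotes would turn the whole column into a character variable, which is why numerical values are entered without them.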