3.3 Data Manipulation
Now that we have the meta-analysis data in RStudio, let us do a few manipulations with the data. These functions might come in handy when we are conducting analyses later on.
Going back to the output of the str() function, we see that this also gives us details on the type of data we have stored in each column of our dataset. There are different abbreviations signifying different types of data.
|Abbreviation||Type||Description|
|num||Numerical||This is all data stored as numbers (e.g. 1.02).|
|chr||Character||This is all data stored as words.|
|logi||Logical||These are binary variables, meaning that they signify that a condition is either TRUE or FALSE.|
|Factor||Factor||Factors are stored as numbers, with each number signifying a different level of a variable. For example, the levels of a factor might be 1 = low, 2 = medium, 3 = high.|
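To make the abbreviations above concrete, here is a minimal sketch using a small made-up data frame. The object `toy` and all of its values are hypothetical and are not part of the `madata` dataset used in this chapter.

```r
# Hypothetical toy data frame with one column per data type
toy <- data.frame(
  TE     = c(0.35, 0.42, 0.54),                  # numerical
  Author = c("Study A", "Study B", "Study C"),   # character
  online = c(TRUE, TRUE, FALSE),                 # logical
  ROB    = factor(c("low", "high", "low")),      # factor
  stringsAsFactors = FALSE                       # keep Author as character
)

# str() prints the type abbreviation next to each column name
str(toy)

# class() returns the full type name of each column
col.types <- sapply(toy, class)
col.types
```

Comparing the `str()` output with the table should show one column for each of the four types.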
3.3.1 Converting to factors
Let’s say we have the subgroup Risk of Bias (in which the Risk of Bias rating is coded), and want it to be a factor with two different levels: “low” and “high”.
To do this, we need the variable ROB to be a factor. However, this variable is currently stored as a character (chr). We can have a look at this variable by typing the name of our dataset, then adding the selector $, and then adding the variable we want to have a look at, which gives madata$ROB. Passing the same expression to str() additionally shows its type.
##  [1] "high" "low" "high" "low" "low" "low" "high" "low" "low" "low"
## [11] "high" "low" "low" "low" "high" "high" "high" "low"
## chr [1:18] "high" "low" "high" "low" "low" "low" "high" "low" "low" ...
We can see now that ROB is indeed a character type variable, which contains only two words: “low” and “high”. We now want to convert it to a factor variable with only two levels, low and high. To do this, we can use the factor() function.
##  [1] high low high low low low high low low low high low low low
## [15] high high high low
## Levels: high low
## Factor w/ 2 levels "high","low": 1 2 1 2 2 2 1 2 2 2 ...
We now see that the variable has been converted to a factor with the levels “high” and “low”.
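The conversion can be sketched on a short made-up vector. The object names `rob`, `rob.factor` and `rob.ordered` are hypothetical, and the sketch assumes base R's `factor()` function; applied to the chapter's data, the same idea would presumably read `madata$ROB <- factor(madata$ROB)`.

```r
# Hypothetical character vector of risk of bias ratings
rob <- c("high", "low", "high", "low", "low")
str(rob)   # shows the vector is stored as chr

# Convert to a factor; levels are sorted alphabetically by default
rob.factor <- factor(rob)
str(rob.factor)
levels(rob.factor)

# Optionally set the order of the levels explicitly
rob.ordered <- factor(rob, levels = c("low", "high"))
levels(rob.ordered)
```

Setting `levels` explicitly can be useful when the alphabetical order ("high" before "low") is not the order wanted in later analyses or plots.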
3.3.2 Converting to logicals
Now let us have a look at the intervention type subgroup variable. This column is currently stored as a character (chr) variable too.
##  [1] "mindfulness" "mindfulness" "ACT" "mindfulness" "PCI"
##  [6] "ACT" "mindfulness" "mindfulness" "PCI" "mindfulness"
## [11] "mindfulness" "mindfulness" "mindfulness" "ACT" "mindfulness"
## [16] "mindfulness" "mindfulness" "mindfulness"
## chr [1:18] "mindfulness" "mindfulness" "ACT" "mindfulness" "PCI" ...
Let us say we want a variable which only tells us whether a study is a mindfulness intervention or not. A logical is very well suited for this. To convert the data to a logical, we use the as.logical function. We will create a new variable containing this information called intervention.type.logical. To tell R what to count as TRUE and what as FALSE, we have to define the specific intervention type using the == operator.
intervention.type.logical <- as.logical(madata$`intervention type` == "mindfulness")
intervention.type.logical
##  [1] TRUE TRUE FALSE TRUE FALSE FALSE TRUE TRUE FALSE TRUE TRUE
## [12] TRUE TRUE FALSE TRUE TRUE TRUE TRUE
We see that R has converted the character information into trues and falses for us. To check if this was done correctly, let us compare the original and the new variable.
library(knitr)  # provides the kable() function
n <- data.frame(intervention.type.logical, madata$`intervention type`)
colnames(n) <- c("New", "Original")
kable(n)
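The same steps can be tried on a small made-up vector. The objects `intervention.type`, `is.mindfulness` and `check` below are hypothetical. Note that a comparison with `==` already returns a logical vector in R, so this sketch leaves out the `as.logical` wrapper.

```r
# Hypothetical intervention types for four studies
intervention.type <- c("mindfulness", "ACT", "PCI", "mindfulness")

# TRUE where the study is a mindfulness intervention, FALSE otherwise
is.mindfulness <- intervention.type == "mindfulness"
is.mindfulness

# Put the new and the original variable side by side to check the coding
check <- data.frame(New = is.mindfulness, Original = intervention.type)
print(check)

# TRUE counts as 1, so sum() gives the number of mindfulness studies
n.mindfulness <- sum(is.mindfulness)
n.mindfulness
```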
3.3.3 Selecting specific studies
It often comes in handy to select certain studies for further analyses, or to exclude some studies from further analyses (e.g., if they are outliers). To do this, we can use the filter function in the dplyr package, which is part of the tidyverse package we installed before.
So, let us load the package first.
Let us say we want to do a meta-analysis with only three studies in our dataset. To do this, we need to create a new dataset containing only these studies using the dplyr::filter() function. The dplyr:: part is necessary as there is more than one filter function in R, and we want to use the one from the dplyr package. Let us say we want to have the studies by Cavanagh et al., Frazier et al. and Phang et al. stored in another dataset, so we can conduct analyses only for these studies.
The R code to store these three studies in a new dataset called madata.new looks like this:
madata.new <- dplyr::filter(madata, Author %in% c("Cavanagh et al.", "Frazier et al.", "Phang et al."))
Note that the %in% operator tells the filter function to search for exactly the three cases we defined in the variable Author.
Now, let us have a look at the new dataset madata.new we just created.
|Author||TE||seTE||RoB||Control||intervention duration||intervention type||population||type of students||prevention type||gender||mode of delivery||ROB streng||ROB superstreng||compensation||instruments||guidance||ROB|
|Cavanagh et al.||0.3548641||0.1963624||low||WLC||short||mindfulness||students||general||universal||mixed||online||low||high||none||PSS||self-guided||low|
|Frazier et al.||0.4218509||0.1448128||low||information only||short||PCI||students||psychology||universal||mixed||online||low||low||credit||PSS||reminders||low|
|Phang et al.||0.5407398||0.2443133||low||no intervention||short||mindfulness||students||medical studens||selective||mixed||group||low||low||none||PSS||f2f||low|
Note that the function can also be used for any other type of data and variable. We can also use it to, for example, only select studies which were coded as being a mindfulness study.
madata.new.mf <- dplyr::filter(madata,`intervention type` %in% c("mindfulness"))
We can also use the dplyr::filter() function to exclude studies from our dataset. To do this, we only have to add ! in front of the variable we want to use for filtering.
madata.new.excl <- dplyr::filter(madata,!Author %in% c("Cavanagh et al.", "Frazier et al.", "Phang et al."))
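Selecting and excluding studies can be tried out on a small made-up dataset. This is a minimal sketch that assumes the dplyr package is installed; the objects `toy`, `keep`, `toy.incl` and `toy.excl` and all values are hypothetical.

```r
library(dplyr)

# Hypothetical dataset with four studies
toy <- data.frame(
  Author = c("Study A", "Study B", "Study C", "Study D"),
  TE     = c(0.10, 0.25, 0.40, 0.55),
  stringsAsFactors = FALSE
)

# Studies we want to select (or exclude)
keep <- c("Study A", "Study C")

# Keep only the rows whose Author is in 'keep'
toy.incl <- dplyr::filter(toy, Author %in% keep)

# The ! negates the condition, so these rows are dropped instead
toy.excl <- dplyr::filter(toy, !Author %in% keep)

toy.incl
toy.excl
```

Because `%in%` is evaluated before `!`, the expression `!Author %in% keep` reads as "Author is not in keep". The two results together contain every row of the original data exactly once.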
3.3.4 Changing cell values
Sometimes, even if you prepared your data in Excel, you might want to change values in RStudio once you have imported your data. To do this, we have to select a cell in our data frame in RStudio. This can be done by adding [x,y] to our dataset name, where x signifies the number of the row we want to select, and y signifies the number of the column.
To see how this works, let us first select a cell by typing madata[6,1].
This selects the 6th study in our data frame and displays its value in column 1 (the author name). Let us say we had a typo in this name and want to have it changed. In this case, we have to give this exact cell a new value.
madata[6,1] <- "Frogelli et al."
Let us check if the name has changed by selecting the cell again with madata[6,1].
You can also use this approach to change any other type of data, including numericals and logicals. Only for characters do you have to put the values you want to insert in quotation marks, as in the example above.
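The [x,y] selection can be sketched with a small made-up data frame. The object `toy` and its values are hypothetical; the typo in the third author name is deliberate, so that it can be corrected.

```r
# Hypothetical data frame; "Stduy C" contains a deliberate typo
toy <- data.frame(
  Author = c("Study A", "Study B", "Stduy C"),
  TE     = c(0.10, 0.25, 0.40),
  stringsAsFactors = FALSE
)

# Select row 3, column 1 to look at the cell
toy[3, 1]

# Overwrite the cell; character values need quotation marks
toy[3, 1] <- "Study C"

# Numerical values are inserted without quotation marks;
# columns can also be selected by name instead of by number
toy[2, "TE"] <- 0.30

toy
```

Selecting a column by name rather than by position may be a little more robust, since the code keeps working if the column order of the dataset changes.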