Chapter 11 Plotting and data visualization

11.1 Dataframe setup for visualization

In this lesson we want to make plots to evaluate the average expression in each sample and its relationship with the age of the mouse. So, to this end, we will be adding a couple of additional columns of information to the metadata data frame that we can utilize for plotting.

Let’s first load the data:

### Reading data into R
metadata <- read.csv(file="data/mouse_exp_design.csv")

Alt text

11.1.1 Calculating average expression

Let’s take a closer look at our counts data (rpkm_ordered). Each column represents a sample in our experiment, and each sample has > 36,000 total counts. We want to compute the average value of expression for each sample. Taking this one step at a time, if we just wanted the average expression for Sample 1 we can use the R base function mean():

mean(rpkm_ordered$sample1)
## [1] 10.2661

That is great, but we need to get this information from all 12 samples, so all 12 columns. We want a vector of 12 values that we can add to the metadata data frame. What is the best way to do this?

To get the mean of all the samples in a single line of code the map() family of function is a good option.

11.1.2 The map family of functions

The map() family of functions is available from the purrr package, which is part of the tidyverse suite of packages. We can map() functions to execute some task/function on every element in a vector, or every column in a dataframe, or every component of a list, and so on.

  • map() creates a list.
  • map_lgl() creates a logical vector.
  • map_int() creates an integer vector.
  • map_dbl() creates a “double” or numeric vector.
  • map_chr() creates a character vector.

The syntax for the map() family of functions is:

## DO NOT RUN
map(object, function_to_apply)

To obtain mean values for all samples we can use the map_dbl() function which generates a numeric vector.

library(purrr)  # Load the purrr

samplemeans <- map_dbl(rpkm_ordered, mean) 

The output of map_dbl() is a named vector of length 12.

11.1.3 Adding data to metadata

Before we add samplemeans as a new column to metadata, let’s create a vector with the ages of each of the mice in our data set.

# Create a numeric vector with ages. Note that there are 12 elements here

age_in_days <- c(40, 32, 38, 35, 41, 32, 34, 26, 28, 28, 30, 32)        

Now, we are ready to combine the metadata data frame with the 2 new vectors to create a new data frame with 5 columns:

# Add the new vectors as the last columns to the metadata 

new_metadata <- data.frame(metadata, samplemeans, age_in_days) 

# Take a look at the new_metadata object
head(new_metadata)
##         genotype celltype replicate samplemeans age_in_days
## sample1       Wt    typeA         1   10.266102          40
## sample2       Wt    typeA         2   10.849759          32
## sample3       Wt    typeA         3    9.452517          38
## sample4       KO    typeA         1   15.833872          35
## sample5       KO    typeA         2   15.590184          41
## sample6       KO    typeA         3   15.551529          32

Finally, save the new_metadata as a .RData object.

save.image(file="data/new_metadata.RData")

Using new_metadata, we are now ready for plotting and data visualization.