Chapter 7 Summary of DGE workflow

We have detailed the various steps in a differential expression analysis workflow, providing theory with example code. To provide a more succinct reference for the code needed to run a DGE analysis, we have summarized the steps in an analysis below:

  1. Count normalization:
# Check that the row names of the metadata equal the column names of the **raw
# counts** data

all(colnames(raw_counts) == rownames(metadata))

# Create DESeq2Dataset object
dds <- DESeqDataSetFromMatrix(countData = raw_counts, colData = metadata, design = ~condition)
  1. Exploratory data analysis (PCA & heirarchical clustering) - identifying outliers and sources of variation in the data:
# Transform counts for data visualization
rld <- rlog(dds, blind = TRUE)

# Plot PCA
plotPCA(rld, intgroup = "condition")

# Extract the rlog matrix from the object
rld_mat <- assay(rld)

# Compute pairwise correlation values
rld_cor <- cor(rld_mat)

# Plot heatmap
pheatmap(rld_cor)
  1. Run DESeq2:
# **Optional step** - Re-create DESeq2 dataset if the design formula has
# changed after QC analysis in include other sources of variation

dds <- DESeqDataSetFromMatrix(countData = raw_counts, colData = metadata, design = ~condition)

# Run DESeq2 differential expression analysis
dds <- DESeq(dds)

# **Optional step** - Output normalized counts to save as a file to access
# outside RStudio

normalized_counts <- counts(dds, normalized = TRUE)
  1. Check the fit of the dispersion estimates:
# Plot dispersion estimates
plotDispEsts(dds)
  1. Create contrasts to perform Wald testing on the shrunken log2 foldchanges between specific conditions:
# Output results of Wald test for contrast
contrast <- c("condition", "level_to_compare", "base_level")

res <- results(dds, contrast = contrast)

coef = resultsNames(dds)

res_tableOE <- lfcShrink(dds, coef = coef[2], res = res, type = "apeglm")
  1. Output significant results:
# Turn the results object into a data frame
res_df <- res %>%
    data.frame() %>%
    rownames_to_column(var = "gene")

# Subset the significant results
sig_res <- filter(res_df, padj < padj.cutoff & abs(log2FoldChange) > lfc.cutoff)
  1. Visualize results: volcano plots, heatmaps, normalized counts plots of top genes, etc.

  2. Make sure to output the versions of all tools used in the DE analysis:

sessionInfo()