Chapter 7 Summary of DGE workflow

Homework: modify this file to analyze the MOV dataset, starting with Mov10_full_counts.txt in your data folder. Compare the “MOV10_knockdown” to the “control”. Include a heatmap and a volcano plot points = +10

## Setup
library(DESeq2)

We have detailed the various steps in a differential expression analysis workflow, providing theory with example code. To provide a more succinct reference for the code needed to run a DGE analysis, we have summarized the steps in an analysis below:

7.0.1 1. Import data into dds object:

# Check that the row names of the metadata equal the column names
#  of the **raw counts** data
all(colnames(raw_counts) == rownames(metadata))

# Create DESeq2Dataset object
dds <- DESeqDataSetFromMatrix(countData = raw_counts, colData = metadata, 
                              design = ~ condition)

7.0.2 2. Exploratory data analysis (PCA & heirarchical clustering) - identifying outliers and sources of variation in the data:

# Transform counts for data visualization
rld <- rlog(dds, blind = TRUE)

# Plot PCA
plotPCA(rld, intgroup = "sampletype")

# Extract the rlog matrix from the object
rld_mat <- assay(rld)

# Compute pairwise correlation values
rld_cor <- cor(rld_mat)

# Plot heatmap
pheatmap(rld_cor)

7.0.3 3. Run DESeq2:

# **Optional step** - Re-create DESeq2 dataset if the design formula has changed after QC analysis in include other sources of variation

dds <- DESeqDataSetFromMatrix(countData = raw_counts, 
                              colData = metadata, design = ~ condition)

# Run DESeq2 differential expression analysis
dds <- DESeq(dds)

#  Output normalized counts to save as a file to access outside RStudio
normalized_counts <- counts(dds, normalized = TRUE)
    
write.table(normalized_counts, file = "data/normalized_counts.txt", 
            sep = "\t", quote = F, col.names = NA)

7.0.4 4. Check the fit of the dispersion estimates:

# Plot dispersion estimates
plotDispEsts(dds)

7.0.5 5. Create contrasts to perform Wald testing on the shrunken log2 foldchanges between specific conditions:

# Output results of Wald test for contrast
contrast <- c("condition", "level_to_compare", "base_level")

res <- results(dds, contrast = contrast)

coef = resultsNames(dds)

res_table <- lfcShrink(dds, coef = coef[2], res = res, 
                         type = "apeglm")

7.0.6 6. Output significant results:

### Set thresholds
padj.cutoff <- 0.05
lfc.cutoff <- 0.58 ## change in expression of 1.5

# Turn the results object into a data frame
res_df <- res %>%
  data.frame() %>%
  rownames_to_column(var = "gene") 

# Subset the significant results
sig_res <- dplyr::filter(res_df, padj < padj.cutoff &  
                  abs(log2FoldChange) > lfc.cutoff)

7.0.7 7. Visualize results: volcano plots, heatmaps, normalized counts plots of top genes, etc.

7.0.8 8. Make sure to output the versions of all tools used in the DE analysis:

sessionInfo()
## R version 4.2.3 (2023-03-15)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS 14.6.1
## 
## Matrix products: default
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] ggrepel_0.9.6               pheatmap_1.0.12             DESeq2_1.38.3              
##  [4] SummarizedExperiment_1.28.0 Biobase_2.58.0              MatrixGenerics_1.10.0      
##  [7] matrixStats_1.4.1           GenomicRanges_1.50.2        GenomeInfoDb_1.34.9        
## [10] IRanges_2.32.0              S4Vectors_0.36.2            BiocGenerics_0.44.0        
## [13] RColorBrewer_1.1-3          lubridate_1.9.3             forcats_1.0.0              
## [16] stringr_1.5.1               dplyr_1.1.4                 purrr_1.0.2                
## [19] readr_2.1.5                 tidyr_1.3.1                 tibble_3.2.1               
## [22] ggplot2_3.5.1               tidyverse_2.0.0             bookdown_0.41              
## 
## loaded via a namespace (and not attached):
##  [1] bitops_1.0-9           bit64_4.5.2            httr_1.4.7             numDeriv_2016.8-1.1   
##  [5] tools_4.2.3            bslib_0.8.0            utf8_1.2.4             R6_2.5.1              
##  [9] DBI_1.2.3              colorspace_2.1-1       apeglm_1.20.0          withr_3.0.2           
## [13] tidyselect_1.2.1       bit_4.5.0              compiler_4.2.3         cli_3.6.3             
## [17] DelayedArray_0.24.0    labeling_0.4.3         sass_0.4.9             scales_1.3.0          
## [21] mvtnorm_1.3-2          digest_0.6.37          rmarkdown_2.29         XVector_0.38.0        
## [25] pkgconfig_2.0.3        htmltools_0.5.8.1      bbmle_1.0.25.1         fastmap_1.2.0         
## [29] rlang_1.1.4            rstudioapi_0.17.1      RSQLite_2.3.7          jquerylib_0.1.4       
## [33] generics_0.1.3         farver_2.1.2           jsonlite_1.8.9         BiocParallel_1.32.6   
## [37] RCurl_1.98-1.16        magrittr_2.0.3         GenomeInfoDbData_1.2.9 Matrix_1.5-3          
## [41] Rcpp_1.0.13-1          munsell_0.5.1          fansi_1.0.6            lifecycle_1.0.4       
## [45] stringi_1.8.4          yaml_2.3.10            MASS_7.3-58.2          zlibbioc_1.44.0       
## [49] plyr_1.8.9             grid_4.2.3             blob_1.2.4             parallel_4.2.3        
## [53] bdsmatrix_1.3-7        crayon_1.5.3           lattice_0.22-6         Biostrings_2.66.0     
## [57] annotate_1.76.0        hms_1.1.3              KEGGREST_1.38.0        locfit_1.5-9.10       
## [61] knitr_1.49             pillar_1.9.0           geneplotter_1.76.0     codetools_0.2-20      
## [65] XML_3.99-0.17          glue_1.8.0             evaluate_1.0.1         vctrs_0.6.5           
## [69] png_0.1-8              tzdb_0.4.0             gtable_0.3.6           emdbook_1.3.13        
## [73] cachem_1.1.0           xfun_0.49              xtable_1.8-4           coda_0.19-4.1         
## [77] rsconnect_1.3.2        AnnotationDbi_1.60.2   memoise_2.0.1          timechange_0.3.0