Chapter 7 Summary of DGE workflow

Homework: First save a copy of this file lessons/07-DGE_summarizing_workflow.Rmd in your student_notes folder as ’_07-DGE_summarizing_workflow.Rmd’. Then modify the file to analyze the MOV dataset, starting with Mov10_full_counts.txt and Mov10_full_meta.txt in your data folder. Compare the “MOV10_knockdown” to the “control”. Include a heatmap and a volcano plot**. Don’t forget to read the data in first. points = +6

## Setup
library(DESeq2)

We have detailed the various steps in a differential expression analysis workflow, providing theory with example code. To provide a more succinct reference for the code needed to run a DGE analysis, we have summarized the steps in an analysis below:

7.0.1 1. Import data into dds object (be careful of the names of the count file and metadata): points = +2

# Check that the row names of the metadata equal the column names
#  of the **raw counts** data
all(colnames(raw_counts) == rownames(metadata))

# Create DESeq2Dataset object
dds <- DESeqDataSetFromMatrix(countData = raw_counts, colData = metadata, design = ~ condition)

7.0.2 2. Exploratory data analysis (PCA & heirarchical clustering) - identifying outliers and sources of variation in the data:

# Transform counts for data visualization
rld <- rlog(dds, blind = TRUE)

# Plot PCA
plotPCA(rld, intgroup = "sampletype")

# Extract the rlog matrix from the object
rld_mat <- assay(rld)

# Compute pairwise correlation values
rld_cor <- cor(rld_mat)

# Plot heatmap
pheatmap(rld_cor)

7.0.3 3. Run DESeq2 and normalize counts. Write the normalized counts to a file using write.table: ponts = +2

# Run DESeq2 differential expression analysis
dds <- DESeq(dds)

#  Output normalized counts to save as a file to access outside RStudio
normalized_counts <- counts(dds, normalized = TRUE)
    
write.table(normalized_counts, file = "data/normalized_counts.txt", 
            sep = "\t", quote = F, col.names = NA)

7.0.4 4. Check the fit of the dispersion estimates:

# Plot dispersion estimates
plotDispEsts(dds)

7.0.5 5. Create contrasts to perform Wald testing on the shrunken log2 foldchanges between specific conditions: points = +2

# Output results of Wald test for contrast
contrast <- c("condition", "level_to_compare", "base_level")

res <- results(dds, contrast = contrast)

coef = resultsNames(dds)

res_table <- lfcShrink(dds, coef = coef[2], res = res, 
                       type = "apeglm")

7.0.6 6. Output significant results: pointts = +1

### Set thresholds
padj.cutoff <- 0.05
lfc.cutoff <- 0.58 ## change in expression of 1.5

# Turn the results object into a data frame
res_df <- res %>%
  data.frame() %>%
  rownames_to_column(var = "gene") 

# Subset the significant results
sig_res <- dplyr::filter(res_df, padj < padj.cutoff &  
                  abs(log2FoldChange) > lfc.cutoff)

7.0.7 7. Visualize results: volcano plots, heatmaps, normalized counts plots of top genes, etc. points = +3

7.0.8 8. Make sure to output the versions of all tools used in the DE analysis:

sessionInfo()
## R version 4.5.1 (2025-06-13)
## Platform: x86_64-apple-darwin20
## Running under: macOS Sequoia 15.5
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
## 
## locale:
## [1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
## 
## time zone: America/Los_Angeles
## tzcode source: internal
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] ggrepel_0.9.6               pheatmap_1.0.13            
##  [3] DESeq2_1.48.2               SummarizedExperiment_1.38.1
##  [5] Biobase_2.68.0              MatrixGenerics_1.20.0      
##  [7] matrixStats_1.5.0           GenomicRanges_1.60.0       
##  [9] GenomeInfoDb_1.44.3         IRanges_2.42.0             
## [11] S4Vectors_0.46.0            BiocGenerics_0.54.1        
## [13] generics_0.1.4              RColorBrewer_1.1-3         
## [15] lubridate_1.9.4             forcats_1.0.1              
## [17] stringr_1.5.2               dplyr_1.1.4                
## [19] purrr_1.1.0                 readr_2.1.5                
## [21] tidyr_1.3.1                 tibble_3.3.0               
## [23] ggplot2_4.0.0               tidyverse_2.0.0            
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.6            xfun_0.53               bslib_0.9.0            
##  [4] lattice_0.22-7          numDeriv_2016.8-1.1     tzdb_0.5.0             
##  [7] vctrs_0.6.5             tools_4.5.1             parallel_4.5.1         
## [10] pkgconfig_2.0.3         Matrix_1.7-3            S7_0.2.0               
## [13] lifecycle_1.0.4         GenomeInfoDbData_1.2.14 compiler_4.5.1         
## [16] farver_2.1.2            codetools_0.2-20        htmltools_0.5.8.1      
## [19] sass_0.4.10             yaml_2.3.10             pillar_1.11.1          
## [22] crayon_1.5.3            jquerylib_0.1.4         MASS_7.3-65            
## [25] BiocParallel_1.42.2     cachem_1.1.0            DelayedArray_0.34.1    
## [28] emdbook_1.3.14          abind_1.4-8             bdsmatrix_1.3-7        
## [31] locfit_1.5-9.12         tidyselect_1.2.1        digest_0.6.37          
## [34] mvtnorm_1.3-3           stringi_1.8.7           bookdown_0.45          
## [37] labeling_0.4.3          fastmap_1.2.0           grid_4.5.1             
## [40] cli_3.6.5               SparseArray_1.8.1       magrittr_2.0.4         
## [43] S4Arrays_1.8.1          withr_3.0.2             scales_1.4.0           
## [46] UCSC.utils_1.4.0        timechange_0.3.0        rmarkdown_2.30         
## [49] XVector_0.48.0          httr_1.4.7              hms_1.1.3              
## [52] coda_0.19-4.1           evaluate_1.0.5          knitr_1.50             
## [55] bbmle_1.0.25.1          rlang_1.1.6             Rcpp_1.1.0             
## [58] glue_1.8.0              apeglm_1.30.0           rstudioapi_0.17.1      
## [61] jsonlite_2.0.0          plyr_1.8.9              R6_2.6.1