Chapter 7 Summary of DGE workflow
Homework: modify this file to analyze the MOV dataset, starting with Mov10_full_counts.txt in your data folder. Compare the “MOV10_knockdown” to the “control”. Include a heatmap and a volcano plot points = +10
We have detailed the various steps in a differential expression analysis workflow, providing theory with example code. To provide a more succinct reference for the code needed to run a DGE analysis, we have summarized the steps in an analysis below:
7.0.2 2. Exploratory data analysis (PCA & heirarchical clustering) - identifying outliers and sources of variation in the data:
7.0.3 3. Run DESeq2:
# **Optional step** - Re-create DESeq2 dataset if the design formula has changed after QC analysis in include other sources of variation
dds <- DESeqDataSetFromMatrix(countData = raw_counts,
colData = metadata, design = ~ condition)
# Run DESeq2 differential expression analysis
dds <- DESeq(dds)
# Output normalized counts to save as a file to access outside RStudio
normalized_counts <- counts(dds, normalized = TRUE)
write.table(normalized_counts, file = "data/normalized_counts.txt",
sep = "\t", quote = F, col.names = NA)
7.0.5 5. Create contrasts to perform Wald testing on the shrunken log2 foldchanges between specific conditions:
7.0.6 6. Output significant results:
### Set thresholds
padj.cutoff <- 0.05
lfc.cutoff <- 0.58 ## change in expression of 1.5
# Turn the results object into a data frame
res_df <- res %>%
data.frame() %>%
rownames_to_column(var = "gene")
# Subset the significant results
sig_res <- dplyr::filter(res_df, padj < padj.cutoff &
abs(log2FoldChange) > lfc.cutoff)
7.0.8 8. Make sure to output the versions of all tools used in the DE analysis:
## R version 4.2.3 (2023-03-15)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS 14.6.1
##
## Matrix products: default
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ggrepel_0.9.6 pheatmap_1.0.12 DESeq2_1.38.3
## [4] SummarizedExperiment_1.28.0 Biobase_2.58.0 MatrixGenerics_1.10.0
## [7] matrixStats_1.4.1 GenomicRanges_1.50.2 GenomeInfoDb_1.34.9
## [10] IRanges_2.32.0 S4Vectors_0.36.2 BiocGenerics_0.44.0
## [13] RColorBrewer_1.1-3 lubridate_1.9.3 forcats_1.0.0
## [16] stringr_1.5.1 dplyr_1.1.4 purrr_1.0.2
## [19] readr_2.1.5 tidyr_1.3.1 tibble_3.2.1
## [22] ggplot2_3.5.1 tidyverse_2.0.0 bookdown_0.41
##
## loaded via a namespace (and not attached):
## [1] bitops_1.0-9 bit64_4.5.2 httr_1.4.7 numDeriv_2016.8-1.1
## [5] tools_4.2.3 bslib_0.8.0 utf8_1.2.4 R6_2.5.1
## [9] DBI_1.2.3 colorspace_2.1-1 apeglm_1.20.0 withr_3.0.2
## [13] tidyselect_1.2.1 bit_4.5.0 compiler_4.2.3 cli_3.6.3
## [17] DelayedArray_0.24.0 labeling_0.4.3 sass_0.4.9 scales_1.3.0
## [21] mvtnorm_1.3-2 digest_0.6.37 rmarkdown_2.29 XVector_0.38.0
## [25] pkgconfig_2.0.3 htmltools_0.5.8.1 bbmle_1.0.25.1 fastmap_1.2.0
## [29] rlang_1.1.4 rstudioapi_0.17.1 RSQLite_2.3.7 jquerylib_0.1.4
## [33] generics_0.1.3 farver_2.1.2 jsonlite_1.8.9 BiocParallel_1.32.6
## [37] RCurl_1.98-1.16 magrittr_2.0.3 GenomeInfoDbData_1.2.9 Matrix_1.5-3
## [41] Rcpp_1.0.13-1 munsell_0.5.1 fansi_1.0.6 lifecycle_1.0.4
## [45] stringi_1.8.4 yaml_2.3.10 MASS_7.3-58.2 zlibbioc_1.44.0
## [49] plyr_1.8.9 grid_4.2.3 blob_1.2.4 parallel_4.2.3
## [53] bdsmatrix_1.3-7 crayon_1.5.3 lattice_0.22-6 Biostrings_2.66.0
## [57] annotate_1.76.0 hms_1.1.3 KEGGREST_1.38.0 locfit_1.5-9.10
## [61] knitr_1.49 pillar_1.9.0 geneplotter_1.76.0 codetools_0.2-20
## [65] XML_3.99-0.17 glue_1.8.0 evaluate_1.0.1 vctrs_0.6.5
## [69] png_0.1-8 tzdb_0.4.0 gtable_0.3.6 emdbook_1.3.13
## [73] cachem_1.1.0 xfun_0.49 xtable_1.8-4 coda_0.19-4.1
## [77] rsconnect_1.3.2 AnnotationDbi_1.60.2 memoise_2.0.1 timechange_0.3.0