Chapter 7 Summary of DGE workflow
Homework: First save a copy of this file lessons/07-DGE_summarizing_workflow.Rmd
in your student_notes folder as ’Mov10_full_counts.txt
and Mov10_full_meta.txt
in your data folder. Compare the “MOV10_knockdown” to the “control”. Include a heatmap and a volcano plot**. Don’t forget to read the data in first. points = +6
We have detailed the various steps in a differential expression analysis workflow, providing theory with example code. To provide a more succinct reference for the code needed to run a DGE analysis, we have summarized the steps in an analysis below:
7.0.1 1. Import data into dds object (be careful of the names of the count file and metadata): points = +2
7.0.2 2. Exploratory data analysis (PCA & heirarchical clustering) - identifying outliers and sources of variation in the data:
7.0.3 3. Run DESeq2 and normalize counts. Write the normalized counts to a file using write.table: ponts = +2
7.0.5 5. Create contrasts to perform Wald testing on the shrunken log2 foldchanges between specific conditions: points = +2
7.0.6 6. Output significant results: pointts = +1
### Set thresholds
padj.cutoff <- 0.05
lfc.cutoff <- 0.58 ## change in expression of 1.5
# Turn the results object into a data frame
res_df <- res %>%
data.frame() %>%
rownames_to_column(var = "gene")
# Subset the significant results
sig_res <- dplyr::filter(res_df, padj < padj.cutoff &
abs(log2FoldChange) > lfc.cutoff)
7.0.7 7. Visualize results: volcano plots, heatmaps, normalized counts plots of top genes, etc. points = +3
7.0.8 8. Make sure to output the versions of all tools used in the DE analysis:
## R version 4.5.1 (2025-06-13)
## Platform: x86_64-apple-darwin20
## Running under: macOS Sequoia 15.5
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
##
## locale:
## [1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
##
## time zone: America/Los_Angeles
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] ggrepel_0.9.6 pheatmap_1.0.13
## [3] DESeq2_1.48.2 SummarizedExperiment_1.38.1
## [5] Biobase_2.68.0 MatrixGenerics_1.20.0
## [7] matrixStats_1.5.0 GenomicRanges_1.60.0
## [9] GenomeInfoDb_1.44.3 IRanges_2.42.0
## [11] S4Vectors_0.46.0 BiocGenerics_0.54.1
## [13] generics_0.1.4 RColorBrewer_1.1-3
## [15] lubridate_1.9.4 forcats_1.0.1
## [17] stringr_1.5.2 dplyr_1.1.4
## [19] purrr_1.1.0 readr_2.1.5
## [21] tidyr_1.3.1 tibble_3.3.0
## [23] ggplot2_4.0.0 tidyverse_2.0.0
##
## loaded via a namespace (and not attached):
## [1] gtable_0.3.6 xfun_0.53 bslib_0.9.0
## [4] lattice_0.22-7 numDeriv_2016.8-1.1 tzdb_0.5.0
## [7] vctrs_0.6.5 tools_4.5.1 parallel_4.5.1
## [10] pkgconfig_2.0.3 Matrix_1.7-3 S7_0.2.0
## [13] lifecycle_1.0.4 GenomeInfoDbData_1.2.14 compiler_4.5.1
## [16] farver_2.1.2 codetools_0.2-20 htmltools_0.5.8.1
## [19] sass_0.4.10 yaml_2.3.10 pillar_1.11.1
## [22] crayon_1.5.3 jquerylib_0.1.4 MASS_7.3-65
## [25] BiocParallel_1.42.2 cachem_1.1.0 DelayedArray_0.34.1
## [28] emdbook_1.3.14 abind_1.4-8 bdsmatrix_1.3-7
## [31] locfit_1.5-9.12 tidyselect_1.2.1 digest_0.6.37
## [34] mvtnorm_1.3-3 stringi_1.8.7 bookdown_0.45
## [37] labeling_0.4.3 fastmap_1.2.0 grid_4.5.1
## [40] cli_3.6.5 SparseArray_1.8.1 magrittr_2.0.4
## [43] S4Arrays_1.8.1 withr_3.0.2 scales_1.4.0
## [46] UCSC.utils_1.4.0 timechange_0.3.0 rmarkdown_2.30
## [49] XVector_0.48.0 httr_1.4.7 hms_1.1.3
## [52] coda_0.19-4.1 evaluate_1.0.5 knitr_1.50
## [55] bbmle_1.0.25.1 rlang_1.1.6 Rcpp_1.1.0
## [58] glue_1.8.0 apeglm_1.30.0 rstudioapi_0.17.1
## [61] jsonlite_2.0.0 plyr_1.8.9 R6_2.6.1