9  Enrichment analysis

Assess the functional significance of your omics data through integrated functional enrichment analysis using ReporterScore. Understand the biological implications of detected features in the context of known pathways and processes.

For more details, refer to ‘C. Peng, Q. Chen, S. Tan, X. Shen, C. Jiang, Generalized Reporter Score-based Enrichment Analysis for Omics Data. bioRxiv [Preprint] (2023). https://doi.org/10.1101/2023.10.13.562235’.

Table 9.1: Some commonly used enrichment analysis methods
Category Method Tools Notes
ORA Hypergeometric test / Fisher’s exact test DAVID (website), clusterProfiler (R package) The most common methods used in enrichment analysis. Selecting a list of genes is required.
FCS Gene set enrichment analysis (GSEA) GSEA (website) GSEA creatively uses gene ranking, rather than selecting a list of genes, to identify statistically significant and concordant differences across gene sets.
FCS Generalized reporter score-based analysis (GRSA/RSA) ReporterScore (R package developed in this study) Find significant metabolites (first report), pathways, and taxonomy based on the p-values for multi-omics data.
FCS Significance Analysis of Function and Expression (SAFE) safe (R package) SAFE assesses the significance of gene categories by calculating both local and global statistics from gene expression data.
FCS Gene Set Analysis (GSA) GSA (R Package) GSA was proposed as an improvement of GSEA, using the “maxmean” statistic instead of the weighted sign KS statistic.
FCS Pathway Analysis with Down-weighting of Overlapping Genes PADOG (R package) PADOGA assumes that genes associated with fewer pathways have more significant effects than genes associated with more pathways.
FCS Gene Set Variation Analysis (GSVA) GSVA (R package) A nonparametric, unsupervised method that transforms gene expression data into gene set scores for downstream differential pathway activity analysis.
PT Topology-based pathway enrichment analysis (TPEA) TPEA (R package) Integrate topological properties and global upstream/downstream positions of genes in pathways.

9.1 ReporterScore

Inspired by the classic RSA, we developed the improved Generalized Reporter Score-based Analysis (GRSA) method, implemented in the R package ReporterScore, along with comprehensive visualization methods and pathway databases. GRSA is a threshold-free method that works well with all types of biomedical features, such as genes, chemical compounds, and microbial species. GRSA works in the mixed (classic RSA) and directed modes (enhanced RSA). The directed mode uses signs of the reporter score to distinguish up-regulated or down-regulated pathways, which is more intuitive. Importantly, the GRSA supports multi-group and longitudinal experimental designs, because of the included multi-group-compatible statistical methods. The ReporterScore package also supports custom hierarchical and relational databases (e.g., table containing the correspondence between pathways and genes), providing extra flexibility for advanced users.

The workflow of GRSA in the ReporterScore package is shown in Figure Figure 9.1.

Figure 9.1: Generalized Reporter Score-based Enrichment Analysis for Omics Data
library(ReporterScore)
data("reporter_score_res")
plot_report(reporter_score_res, rs_threshold = 2.5)
Figure 9.2: GRSA result

For convenience, the ReporterScore package provides an interface to the above-mentioned enrichment methods:

gsa_res <- KO_gsa(reporter_score_res, p.adjust.method = "BH", perm = 100)
## perm= 10 / 100 
## perm= 20 / 100 
## perm= 30 / 100 
## perm= 40 / 100 
## perm= 50 / 100 
## perm= 60 / 100 
## perm= 70 / 100 
## perm= 80 / 100 
## perm= 90 / 100 
## perm= 100 / 100
plot_enrich_res(gsa_res)
Figure 9.3: GSA result
padog_res <- KO_padog(reporter_score_res, p.adjust.method = "none", perm = 100)
## Starting with 316 gene sets!
## Analyzing 193 gene sets with 3 or more genes!
## # of permutations used: 100 
## 10 / 100 
## 20 / 100 
## 30 / 100 
## 40 / 100 
## 50 / 100 
## 60 / 100 
## 70 / 100 
## 80 / 100 
## 90 / 100 
## 100 / 100
plot_enrich_res(padog_res)
Figure 9.4: PADOG result

9.2 KEGG

KEGG BRITE is a collection of hierarchical classification systems capturing functional hierarchies of various biological objects, especially those represented as KEGG objects.

We collected k00001 KEGG Orthology (KO) table so that you can summaries each levels abundance. Use load_KO_htable to get KO_htable and use update_KO_htable to update. Use up_level_KO can upgrade to specific level in one of “pathway”, “module”, “level1”, “level2”, “level3”, “module1”, “module2”, “module3”.

KO_htable <- load_KO_htable()
head(KO_htable, n = 20)
level1_name level2_name level3_id level3_name KO_id KO_name
Metabolism Carbohydrate metabolism map00010 Glycolysis / Gluconeogenesis K00844 HK; hexokinase [EC:2.7.1.1]
Metabolism Carbohydrate metabolism map00010 Glycolysis / Gluconeogenesis K12407 GCK; glucokinase [EC:2.7.1.2]
Metabolism Carbohydrate metabolism map00010 Glycolysis / Gluconeogenesis K00845 glk; glucokinase [EC:2.7.1.2]
Metabolism Carbohydrate metabolism map00010 Glycolysis / Gluconeogenesis K25026 glk; glucokinase [EC:2.7.1.2]
Metabolism Carbohydrate metabolism map00010 Glycolysis / Gluconeogenesis K01810 GPI, pgi; glucose-6-phosphate isomerase [EC:5.3.1.9]
Metabolism Carbohydrate metabolism map00010 Glycolysis / Gluconeogenesis K06859 pgi1; glucose-6-phosphate isomerase, archaeal [EC:5.3.1.9]
Metabolism Carbohydrate metabolism map00010 Glycolysis / Gluconeogenesis K13810 tal-pgi; transaldolase / glucose-6-phosphate isomerase [EC:2.2.1.2 5.3.1.9]
Metabolism Carbohydrate metabolism map00010 Glycolysis / Gluconeogenesis K15916 pgi-pmi; glucose/mannose-6-phosphate isomerase [EC:5.3.1.9 5.3.1.8]
Metabolism Carbohydrate metabolism map00010 Glycolysis / Gluconeogenesis K24182 PFK9; 6-phosphofructokinase [EC:2.7.1.11]
Metabolism Carbohydrate metabolism map00010 Glycolysis / Gluconeogenesis K00850 pfkA, PFK; 6-phosphofructokinase 1 [EC:2.7.1.11]
Metabolism Carbohydrate metabolism map00010 Glycolysis / Gluconeogenesis K16370 pfkB; 6-phosphofructokinase 2 [EC:2.7.1.11]
Metabolism Carbohydrate metabolism map00010 Glycolysis / Gluconeogenesis K21071 pfk, pfp; ATP-dependent phosphofructokinase / diphosphate-dependent phosphofructokinase [EC:2.7.1.11 2.7.1.90]
Metabolism Carbohydrate metabolism map00010 Glycolysis / Gluconeogenesis K00918 pfkC; ADP-dependent phosphofructokinase/glucokinase [EC:2.7.1.146 2.7.1.147]
Metabolism Carbohydrate metabolism map00010 Glycolysis / Gluconeogenesis K00895 pfp, PFP; diphosphate-dependent phosphofructokinase [EC:2.7.1.90]
Metabolism Carbohydrate metabolism map00010 Glycolysis / Gluconeogenesis K03841 FBP, fbp; fructose-1,6-bisphosphatase I [EC:3.1.3.11]
Metabolism Carbohydrate metabolism map00010 Glycolysis / Gluconeogenesis K02446 glpX; fructose-1,6-bisphosphatase II [EC:3.1.3.11]
Metabolism Carbohydrate metabolism map00010 Glycolysis / Gluconeogenesis K11532 glpX-SEBP; fructose-1,6-bisphosphatase II / sedoheptulose-1,7-bisphosphatase [EC:3.1.3.11 3.1.3.37]
Metabolism Carbohydrate metabolism map00010 Glycolysis / Gluconeogenesis K01086 fbp-SEBP; fructose-1,6-bisphosphatase I / sedoheptulose-1,7-bisphosphatase [EC:3.1.3.11 3.1.3.37]
Metabolism Carbohydrate metabolism map00010 Glycolysis / Gluconeogenesis K04041 fbp3; fructose-1,6-bisphosphatase III [EC:3.1.3.11]
Metabolism Carbohydrate metabolism map00010 Glycolysis / Gluconeogenesis K01623 ALDO; fructose-bisphosphate aldolase, class I [EC:4.1.2.13]
plot_htable(type = "ko")
Figure 9.5: KEGG KO annotation categories

9.3 ARGs & VFs

For convenience, I also included the CARD database from https://card.mcmaster.ca/download/0/broadstreet-v3.2.8.tar.bz2.

CARDinfo <- load_CARDinfo()
my_sankey(count(CARDinfo$ARO_index, `AMR Gene Family`, `Resistance Mechanism`, `Drug Class`), topN = 5)
Figure 9.6: Sankey plot of ARG categories