Part 4 GO Term Analysis

In the previous chapter of this document, I mentioned that I performed GO term analysis. Here, I will try to give an overview of what GO term analysis is.

4.1 What is GO Term Analysis?

GO term analysis is a technique generally performed after differential gene expression analysis. Any differential gene expression analysis usually yields hundreds, if not thousands, of differentially expressed genes. Knowing what some of these genes do can reveal which biological processes differ between sample classes, but looking up each of these differentially expressed genes one by one would be a very time-consuming process!

Hence, enter GO term analysis: genes are grouped into pre-defined sets (usually defined by a third party) according to their functional terms - phrases that roughly describe what a gene does (e.g., ion transport, ribosomal protein, MORN motif, etc.).
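To make the grouping idea concrete, here is a minimal sketch in base R. Everything in it is made up for illustration: the gene IDs, the annotation table, and the list of differentially expressed genes are hypothetical and do not come from any real dataset or from the Gene Ontology itself.

```r
# Hypothetical annotation table: one row per gene/term pair (made-up gene IDs).
annotation <- data.frame(
  gene = c("geneA", "geneB", "geneC", "geneD", "geneB"),
  term = c("ion transport", "ion transport", "ribosomal protein",
           "ribosomal protein", "MORN motif"),
  stringsAsFactors = FALSE
)

# Hypothetical list of differentially expressed genes.
de_genes <- c("geneA", "geneB", "geneD")

# Keep only the annotations that belong to differentially expressed genes,
# then group the gene IDs by their functional term.
de_annotation <- annotation[annotation$gene %in% de_genes, ]
genes_by_term <- split(de_annotation$gene, de_annotation$term)

# Count how many differentially expressed genes fall under each term.
term_counts <- lengths(genes_by_term)
print(term_counts)
```

Instead of reading about each gene individually, one can now look at a handful of terms and see which ones collect the most differentially expressed genes. Note that a gene can belong to more than one term, as geneB does here.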

Note that I said that these sets are “pre-defined”: probably one of the biggest authorities behind these “functional terms” is the Gene Ontology (GO). The functional terms compiled by the Gene Ontology are hence called GO terms.

4.2 Who are the GO?

According to the Gene Ontology’s webpage, they are:

“…the world’s largest source of information on the functions of genes. This knowledge is both human-readable and machine-readable, and is a foundation for computational analysis of large-scale molecular biology and genetics experiments in biomedical research.”

– The Gene Ontology themselves

According to Wikipedia, the Gene Ontology is part of a larger classification effort and a major bioinformatics initiative to:

  1. Annotate genes and gene products and assimilate and disseminate annotation data.
  2. Maintain and develop its controlled vocabulary of gene and gene product attributes.
  3. Provide tools for easy access to all aspects of the Gene Ontology’s data.

4.3 Performing GO Term Analysis in R

One way to do this is via the gProfileR package, a third-party software package endorsed by the Gene Ontology. However, as of the time of writing, this package is deprecated and uses out-of-date data.

Hence, one can instead use the gprofiler2 package and search for the functions of genes via its gconvert() function (do revisit the second chapter of this document if you need a refresher on how to use this function).
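As a rough sketch of what such a lookup might look like, the snippet below passes a few gene symbols to gconvert() and asks for GO terms as the target namespace. The gene symbols, the organism ("hsapiens"), and the choice of target = "GO" are assumptions made purely for illustration; substitute the identifiers and organism used in your own analysis. The call contacts the g:Profiler web service, so it needs an internet connection.

```r
# Illustrative gene symbols only; replace with your own differentially expressed genes.
genes <- c("TP53", "BRCA1", "MYC")

if (requireNamespace("gprofiler2", quietly = TRUE)) {
  # gconvert() queries the g:Profiler web service; the organism and target
  # namespace below are assumptions chosen for this example.
  go_map <- tryCatch(
    gprofiler2::gconvert(query = genes, organism = "hsapiens", target = "GO"),
    error = function(e) NULL  # e.g. no network connection
  )

  if (!is.null(go_map)) {
    # Each row pairs one input gene with one matching GO term.
    print(head(go_map))
  }
}
```

Because a single gene is usually annotated with many GO terms, expect the result to contain several rows per input gene.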