Section 3 Week 2: The Human Genome Project and beyond

This week we look at the Human Genome Project and some of the subsequent achievements that it enabled. You have watched, or will watch, the film “Genome: unlocking the code”. This excellent documentary covers the Human Genome Project itself, DNA sequencing by the Sanger method, and the genetics of both monogenic and polygenic (complex) diseases and traits. It tells the stories of Hugh Rienhoff Jnr’s research on his daughter’s genetic condition, the deciphering of the genetic code, and the discovery of the genetic basis of Duchenne muscular dystrophy, Huntington’s disease and diabetes. It describes how, following the Human Genome Project, researchers went further by sequencing the genomes or exomes of many people, rather than a single sequence representing an “average” human, and how this opened up the possibility of genome-wide association studies (GWAS). It also describes the ENCODE project, which examined not just the sequence of the genome but also the function of each site within it. It explains that our genome contains relatively few protein-coding genes, which occupy only about 2 % of the genomic DNA; the other 98 % is sometimes described as ‘junk’. Finally, we see how comparative genomics can reveal which parts of the ‘junk’ might actually be important.

3.1 Hugh Y. Rienhoff Jnr. and his daughter Beatrice

In the documentary, we hear the story of how Hugh Rienhoff successfully identified a genetic variant in his daughter Beatrice’s genome that explained some issues with her growth and development. Let’s try to dig a little deeper into the specifics. In a research journal article, Rienhoff (2016) states that Beatrice’s syndrome corresponds to OMIM 615582. What does this mean?

To find out, we will use the NCBI’s web portal, which we have used previously for searching databases of sequences and literature. This time we are going to use it to search the Online Mendelian Inheritance in Man (OMIM®) database.

You can read more about what OMIM is here: https://omim.org/help/faq#1_1. For our purposes, we can simply think of it as a database of genetic variants (mutations) and the traits (especially inherited diseases) that are caused by those variants.

So, let’s try to address the following questions:

  • On which band on which chromosome is the genetic variant associated with this medical condition?
  • What is the name of this condition?
  • Can you find five symptoms that are found in patients with this condition?
  • Which protein-coding gene is implicated in this condition?
  • Is there any published, peer-reviewed evidence for this gene’s involvement in the condition?
  • Can you find this gene in the Ensembl genome browser?
  • Can you find any single-nucleotide polymorphisms (SNPs) that are known to occur in this gene?
  • Can you find an example of a company that supplies a medical test suitable for diagnosing this condition? How does it work?
  • Is this condition monogenic or polygenic?

A good way to start your investigation would be to enter 615582 into the search box at https://www.ncbi.nlm.nih.gov/omim/.
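
If you would rather look records up from a script than through the web pages, the sketch below shows one possible approach in Python. It is only a sketch and rests on several assumptions that are not part of this week's material: that the NCBI E-utilities ESummary service accepts the OMIM number as the record identifier for its omim database, and that the Ensembl REST service offers a lookup-by-symbol endpoint. The function names (omim_esummary_url, ensembl_lookup_url, fetch_json) are hypothetical helpers invented for this illustration. The code only builds the URLs; nothing is downloaded unless you call fetch_json yourself.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Base addresses of the two public web services (assumed here, not taken
# from the course material).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"
ENSEMBL_REST = "https://rest.ensembl.org"


def omim_esummary_url(mim_number):
    """Build an E-utilities ESummary URL for a single OMIM entry.

    Assumes the OMIM number can be used directly as the record id.
    """
    params = {"db": "omim", "id": str(mim_number), "retmode": "json"}
    return f"{EUTILS_BASE}/esummary.fcgi?{urlencode(params)}"


def ensembl_lookup_url(gene_symbol, species="homo_sapiens"):
    """Build an Ensembl REST URL that looks a gene up by its symbol."""
    path = f"/lookup/symbol/{quote(species)}/{quote(gene_symbol)}"
    return f"{ENSEMBL_REST}{path}?content-type=application/json"


def fetch_json(url, timeout=30):
    """Download a URL and parse the response body as JSON (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Build (but do not fetch) the URL for the OMIM entry discussed above.
print(omim_esummary_url(615582))
```

Calling fetch_json(omim_esummary_url(615582)) should return a short summary of the entry, provided the assumptions above hold. Once you have worked out which gene is implicated, you could pass its symbol to ensembl_lookup_url to find its Ensembl identifier and genomic coordinates. The web pages remain the better place to answer the questions about symptoms, published evidence and diagnostic tests.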

3.2 The ENCODE controversy

In this week’s learning material, you encountered the Encyclopedia of DNA Elements, also known as the ENCODE project. Imagine you are set the following exam question: “What is the ENCODE project and why were its published findings considered by some to be controversial?” Write a few bullet points, or draw a sketch or mind map, showing how you might go about answering that question. A good place to start would be to read Graur et al. (2013).