1 Genes, Genomes and Genome Browsers
1.1 What is a gene?
A gene is a segment of DNA that directs the production of a functional product (Figure 1.1). Many eukaryotic genes are protein coding. Each protein-coding gene is first used as a template to create an RNA transcript through a process called transcription. The RNA transcript is then processed into a messenger RNA (mRNA). The most notable change that occurs during RNA processing is the removal of intron sequences (dark gray segments in Figure 1.1). The mRNA is then exported from the nucleus and used as a template to make a protein in a process called translation. Proteins do most of the work necessary to maintain cell structure, function and behavior.
While a gene requires additional sequence to be transcribed at the right time and place, for simplicity, the size of a gene (number of base pairs) is typically defined by where transcription begins and ends. Thus, the sites of transcription initiation and transcription termination define both gene size and RNA transcript size (but not mRNA size, which is typically shorter because most genes have introns).
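The difference between gene size and mRNA size comes down to simple arithmetic. The sketch below uses an invented gene with three exons; the coordinates and the function names `gene_length` and `mrna_length` are assumptions made for illustration, not values from a real annotation. It also ignores other processing steps that can change the length of a real mRNA.

```python
# Hypothetical gene: coordinates are 1-based and inclusive, invented for illustration.
TRANSCRIPTION_START = 1_000
TRANSCRIPTION_END = 5_000

# Exons as (start, end) pairs; the gaps between them are introns.
EXONS = [(1_000, 1_400), (2_200, 2_900), (4_500, 5_000)]


def gene_length(start: int, end: int) -> int:
    """Size of the gene, and of the unprocessed RNA transcript, in bp."""
    return end - start + 1


def mrna_length(exons: list[tuple[int, int]]) -> int:
    """Size of the processed transcript: the sum of the exon lengths."""
    return sum(end - start + 1 for start, end in exons)


gene_bp = gene_length(TRANSCRIPTION_START, TRANSCRIPTION_END)
mrna_bp = mrna_length(EXONS)
intron_bp = gene_bp - mrna_bp

print(f"gene / RNA transcript: {gene_bp} bp")
print(f"mRNA after intron removal: {mrna_bp} bp")
print(f"removed as introns: {intron_bp} bp")
```

With these made-up coordinates the gene spans 4,001 bp but the mRNA is only 1,603 bp. Knowing the mRNA size alone therefore does not reveal the gene size.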
1.2 What is a genome?
Genes are distributed linearly along chromosomes[2] (Figure 1.2). One complete set of genes (and thus, one complete set of chromosomes) makes up what we call a genome (sometimes called the “haploid genome”)[3]. In humans, the genome comprises 23 distinct chromosomes in XX females and 24 in XY males, who carry a Y chromosome in addition to an X (both males and females also contain mitochondrial DNA). Chromosomes vary in length. For example, chromosome 1 is the longest at 248,956,422 base pairs (bp) or 82.7 mm in length[4] (Figure 1.3).
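The length quoted for chromosome 1 can be reproduced from its size in base pairs. The sketch below uses the per-base-pair length given in this chapter's footnote on chromosome length (0.000000332 mm per bp); the function name `chromosome_length_mm` is an illustrative choice.

```python
MM_PER_BP = 0.000000332  # length of one base pair in mm, from the chapter footnote


def chromosome_length_mm(base_pairs: int) -> float:
    """Convert a chromosome size in base pairs to a physical length in mm."""
    return base_pairs * MM_PER_BP


CHR1_BP = 248_956_422  # human chromosome 1, as stated in the text
length_mm = chromosome_length_mm(CHR1_BP)

print(f"chromosome 1: {length_mm:.1f} mm")
```

The same function works for any chromosome whose size in base pairs is known.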
In diploid organisms such as humans, somatic cells[5] harbor two complete sets of chromosomes. Thus, in humans, over two meters of DNA is crammed into each somatic nucleus! But the diameter of the average mammalian nucleus is only 6 microns, or 0.006 of a millimeter. How does the diploid genome fit into this tiny nucleus? First, the DNA double helix is extremely thin (about 20 angstroms[6], or 2 nanometers, in diameter). Second, each chromosome is carefully packaged by proteins in a systematic and stereotyped manner (Figure 1.4).
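To get a feel for the scale of the packaging problem, the two figures from the text can be compared directly. This is only a rough sketch: it treats "over two meters" as exactly 2,000 mm, and the variable names are illustrative.

```python
DNA_LENGTH_MM = 2_000.0      # roughly two meters of DNA, expressed in mm
NUCLEUS_DIAMETER_MM = 0.006  # 6 microns, expressed in mm

# How many times longer is the DNA than the nucleus is wide?
fold = DNA_LENGTH_MM / NUCLEUS_DIAMETER_MM

print(f"The DNA is roughly {fold:,.0f} times longer than the nucleus is wide")
```

The result is a ratio in the hundreds of thousands, which is why the thinness of the double helix and its orderly packaging both matter.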
1.2.1 Test Your Understanding
- Which happens first: RNA processing, transcription or translation?
- Where does RNA processing occur?
- Where does transcription occur?
- Where does translation occur?
- TRUE or FALSE. If I told you the size of a mature mRNA transcript, you would know the size of the gene in the genome.
- Chromosome 1 is the longest chromosome at 82.7 mm. How long is it in inches (round to the first decimal place)?
- The longest chromosome in C. elegans is chromosome V. It is 20,924,180 bp. How long is it in mm (round to the nearest whole mm)? HINT: The instructions for HOW to calculate the length of a chromosome are buried in a footnote!
- In humans, there is just over two meters of DNA crammed into each somatic nucleus. What can you find around your house that is about two meters in length?
- Fill in the blank. The first level of chromosomal packing requires an octamer of histone proteins that assemble into a ball-like structure called a _______
- TRUE or FALSE. Each chromosome occupies its own 3-dimensional space within the nucleus. Chromosomes are NOT mixed together like a bowl of spaghetti.
1.3 What is a Genome Browser?
A Genome Browser is a database that harbors graphical representations of genomes and associated biological data and information. According to Wikipedia, “Genome browsers enable researchers to visualize and browse entire genomes with annotated data including gene prediction and structure, proteins, expression, regulation, variation, comparative analysis, etc. Annotated data is usually from multiple diverse sources. They differ from ordinary biological databases in that they display data in a graphical format, with genome coordinates on one axis with annotations or space-filling graphics to show analyses of the genes”. There are numerous Genome Browsers available. We will be using the UCSC Genome Browser.
The UCSC Genome Browser harbors the sequences of numerous sequenced genomes called “reference genomes”[7]. Our focus will be on the human genome, whose reference sequence comprises 24 linear chromosomes and the mitochondrial genome. Initially, we will focus our Genome Browser window on a single human gene, BBS1. In this chapter, you will learn how the UCSC Genome Browser is organized, how to configure so-called “evidence tracks” (each evidence track harbors specific biological data from a single source pertaining to the sequence displayed in the browser window) and how to navigate through the Genome Browser window (scrolling left and right, zooming in and out and jumping to a new location).
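Scrolling, zooming and jumping all amount to changing the genomic interval shown in the browser window. The sketch below models that idea in Python. It assumes positions written as `chrom:start-end` with optional commas, the style accepted by the UCSC position box; the `Window` class, the function names and the example coordinates are all invented for illustration and are not part of any UCSC software. It does not check the window against the real length of the chromosome.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """A browser window: an interval on one chromosome (1-based, inclusive)."""
    chrom: str
    start: int
    end: int

    @property
    def width(self) -> int:
        return self.end - self.start + 1


def parse_position(text: str) -> Window:
    """Jump to a location given as e.g. 'chr1:10,001-20,000'."""
    chrom, span = text.split(":")
    start, end = span.replace(",", "").split("-")
    return Window(chrom, int(start), int(end))


def scroll(window: Window, bases: int) -> Window:
    """Shift right (positive) or left (negative); the width is unchanged."""
    return Window(window.chrom, window.start + bases, window.end + bases)


def zoom(window: Window, factor: float) -> Window:
    """Resize around the center; factor > 1 zooms out, factor < 1 zooms in."""
    center = (window.start + window.end) // 2
    half = max(1, round(window.width * factor / 2))
    # Clamp at position 1 so the window never starts before the chromosome.
    return Window(window.chrom, max(1, center - half + 1), center + half)


window = parse_position("chr1:10,001-20,000")  # hypothetical coordinates
print(window, window.width)
print(scroll(window, 500))
print(zoom(window, 2))
print(zoom(window, 0.5))
```

Treating the window as an immutable value keeps each operation simple: every scroll or zoom returns a new interval rather than modifying the old one.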
[3] Many organisms (including humans) are diploid, meaning they have two copies of each chromosome. Thus the phrase “haploid genome” is more precise, although the word “haploid” is often omitted and simply assumed.
[4] One way to calculate the length of a chromosome: multiply the length of the chromosome in base pairs (bp) by 0.000000332, the length (in mm) of each bp.
[5] Cells of a multicellular organism can be divided into two main types: germ cells and somatic cells. Germ cells are destined to become reproductive cells such as sperm and oocytes. Somatic cells are destined to become all the other cell types, such as skin, neurons and muscle. This distinction is made because somatic cells die with the death of the organism, while germ cells have the potential to pass their DNA on to the next generation.
[6] Angstrom: a unit of length equal to one hundred-millionth of a centimeter (definition from Oxford Languages).
[7] A “reference genome” (also called a “reference assembly”) is a genome sequence created from thousands of sequence runs assembled in silico to represent the genome of one idealized individual organism. Since it is assembled from sequence data obtained from a number of donors, a reference genome does not represent the sequence of any single individual or organism, but rather a mosaic of multiple donors.