Chapter 2 Workshop 1: Introduction

On this first day, we are going to get settled in and address the following objectives.

  • Set up Slack (for discussion and support in self-study)
  • Access the padlet (for asking/answering questions during sessions)
  • Access Mentimeter (for interaction between instructor and students)
  • Introduce the assessment task

The intended learning outcomes are that you should be able to:

  • Discuss what bioinformatics is and who does it.
  • Understand how to translate a DNA sequence into a protein sequence.
  • Explain the concepts of reading frames and open reading frames (ORFs).
  • Be competent at using core databases and tools available via the NCBI web portal.
  • Be aware of the huge range of freely available bioinformatics databases and web servers.

I strongly recommend that you write (or type) the answers to each of the questions and tasks. During the workshops, you may be asked to share your answers, for example via Mentimeter.

2.0.1 What is bioinformatics?

In your own words, briefly explain or define what you understand by the word ‘bioinformatics.’ Feel free to do a Google search if you are not sure.

2.0.2 Who are the bioinformaticians?

Can you find an example of a bioinformatician? What contribution to science has she/he made?

2.0.3 The central dogma of molecular biology

Much of bioinformatics is concerned with information flow in the ‘central dogma’ of molecular biology. What do you understand by this term? Are there any biological processes that are exceptions to this dogma?

2.0.4 Translating a DNA sequence

Consider this short (double-stranded) DNA molecule. How many different ways can it be translated into protein (amino-acid sequence)?

5’ GGTGGCCGCACCACCGACCCGGTG 3’
3’ CCACCGGCGTGGTGGCTGGGCCAC 5’
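
If you would like to check your answer computationally, the following is a minimal sketch in Python rather than a definitive implementation. It assumes the standard genetic code (NCBI translation table 1, whose codon assignments are shared by the bacterial table 11). The function names (reverse_complement, translate, six_frame_translation) are illustrative choices and do not come from any particular library. The idea is that each strand can be read in three registers, so the sketch translates three frames of the top strand and three frames of its reverse complement.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G base order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna, offset=0):
    """Translate complete codons starting at `offset`; '*' marks a stop codon."""
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    # Codons containing anything other than A, C, G or T become 'X'.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(dna):
    """Translate all three frames on each of the two strands."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq, offset)
    return frames


fragment = "GGTGGCCGCACCACCGACCCGGTG"
for name, protein in six_frame_translation(fragment).items():
    print(name, protein)
```

Compare the printed translations with the ones you worked out by hand. Trailing bases that do not form a complete codon are simply ignored in this sketch.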

That sequence is actually a small fragment of the following bacterial gene:

>MZ701793.1 Xanthomonas arboricola pv. pruni strain E10 RNA polymerase sigma factor (rpoD) gene, partial cds
TACGCCGAAGTCAATGACCACCTGCCCGACGACCTGGTCGACCCGGAGCAGATCGAAGACATCATCAGCA
TGATCAACGGCATGGGCATCGATGTCCATGAAGTTGCGCCCGATGCTGAAACCCTGTTGCTCAACGATGG
CAACACCGGCAACCGCGAGGTTGACGACACCGCAGCCGAAGAAGCTGCCGCCGCGCTGACCGCGCTCGAC
ACCGAAGGTGGCCGCACCACCGACCCGGTGCGCATGTACATGCGCGAAATGGGCACGGTCGAGCTGCTGA
CCCGCGAAGGCGAAATCGCCATCGCCAAGCGTATCGAAGAAGGCCTGAGCCAGGTCCAGGCAGCGCTGGG
TGTGTTCCCGCTGTCGACCGAAATGCTGCTGGCCGATTACGAAGCGCACAAGGAAGGCAAGAAGCGTCTG
GCCGAGATCGTGGTCGGCTTCAACGACCTGATCGAAGAAGCCGACGCCGCCGCTGCCGCGCTGGCCGCCG
CCGGCCCGGTCGCCGTCGACGAAGACGCGGTCGATGAAGACGACGACGAAGACGGCGATGACGACGCTGC
CGAGGAAGAGGCCGGCCCGACCGGTCCGGACCCGGTGGAAGTGGCCACGCGCATGGAGAACCTGGCCAAC
GAATACGCCAAGTTCAAGAAGATCTATGCCAAGAACGGCGCCGAGCACAAGCTGGTGGTCAAGGCGCGCG
AGGACATGGCCGCCATCTTCACCACGCTCAAGCTGCCGCTGCCGCTGACCGATGCGCTGGTCACCCAGCT
GCGTGGCGTGGTCAACGGCATCAAGGATCACGAGCGCAAGGTGCTGCACCTGGCCACCGCCGTGGCACGC
ATGCCGCGCAAGGATTTCATCCGCTCCTGG

What is the amino acid sequence of the protein product encoded by this gene? You can use this online tool to translate a DNA sequence into protein: https://web.expasy.org/translate/.
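
Because the record is annotated as a partial cds, you cannot rely on finding a start codon at the beginning of the sequence, so you first need to decide which reading frame is the coding one. One heuristic, sketched below in Python under the assumption of the standard stop codons (TAA, TAG, TGA), is to count in-frame stop codons in each of the six frames: a frame with no stops across a long sequence is a plausible candidate for the coding frame. The function name stops_per_frame is an illustrative choice, and the example uses only the first line of the record above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def stops_per_frame(dna):
    """Count in-frame stop codons for the three frames on each strand."""
    # Remove whitespace and line breaks, as found in FASTA-formatted text.
    dna = "".join(dna.split()).upper()
    counts = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
            counts[f"{strand}{offset + 1}"] = sum(
                codon in STOP_CODONS for codon in codons
            )
    return counts


# First line of the rpoD record; paste the full sequence to analyse the gene.
first_line = (
    "TACGCCGAAGTCAATGACCACCTGCCCGACGACCTGGTCGACCCGGAGCAGATCGAAGACATCATCAGCA"
)
for frame, n_stops in stops_per_frame(first_line).items():
    print(frame, n_stops)
```

A short sequence may contain no stops in several frames purely by chance, so the counts become more informative when you run the function on the complete record.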

2.0.5 The NCBI web portal and Protein database

Here is an example of an entry in the NCBI’s Protein database: https://www.ncbi.nlm.nih.gov/protein/MBV7277430.1?report=fasta .
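
The same record can also be retrieved programmatically. The sketch below uses the efetch endpoint of NCBI's E-utilities with the db, id, rettype and retmode parameters, and relies only on the Python standard library. The helper names (build_efetch_url, fetch_record, parse_fasta) are hypothetical and chosen for illustration, and parse_fasta assumes the text contains a single record.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, db="protein", rettype="fasta"):
    """Build the efetch URL for one accession number."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return f"{EFETCH_URL}?{urlencode(params)}"


def fetch_record(accession, db="protein", rettype="fasta"):
    """Download the record as plain text (requires network access)."""
    url = build_efetch_url(accession, db, rettype)
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def parse_fasta(text):
    """Split a single-record FASTA text into (header, sequence)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    header = lines[0].lstrip(">")
    return header, "".join(lines[1:])


# Only the URL is built here; no request is sent until fetch_record is called.
print(build_efetch_url("MBV7277430.1"))
```

Calling parse_fasta(fetch_record("MBV7277430.1")) should return the header and sequence shown on the web page. Requesting rettype="gp" instead of "fasta" should return the annotated GenPept flat file, which is useful for several of the questions below.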

By following the links from the web page, try to find as much information as possible about this protein. For example:

  • Who generated the sequence?
  • How was the protein sequence determined? What method was used?
  • Why was this protein sequence generated? What was the motivation?
  • What is the structure of the protein?
  • What is the likely function of the protein? How do we know this?
  • Where is the protein found? That is, in which organism?
  • Are there any other very similar (very closely related) proteins?

2.0.6 Other databases and webservers

In this session, we have spent some time exploring the databases and tools offered by the NCBI web portal. One of the great strengths of this portal is that the various resources are well integrated and linked together. There are also many other useful bioinformatics tools and resources outside the NCBI collection.
Another portal containing numerous integrated resources is the website of the European Bioinformatics Institute (EBI). You can think of this as the European equivalent of the American NCBI site. However, many other tools stand alone outside of these big, central institutions.

A good place to start browsing and learning about the range of available online bioinformatics tools is the two special issues that are published each year in the journal Nucleic Acids Research. One of these issues is dedicated to databases, while the other is dedicated to webservers. You can read the editorial overviews for the 2021 editions here:

Can you find at least one example of a database or webserver for each of these applications?

  • Identifying secondary metabolism gene clusters?
  • Designing genome-editing experiments?
  • Physico-chemical properties of proteins?
  • Single-cell analyses?
  • Something relevant to your MPhil/PhD project(s)?