Chapter 2 The Linux command line and HPC

2.1 Working on the command line

Bioinformatic analysis is largely conducted using a command-line interface (CLI). This is a text-based interface that allows users to interact directly with the computer's operating system by executing commands. This is in contrast with the graphical user interface (GUI) that you might be familiar with, where instructions are sent to the computer via a point-and-click mechanism. The program that reads and runs the commands you type is called a shell. The most common shells are Bash and Zsh, which typically run on Unix-like operating systems such as Linux and macOS. We will be using Ubuntu-based virtual machines in this course.
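As a minimal illustration of what executing commands looks like, the sketch below runs three standard commands: one on its own, one with an argument, and one with an option and an argument. The variable name greeting and the message text are purely illustrative.

```shell
# A command on its own: print the current working directory.
pwd

# A command with an argument: echo prints the text it is given.
# Here the printed text is captured in a variable and then displayed.
greeting=$(echo "Hello, command line")
echo "$greeting"

# A command with an option (-l, long listing) and an argument (the directory to list).
ls -l /
```

Each line is typed at the prompt and run by pressing Enter. Lines beginning with # are comments and are ignored by the shell.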

Shell scripting is useful in bioinformatics for several reasons:

  • Reproducibility - Shell scripts can be saved and re-executed at a later date. Commands executed in the shell are also recorded in its history and can be referred to at a later date.

  • Throughput - Many tasks in bioinformatics are repetitive. For example, if we were conducting a sequencing experiment on 100 samples and wanted to trim adapters off the reads, we could use a loop to perform this task on every set of reads. This is much quicker than using a GUI to perform the same task one sample at a time.

  • Integration - Shell scripting allows you to integrate several programs into workflows. For example, you might write a shell script that calls FastQC to inspect reads and then TrimGalore! to remove adapters. This can be done in a single script.

  • Efficiency - Graphical user interfaces can be resource intensive. Using the shell frees resources that would usually be used for the GUI.
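The throughput and integration points above can be sketched as a short Bash script. This is a dry run under stated assumptions, not a prescribed course workflow: the temporary directory and the sample file names are hypothetical placeholders, and echo prints each fastqc and trim_galore command instead of running it, so the script works even where those tools are not installed.

```shell
#!/usr/bin/env bash
# Dry-run sketch: loop over read files and show the commands that would be run.

# Hypothetical working directory containing three empty placeholder read files.
workdir=$(mktemp -d)
for i in 1 2 3; do
    touch "$workdir/sample${i}.fastq.gz"
done

# Loop over every read file in the directory.
count=0
for reads in "$workdir"/*.fastq.gz; do
    # 'echo' prints the command rather than executing it.
    # Remove 'echo' to run the tools for real (assumes they are installed).
    echo fastqc "$reads"        # step 1: inspect read quality
    echo trim_galore "$reads"   # step 2: trim adapters
    count=$((count + 1))
done

echo "Files processed: $count"

# Remove the placeholder files.
rm -r "$workdir"
```

The same loop would handle 100 samples without any change, because the *.fastq.gz pattern matches however many files are present. Saving the script to a file also gives a record of exactly what was run.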

2.2 Accessing the BMS5021 command line interface

Bioinformatic analyses of large datasets require significant compute resources. In this course, we will be accessing a high-performance computing (HPC) cluster to perform our analyses. We have designed an interface that allows you to connect to the cluster from your web browser. Follow the steps below to connect to the BMS5021 HPC command line interface:

  1. Navigate to https://bioinformatics-training.cloud.edu.au/login

  2. Select “Bioinformatics Training Platform” from the “Choose a service” drop-down box.

  3. Click login. You will be presented with a login screen.

  4. Log in with your Monash username and password. Please ensure that your email address is entered in all lower case.

  5. On the side bar, click Terminal.

  6. Under the “Launch Terminal” heading, set the “service” to “BMS5021 7GB”. (Note: we will sometimes use “BMS5021 15GB”, but it is important that you default to the “7GB” job unless advised otherwise.)

  7. Set the time to the appropriate length for the class. Click Launch.

  8. When you do this, a new terminal job should appear below. If you are prompted to allow pop-ups, please do so.

  9. Click the connect button and you will be presented with a terminal interface.
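Once the terminal has opened, a few standard commands give a quick check that the session is working. This is a minimal sketch that assumes a Linux machine with the GNU core utilities installed, as on Ubuntu. The variable names kernel, here and cores are illustrative only.

```shell
# Name of the operating system kernel (expected to be "Linux" on an Ubuntu machine).
kernel=$(uname -s)
echo "Kernel: $kernel"

# The directory the session starts in.
here=$(pwd)
echo "Working directory: $here"

# Number of CPU cores available to the session.
cores=$(nproc)
echo "CPU cores: $cores"
```

If each command prints a value without an error message, the terminal is connected and ready to use.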