Chapter 4 Quality control
4.1 FastQC
Quality control (QC) is a fundamental step in RNA-Seq pre-processing. After sequencing, you need to check the quality of the reads, and FastQC does this very well. FastQC runs a simple set of quality-control checks on raw sequence data from high-throughput sequencing. It flags problems in the data that could otherwise lead to misinterpretation of the biological results. FastQC is widely used, in part because it accepts raw FASTQ files as well as SAM or BAM alignment files. It also presents the main sequencing metrics graphically.
FastQC can be run with a graphical interface, but we will run it in the terminal. For didactic purposes, we will work with the dummy samples that are in 0-samples:
ls ~/PreProcSEQ-main/0-samples
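A quick sanity check before running FastQC is to count the reads in each sample. The sketch below assumes uncompressed FASTQ files in which every record spans exactly four lines (header, sequence, `+` separator, quality string). `count_reads` is a hypothetical helper and is not part of the tutorial repository. FastQC's Basic Statistics module reports the same total, so the two numbers should agree.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<file><TAB><number of reads>" for each FASTQ file.
# Assumes uncompressed FASTQ with exactly 4 lines per record
# (header, sequence, "+" separator, quality string).
count_reads() {
  local fq
  for fq in "$@"; do
    [ -e "$fq" ] || continue   # skip an unmatched glob pattern
    awk -v f="$fq" 'END { printf "%s\t%d\n", f, NR / 4 }' "$fq"
  done
}

# Usage on the tutorial samples (only if the directory exists)
if [ -d ~/PreProcSEQ-main/0-samples ]; then
  count_reads ~/PreProcSEQ-main/0-samples/*.fastq
fi
```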
We are going to use FastQC to generate a quality report for each FASTQ file. The samples are in 0-samples, and we will save the FastQC results in 1-qualityControl_beforeTrimming/outputFastqc:
fastqc ~/PreProcSEQ-main/0-samples/*.fastq -o ~/PreProcSEQ-main/1-qualityControl_beforeTrimming/outputFastqc
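FastQC does not create the directory passed to `-o`, so that directory must already exist. If `outputFastqc` is missing in your copy of the repository, a wrapper along these lines creates it first and then runs FastQC on all samples. The function name `run_fastqc` and the default of two threads are illustrative assumptions. `-t` sets how many files FastQC processes at the same time.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run FastQC on every *.fastq in IN_DIR, writing reports to OUT_DIR.
# FastQC expects the -o directory to exist already, so it is created first.
run_fastqc() {
  local in_dir="$1" out_dir="$2" threads="${3:-2}"
  local -a files=()
  local fq
  mkdir -p "$out_dir" || return 1
  for fq in "$in_dir"/*.fastq; do
    if [ -e "$fq" ]; then files+=("$fq"); fi   # ignore an unmatched glob pattern
  done
  if [ "${#files[@]}" -eq 0 ]; then
    echo "no .fastq files found in $in_dir" >&2
    return 1
  fi
  if ! command -v fastqc >/dev/null 2>&1; then
    echo "fastqc is not on PATH" >&2
    return 1
  fi
  # -t: number of files processed simultaneously; -o: output directory
  fastqc -t "$threads" -o "$out_dir" "${files[@]}"
}

if [ -d ~/PreProcSEQ-main/0-samples ]; then
  run_fastqc ~/PreProcSEQ-main/0-samples \
    ~/PreProcSEQ-main/1-qualityControl_beforeTrimming/outputFastqc 4
fi
```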
You can verify that output files were generated for each FASTQ file:
ls ~/PreProcSEQ-main/1-qualityControl_beforeTrimming/outputFastqc
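You can also confirm automatically that each input produced both reports, without inspecting the listing by eye. This sketch assumes FastQC's usual naming, where `sample_01_R1.fastq` yields `sample_01_R1_fastqc.html` and `sample_01_R1_fastqc.zip`. `check_fastqc_outputs` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical check: confirm that every FASTQ in IN_DIR has a non-empty
# <name>_fastqc.html and <name>_fastqc.zip in OUT_DIR (FastQC's naming for .fastq inputs).
check_fastqc_outputs() {
  local in_dir="$1" out_dir="$2" fq base ext missing=0
  for fq in "$in_dir"/*.fastq; do
    [ -e "$fq" ] || continue
    base=$(basename "$fq" .fastq)
    for ext in html zip; do
      if [ ! -s "$out_dir/${base}_fastqc.$ext" ]; then
        echo "missing: ${base}_fastqc.$ext" >&2
        missing=$((missing + 1))
      fi
    done
  done
  echo "$missing missing report file(s)"
  [ "$missing" -eq 0 ]   # exit status 0 only if nothing is missing
}

if [ -d ~/PreProcSEQ-main/1-qualityControl_beforeTrimming/outputFastqc ]; then
  check_fastqc_outputs ~/PreProcSEQ-main/0-samples \
    ~/PreProcSEQ-main/1-qualityControl_beforeTrimming/outputFastqc
fi
```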
The .zip file contains the data and metrics behind the QC report (for example, fastqc_data.txt and summary.txt). The .html file presents the QC results graphically and can be opened in a web browser:
firefox ~/PreProcSEQ-main/1-qualityControl_beforeTrimming/outputFastqc/sample_01_R1_fastqc.html
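The summary.txt file inside each .zip archive has one tab-separated line per module: the status (PASS, WARN or FAIL), the module name and the input file name. The following minimal sketch reads this file directly from the archives and prints only the modules that did not pass. It assumes `unzip` is installed, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the FastQC modules that did not PASS.
# summary.txt lines are tab-separated: STATUS, module name, input file name.
flag_modules() {
  awk -F '\t' '$1 != "PASS" { printf "%s\t%s\t%s\n", $3, $1, $2 }'
}

# Read summary.txt straight out of each *_fastqc.zip without extracting it
report_fastqc_flags() {
  local z
  for z in "$@"; do
    [ -e "$z" ] || continue
    unzip -p "$z" '*/summary.txt' | flag_modules
  done
}

out_dir=~/PreProcSEQ-main/1-qualityControl_beforeTrimming/outputFastqc
if [ -d "$out_dir" ]; then
  report_fastqc_flags "$out_dir"/*_fastqc.zip
fi
```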
4.2 MultiQC
MultiQC aggregates bioinformatics analysis results from many samples into a single report. Here we will use it to combine the FastQC results generated earlier. The FastQC output directory is the input, and the MultiQC result will be stored in 1-qualityControl_beforeTrimming/outputMultiqc:
multiqc ~/PreProcSEQ-main/1-qualityControl_beforeTrimming/outputFastqc/. -o ~/PreProcSEQ-main/1-qualityControl_beforeTrimming/outputMultiqc
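MultiQC parses the FastQC outputs for you. The following is therefore only an illustration of what aggregation means here, not a description of how MultiQC is implemented. It concatenates the summary.txt files from all archives and counts, for each module, how many samples passed, warned or failed. It assumes `unzip` is available, and `tally_statuses` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of aggregation: count PASS/WARN/FAIL per FastQC module
# across all samples, reading concatenated summary.txt lines on stdin.
tally_statuses() {
  awk -F '\t' '
    !($2 in seen) { seen[$2] = 1; order[++n] = $2 }   # remember first-seen module order
    { count[$2, $1]++ }                               # count each (module, status) pair
    END {
      print "module\tPASS\tWARN\tFAIL"
      for (i = 1; i <= n; i++) {
        m = order[i]
        printf "%s\t%d\t%d\t%d\n", m, count[m, "PASS"], count[m, "WARN"], count[m, "FAIL"]
      }
    }'
}

out_dir=~/PreProcSEQ-main/1-qualityControl_beforeTrimming/outputFastqc
if [ -d "$out_dir" ]; then
  for z in "$out_dir"/*_fastqc.zip; do
    if [ -e "$z" ]; then unzip -p "$z" '*/summary.txt'; fi
  done | tally_statuses
fi
```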
The result of MultiQC is in 1-qualityControl_beforeTrimming/outputMultiqc:
ls ~/PreProcSEQ-main/1-qualityControl_beforeTrimming/outputMultiqc
We can view the HTML file in a web browser:
firefox ~/PreProcSEQ-main/1-qualityControl_beforeTrimming/outputMultiqc/multiqc_report.html
Figure 1 shows the main QC plots from the FastQC results aggregated by MultiQC. Sequencing quality was generally good (Figure 1.A): all read positions had mean Phred quality scores above 30. The mean quality per read was also good (Figure 1.B). Some reads had mean Phred scores in the yellow and red regions, but most fell in the green region. The content of undetermined bases (N) was low, although some Ns were present in the reads (Figure 1.C). Finally, adapter sequences were detected in the dataset (Figure 1.D), but at levels too low to reach the yellow or red regions.
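The quality values in Figure 1.A are Phred scores. A score Q corresponds to an estimated base-call error probability P = 10^(-Q/10), so Q30 means roughly one wrong base call in 1,000. The following simplified sketch makes the per-position metric concrete by computing the mean Phred score at each read position. It assumes Phred+33 encoding (used by Illumina 1.8 and later) and uncompressed four-line FASTQ records. Unlike FastQC, it does not bin positions or draw quartiles, and `per_position_quality` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: mean Phred score at each read position, pooled over all
# records in the given FASTQ files. Assumes Phred+33 encoding and uncompressed
# 4-line records; FastQC's plot additionally bins positions and shows quartiles.
per_position_quality() {
  awk '
    BEGIN { for (c = 33; c <= 126; c++) ord[sprintf("%c", c)] = c }  # char -> ASCII code
    FNR % 4 == 0 {                      # every 4th line of a file is a quality string
      n = length($0)
      for (i = 1; i <= n; i++) {
        sum[i] += ord[substr($0, i, 1)] - 33
        cnt[i]++
      }
      if (n > maxlen) maxlen = n
    }
    END { for (i = 1; i <= maxlen; i++) printf "%d\t%.2f\n", i, sum[i] / cnt[i] }
  ' "$@"
}

for fq in ~/PreProcSEQ-main/0-samples/*.fastq; do
  [ -e "$fq" ] || continue   # skip an unmatched glob pattern
  echo "== $fq"
  per_position_quality "$fq"
done
```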