5 Fragments of Transcripts

We explore how C4 plants build their mechanism for concentrating carbon dioxide in the leaves, and we do this by transcriptomic analysis. I would like to discuss with you one last detail of how we perform this analysis.

We grind the leaves, extract the transcripts, read them (technically, sequence them), identify them, and quantify how many transcripts of each kind are contained in the leaf cells. This gives us a rough idea of which molecular machinery the cells are producing and, therefore, of what the cells are doing. This already sounds rather complicated, but it is not the whole story: before we read the transcripts, we fragment them or, in other words, we tear them into pieces. Imagine that you enter a library and want to read all the books but, before you do, you tear every book into fragments. Then you read every fragment and try to reconstruct the meaning of each book by matching overlapping fragments with each other until you have meaningful paragraphs, chapters and books again. This extra work seems illogical at best: if you can read a full transcript, why cut it into pieces first? The reason is again technical. With today’s technology, it is more efficient to read a short fragment of a transcript than a long, full-length one.
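To make the library picture concrete, here is a toy sketch in Python of tearing a transcript into overlapping reads and gluing them back together by their overlaps. It is only an illustration under simplifying assumptions: the sequence is invented, the reads contain no errors, every 4-letter stretch occurs only once in the transcript, and the function names (`fragment`, `overlap`, `assemble`) are my own. Real software has to cope with millions of imperfect reads and uses far more sophisticated methods.

```python
def fragment(transcript, read_length, step):
    """Cut a transcript into overlapping reads of fixed length."""
    if len(transcript) <= read_length:
        return [transcript]
    last_start = len(transcript) - read_length
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # make sure the end of the transcript is covered
    return [transcript[s:s + read_length] for s in starts]


def overlap(left, right, min_overlap):
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def assemble(reads, min_overlap=4):
    """Greedy assembly: keep merging the pair of pieces with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep their order
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_overlap)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing overlaps any more; stop with several pieces
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


transcript = "AUGGCUACGUUAGCCAUGAC"  # invented example sequence
reads = fragment(transcript, read_length=8, step=4)
print(reads)                                   # ['AUGGCUAC', 'CUACGUUA', 'GUUAGCCA', 'GCCAUGAC']
print(assemble(reads[::-1], min_overlap=4))    # ['AUGGCUACGUUAGCCAUGAC']
```

The reads are handed to `assemble` in reverse order on purpose: like the torn pages in the library, they carry no information about where they came from, and only the overlaps put them back in place.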

Figure 5.1: Reads of fragmented transcripts mapped and assembled on a reference. The mappings are visualized with the R package [Gviz](https://bioconductor.org/packages/release/bioc/html/Gviz.html), which also stores the data used for this example.

5.1 Computers help

Of course, we don’t read and reassemble fragmented transcripts by hand; computers do it for us. A typical transcriptomic experiment involves thousands of possible transcript types and millions of fragmented transcripts to be assembled, and only a computer can deal with those numbers in a reasonable time. Most of the software that assembles, identifies and quantifies transcripts is open source. This means that, as we do, you can download it, install it on your computer, inspect how it works and, if you have a good idea, modify and improve it. For example, typical software for assembling (mapping) fragmented transcripts (reads) includes HISAT, Trinity and Salmon; a typical package for the statistical analysis is DESeq. The transcriptomic data of countless experiments are also available to everybody, for example at the ENA (European Nucleotide Archive) repository.
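The identify-and-quantify step can also be sketched in a few lines of Python. This is a deliberately naive illustration and not how HISAT or Salmon work internally: it looks for each read as an exact substring of each reference transcript, whereas real tools build indexes and tolerate sequencing errors. The function name `count_reads`, the transcript names `geneA` and `geneB`, and all sequences are invented for the example.

```python
def count_reads(reads, reference):
    """Assign each read to the one reference transcript that contains it and count."""
    counts = {name: 0 for name in reference}
    unassigned = 0
    for read in reads:
        hits = [name for name, sequence in reference.items() if read in sequence]
        if len(hits) == 1:
            counts[hits[0]] += 1
        else:
            unassigned += 1  # the read matches nothing, or more than one transcript
    return counts, unassigned


# Invented reference transcripts and reads, for illustration only
reference = {
    "geneA": "AUGGCUACGUUAGCCAUGAC",
    "geneB": "AUGCCGGAUUCAGGCAUAAC",
}
reads = ["GGCUACGU", "GUUAGCCA", "CCGGAUUC", "UUUUUUUU"]

counts, unassigned = count_reads(reads, reference)
print(counts)      # {'geneA': 2, 'geneB': 1}
print(unassigned)  # 1
```

The resulting table of counts per transcript is the kind of input that the statistical analysis then works on.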

I have been talking about transcripts as temporary copies of projects and books. They are actually long molecules of RNA that can be represented as strings of code written in 4 characters (A, C, G and U). Like DNA and genes, they are a smart system that the cell uses to encode information, for those who can read it.
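On a computer, a transcript is therefore just a string over the four-letter RNA alphabet. As a small sketch (the function names `is_rna` and `transcribe` and the sequences are my own, chosen for illustration), this is how one might check that a string is a valid RNA sequence and write down the RNA copy of a DNA coding strand, where the letter T is replaced by U:

```python
RNA_ALPHABET = set("ACGU")


def is_rna(sequence):
    """True if the string is non-empty and uses only the four RNA letters."""
    return len(sequence) > 0 and set(sequence) <= RNA_ALPHABET


def transcribe(coding_dna):
    """RNA copy of a DNA coding strand: same letters, with T replaced by U."""
    return coding_dna.upper().replace("T", "U")


print(is_rna("AUGGCUACGU"))      # True
print(is_rna("ATGGCTACGT"))      # False: T belongs to the DNA alphabet
print(transcribe("ATGGCTACGT"))  # AUGGCUACGU
```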