Chapter 3 Thale Cress Gene Expression
In Computer Lab 8B we will analyse genomic data from the plant Arabidopsis thaliana (thale cress).
Note that, generally speaking, our analysis procedure would be the same for a gene expression analysis of human data, or of data from other plants or animals.
3.1 Thale Cress gene data
In Computer Lab 8B we will assess RNA-Seq gene expression data collected at different time points over the germination and post-germination period of thale cress seeds.
This data, collected by Narsai et al. (2017), is publicly available on the NCBI Gene Expression Omnibus (GEO) website¹².
As you might expect, the expression of many genes will change as the seeds germinate.
Let’s take a look at the characteristics of this data set.
Note: we have already carried out some initial data cleaning and preparation to make this data easier to work with.
##           Chr X24hSL_1 X24hSL_2 X24hSL_3 X48hSL_1 X48hSL_2 X48hSL_3
## AT1G01010   1      282      136      315      646      622      610
## AT1G01020   1     1199      830     1341      768      769      888
## AT1G01030   1      264       79      267      266      218      333
## AT1G01040   1     1594      416      905     1640     1497     1893
## AT1G01050   1     4650     2976     4684     5350     5385     6000
## AT1G01060   1     8464     3007     8813     5066     5098     5923
Here, we note that:
- Each row refers to a different gene, identified by its locus ID in the row name (e.g. AT1G01010).
- The Chr column tells us which chromosome the gene is on.
- The remaining columns contain the gene read counts for the different time points and replicates.
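As a hedged illustration of this layout, the Python sketch below re-types the six rows printed above (with the "## " output prefix dropped) and splits each row into a gene ID, a chromosome and a set of per-sample read counts. The function name parse_counts is hypothetical; in the lab the full table would be loaded from the prepared data file rather than typed in.

```python
# Minimal sketch: the six rows printed above, re-typed as plain text.
# Hypothetical example data only; the real table has many more genes.
RAW = """\
Chr X24hSL_1 X24hSL_2 X24hSL_3 X48hSL_1 X48hSL_2 X48hSL_3
AT1G01010 1 282 136 315 646 622 610
AT1G01020 1 1199 830 1341 768 769 888
AT1G01030 1 264 79 267 266 218 333
AT1G01040 1 1594 416 905 1640 1497 1893
AT1G01050 1 4650 2976 4684 5350 5385 6000
AT1G01060 1 8464 3007 8813 5066 5098 5923
"""


def parse_counts(text):
    """Split the table into gene -> chromosome and gene -> {sample: count}."""
    lines = text.strip().splitlines()
    # The header has no entry for the gene-ID column (the row names),
    # so it is one field shorter than every data row.
    header = lines[0].split()
    samples = header[1:]  # every column after "Chr" is a sample
    chromosome = {}
    counts = {}
    for line in lines[1:]:
        gene, chrom, *values = line.split()
        chromosome[gene] = chrom
        counts[gene] = dict(zip(samples, map(int, values)))
    return samples, chromosome, counts


samples, chromosome, counts = parse_counts(RAW)

# Total reads per sample over these six genes only (not the full library size).
totals = {s: sum(counts[g][s] for g in counts) for s in samples}
print(totals)
```

Keeping the chromosome separate from the counts means that later numeric steps (sums, means) only ever touch the count columns.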
We have data for two time points, denoted X24hSL and X48hSL. These correspond, respectively, to thale cress seeds 24 hours and 48 hours after exposure to sunlight, following a stratification process (whereby the seeds are encouraged to germinate).
For each time point, we have recordings for three replicates, hence the suffixes _1, _2 and _3 appended to the X24hSL and X48hSL column names.
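Because the time point and replicate are encoded in each column name, they can be recovered programmatically. The Python sketch below is a minimal illustration, assuming every sample column follows the pattern "X", hours, "hSL", underscore, replicate number (the leading "X" is likely added by R's make.names, since R column names cannot start with a digit). The helpers parse_sample and group_by_time are hypothetical names. It then averages the three replicates for one gene, using the AT1G01010 counts from the table.

```python
import re
from collections import defaultdict
from statistics import mean

# Column names as they appear in the table.
SAMPLES = ["X24hSL_1", "X24hSL_2", "X24hSL_3",
           "X48hSL_1", "X48hSL_2", "X48hSL_3"]

# Assumed naming pattern: "X", hours, "hSL", "_", replicate number.
PATTERN = re.compile(r"^X(\d+)hSL_(\d+)$")


def parse_sample(name):
    """Return (hours, replicate) for a column name such as 'X24hSL_1'."""
    match = PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected column name: {name!r}")
    return int(match.group(1)), int(match.group(2))


def group_by_time(samples):
    """Map each time point (in hours) to its column names, in input order."""
    groups = defaultdict(list)
    for name in samples:
        hours, _replicate = parse_sample(name)
        groups[hours].append(name)
    return dict(groups)


groups = group_by_time(SAMPLES)
print(groups)

# Mean raw count per time point for AT1G01010 (first row of the table).
row = {"X24hSL_1": 282, "X24hSL_2": 136, "X24hSL_3": 315,
       "X48hSL_1": 646, "X48hSL_2": 622, "X48hSL_3": 610}
means = {hours: mean(row[c] for c in cols) for hours, cols in groups.items()}
print(means)
```

These are raw counts, not yet adjusted for differences in sequencing depth between samples, so replicate means like these are only a rough first look at how a gene's expression changes between 24 and 48 hours.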