Chapter 3 Running RUV-III-NB on large single cell RNA-seq datasets
To demonstrate the efficiency of RUV-III-NB on large single-cell RNA-seq datasets, we fit the RUV-III-NB model to 104,417 T cells available as the BacherTCellData object in the scRNAseq Bioconductor package. When only \(\approx\) 2 percent of the cells are used to estimate \(\alpha\), the running time is just over 43 minutes (about 2,588 seconds elapsed) on a standalone Linux PC using 4 cores (4 GB RAM per core).
library(DelayedArray)
require(scRNAseq)
require(ruvIIInb)
sce <- BacherTCellData()
# remove low abundant genes
sce <- subset(sce, rowMeans(assays(sce)$counts) > 0.001)
# define M matrix
M <- matrix(0, ncol(sce), length(unique(sce$seurat_clusters)))
for (i in 1:ncol(M))
  M[sce$seurat_clusters == (i - 1), i] <- 1
mode(M) <- 'logical'
# load scHK genes
data(Hs.schk)
ctl <- rownames(sce) %in% Hs.schk
# run ruvIIInb, using 2% of cells to estimate alpha
system.time({
  out <- fastruvIII.nb(Y = DelayedArray(assays(sce)$counts), M = M, ctl = ctl, k = 3, pCells.touse = 0.02, ncores = 4)
})
# user system elapsed
#4237.940 1180.920 2588.304
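
To make the structure of the M matrix in the listing above easier to see, the following is a minimal sketch on toy data, not part of the original analysis. It assumes, as the loop above does, that cluster labels are coded 0, 1, 2, and so on. The names clusters, labels and M_toy are hypothetical and chosen only for illustration. Each row is a cell, each column is a cluster, and an entry is TRUE when the cell belongs to that cluster.

```r
# Toy cluster labels for six cells (hypothetical data), coded from 0
# in the same way the loop above assumes for sce$seurat_clusters
clusters <- c(0, 1, 1, 2, 0, 2)
labels <- sort(unique(clusters))

# Logical membership matrix: one row per cell, one column per cluster
M_toy <- outer(clusters, labels, "==")
dimnames(M_toy) <- list(paste0("cell", seq_along(clusters)),
                        paste0("cluster", labels))

# Sanity check: every cell belongs to exactly one cluster
stopifnot(all(rowSums(M_toy) == 1))

# Number of cells per cluster
print(colSums(M_toy))
```

Using outer() builds the same kind of logical matrix as the explicit loop in a single vectorised step; whether that is preferable for a dataset of this size is a judgment call, since the loop avoids creating intermediate vectors.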