Chapter 3 Running RUV-III-NB on large single cell RNA-seq datasets

To demonstrate the efficiency of RUV-III-NB on large single cell RNA-seq datasets, we fit the RUV-III-NB model to 104,417 T cells available as the BacherTCellData object in the scRNAseq Bioconductor package. When only \(\approx\) 2 percent of the cells are used to estimate \(\alpha\), the running time is just over 43 minutes on a standalone Linux PC using 4 cores (4 GB RAM per core).

# load packages; library() stops with an error if a package is missing
library(DelayedArray)
library(scRNAseq)
library(ruvIIInb)
sce <- BacherTCellData()
# remove lowly expressed genes (keep genes with mean count > 0.001)
sce <- sce[rowMeans(assays(sce)$counts) > 0.001, ]
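# define M matrix: one row per cell, one column per cluster
# (the loop assumes clusters are labelled 0, 1, 2, ...)
M <- matrix(0, ncol(sce), length(unique(sce$seurat_clusters)))
for (i in seq_len(ncol(M))) {
  M[sce$seurat_clusters == (i - 1), i] <- 1
}
mode(M) <- 'logical'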
# load single-cell housekeeping (scHK) genes and flag them as control genes
data(Hs.schk)
ctl <- rownames(sce) %in% Hs.schk
# run ruvIIInb, using 2% of cells to estimate alpha
system.time({
  out <- fastruvIII.nb(Y = DelayedArray(assays(sce)$counts), M = M, ctl = ctl,
                       k = 3, pCells.touse = 0.02, ncores = 4)
})
#    user   system  elapsed 
#4237.940 1180.920 2588.304
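
The construction of M above can be checked on a small toy example. The sketch below is only an illustration: the six cluster labels are made up, and the names clusters, M_loop, cluster_levels and M_vec are hypothetical and not part of ruvIIInb. It builds the indicator matrix with the same loop as above, then with a vectorised call to outer(), and confirms that both give the same logical matrix with exactly one TRUE per cell.

```r
# Made-up cluster labels for six cells, 0-based like the loop above assumes.
clusters <- c(0, 1, 1, 2, 0, 2)

# Loop construction, mirroring the code in this chapter.
M_loop <- matrix(0, length(clusters), length(unique(clusters)))
for (i in seq_len(ncol(M_loop))) {
  M_loop[clusters == (i - 1), i] <- 1
}
mode(M_loop) <- "logical"

# Vectorised alternative: compare every cell label with every cluster label.
cluster_levels <- sort(unique(clusters))
M_vec <- outer(clusters, cluster_levels, "==")

# Both approaches should agree, and each cell should sit in exactly one cluster.
stopifnot(all(M_loop == M_vec))
stopifnot(all(rowSums(M_vec) == 1))
print(M_vec)
```

Unlike the loop, the outer() version does not rely on the labels being consecutive integers starting at 0, which may be convenient if clusters have been renamed or some have been dropped.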