Chapter 18 Generate Custom ArchRGenome for Macaque Mmul10

18.1 Description

Generate a custom ArchR genome annotation for the macaque (Macaca mulatta) Mmul_10 assembly, which UCSC distributes as rheMac10. The annotation is built from the following data:

  • Genome annotation:
    • A BSgenome object which contains the sequence information for a genome
  • Gene annotation:
    • A TxDb object (transcript database) from Bioconductor which contains information for gene/transcript coordinates
    • An OrgDb object (organism database) from Bioconductor which provides a unified framework to map between gene names and various gene identifiers

18.2 Create genome annotation

library(ArchR)  # the BSgenome package named below must be installed
genomeAnnotation <-
  createGenomeAnnotation(genome = "BSgenome.Mmulatta.UCSC.rheMac10")

18.3 Create gene annotation

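library(GenomicFeatures)  # makeTxDbFromEnsembl(); recent Bioconductor releases provide it via txdbmaker
library(org.Mmu.eg.db)

# Build the transcript database from Ensembl (the default is the current release)
txdb <- makeTxDbFromEnsembl(organism = "Macaca mulatta")
# Rename Ensembl-style sequence names ("1") to UCSC style ("chr1") to match the BSgenome
seqlevels(txdb) <- paste0("chr", seqlevels(txdb))
# Restrict the active sequences to autosomes 1-20 plus X and Y
seqlevels(txdb) <- paste0("chr", c(seq(1,20), "X", "Y"))

geneAnnotation <- createGeneAnnotation(TxDb = txdb,
                                       OrgDb = org.Mmu.eg.db)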
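
The effect of the two seqlevels() assignments can be sketched in base R on a plain character vector. The sequence names below are a made-up sample (the scaffold name in particular is hypothetical), not the actual contents of the TxDb:

```r
# Hypothetical sample of Ensembl-style sequence names
ensembl_names <- c("1", "2", "20", "X", "Y", "MT", "QNVO02000001.1")

# Step 1: prefix every name with "chr"
ucsc_style <- paste0("chr", ensembl_names)

# Step 2: keep only autosomes 1-20 plus X and Y
wanted <- paste0("chr", c(seq(1, 20), "X", "Y"))
kept <- intersect(ucsc_style, wanted)
dropped <- setdiff(ucsc_style, wanted)

print(kept)
print(dropped)
```

In this sketch the mitochondrial sequence becomes "chrMT" rather than UCSC's "chrM", so it is dropped together with the unplaced scaffold.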

18.4 Filter genes without a symbol

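# Genes lacking a symbol are assumed to be NA or to carry ArchR's "NA_<gene_id>"
# placeholder; a plain grep("NA", ...) would also hit real symbols containing "NA"
noSymbol <- function(x) is.na(x) | grepl("^NA_", x)
keep <- !noSymbol(geneAnnotation$genes$symbol)
gid <- geneAnnotation$genes$gene_id[keep]
df <- AnnotationDbi::select(txdb, keys = gid, columns = "TXNAME", keytype = "GENEID")

genes <- geneAnnotation$genes[keep]
exons <- geneAnnotation$exons[!noSymbol(geneAnnotation$exons$symbol)]
tss <- geneAnnotation$TSS[geneAnnotation$TSS$tx_name %in% df$TXNAME]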

geneAnnotationSubset <- createGeneAnnotation(genes = genes, 
                                             exons = exons, 
                                             TSS = tss)
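
When filtering on the symbol column, note that the bare pattern "NA" also matches real symbols that merely contain those two letters. A minimal base-R sketch with made-up symbols, assuming the placeholder for a missing symbol has the form NA_<gene_id>:

```r
# Hypothetical symbols: three real-looking names, one placeholder, one missing value
symbol <- c("DNAJB1", "RNASE1", "GAPDH", "NA_ENSMMUG00000000001", NA)

loose  <- grepl("NA", symbol)                    # substring match anywhere in the symbol
strict <- is.na(symbol) | grepl("^NA_", symbol)  # placeholder prefix or missing value only

print(symbol[loose])   # entries a substring match would discard
kept <- symbol[!strict]
print(kept)            # entries kept by the anchored pattern
```

The substring match would discard DNAJB1 and RNASE1 along with the placeholder, while the anchored pattern removes only the placeholder and the missing value.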

18.5 Create ArchR genome object

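dir.create("data/ArchR", recursive = TRUE, showWarnings = FALSE)  # save() fails if the folder is missing
save(genomeAnnotation, geneAnnotationSubset,
     file = "data/ArchR/Macaca_mulatta_genomeAnnotation_geneAnnotationSubset.RData")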
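
In a later session the saved objects can be restored with load(), which recreates them under the names they were saved with. The sketch below uses stand-in lists and a temporary file so that it runs without ArchR; the list contents are placeholders, not real annotations:

```r
# Stand-ins for the real annotation objects (hypothetical contents)
genomeAnnotation <- list(genome = "BSgenome.Mmulatta.UCSC.rheMac10")
geneAnnotationSubset <- list(genes = "placeholder")

rdata <- tempfile(fileext = ".RData")  # temporary path used only for this sketch
save(genomeAnnotation, geneAnnotationSubset, file = rdata)

# Simulate a fresh session, then restore both objects under their saved names
rm(genomeAnnotation, geneAnnotationSubset)
restored <- load(rdata)  # load() returns the names of the restored objects
print(restored)
```

With ArchR loaded, the restored objects can then be supplied through the genomeAnnotation and geneAnnotation arguments of createArrowFiles() and ArchRProject(), for example geneAnnotation = geneAnnotationSubset.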