Chapter 9 Generate Custom ArchRGenome for Macaque Mmul10

9.1 Description

Generate a custom ArchR genome annotation and gene annotation for the macaque Mmul_10 assembly (UCSC name: rheMac10), based on the following data:

  • Genome annotation:
    • A BSgenome object which contains the sequence information for a genome
  • Gene annotation:
    • A TxDb object (transcript database) from Bioconductor which contains information for gene/transcript coordinates
    • An OrgDb object (organism database) from Bioconductor which provides a unified framework to map between gene names and various gene identifiers
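
The steps in this chapter depend on several R/Bioconductor packages. As a minimal sketch, the check below reports which of them are installed. The package list is an assumption inferred from the functions and objects used in this chapter, not a list stated by the source; in recent Bioconductor releases makeTxDbFromEnsembl() may be provided by the txdbmaker package rather than GenomicFeatures.

```r
# Packages assumed to be needed for this chapter (inferred from the calls used).
required_pkgs <- c(
  "ArchR",                             # createGenomeAnnotation(), createGeneAnnotation()
  "BSgenome.Mmulatta.UCSC.rheMac10",   # genome sequence
  "GenomicFeatures",                   # makeTxDbFromEnsembl()
  "org.Mmu.eg.db"                      # gene identifier mappings
)

# requireNamespace() returns TRUE/FALSE without attaching the package.
is_installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)

missing_pkgs <- required_pkgs[!is_installed]
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```

Missing Bioconductor packages would typically be installed with BiocManager::install(); ArchR itself is distributed through GitHub rather than Bioconductor.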

9.2 Create genome annotation

library(ArchR)
genomeAnnotation <-
  createGenomeAnnotation(genome = "BSgenome.Mmulatta.UCSC.rheMac10")
## Getting genome..
## Getting chromSizes..
## Getting blacklist..
## Blacklist not downloaded! Continuing without, be careful for downstream biases..

9.3 Create gene annotation

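library(GenomicFeatures)  # provides makeTxDbFromEnsembl()
library(org.Mmu.eg.db)    # OrgDb for Macaca mulatta
# Uses the current Ensembl release by default; pass release = <n> to pin one
txdb <- makeTxDbFromEnsembl(organism = "Macaca mulatta")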
## Fetch transcripts and genes from Ensembl ... OK
## Fetch exons and CDS from Ensembl ... OK
## Fetch chromosome names and lengths from Ensembl ...OK
## Gather the metadata ... OK
## Make the TxDb object ... OK
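# Ensembl names chromosomes "1", "2", ...; add the "chr" prefix to match the UCSC-style BSgenome
seqlevels(txdb) <- paste0("chr", seqlevels(txdb))
# Keep only the autosomes (chr1-chr20) and the sex chromosomes
seqlevels(txdb) <- paste0("chr", c(seq(1,20), "X", "Y"))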

geneAnnotation <- createGeneAnnotation(TxDb = txdb,
                                       OrgDb = org.Mmu.eg.db)
## Getting Genes..
## Determined Annotation Style = ENSEMBL
## Getting Exons..
## Getting TSS..
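
The two seqlevels() assignments in this section first rename the sequences and then restrict them. The logic can be illustrated on a plain character vector; this is a sketch in base R only, and the sequence names below are hypothetical stand-ins rather than the actual Ensembl names for this assembly.

```r
# Hypothetical Ensembl-style sequence names, for illustration only.
ensembl_names <- c("1", "2", "20", "X", "Y", "MT", "scaffold_1")

# Step 1: add the "chr" prefix used by UCSC-style genomes.
ucsc_names <- paste0("chr", ensembl_names)

# Step 2: define the chromosomes to keep (autosomes 1-20 plus X and Y).
keep <- paste0("chr", c(1:20, "X", "Y"))

kept <- ucsc_names[ucsc_names %in% keep]
dropped <- setdiff(ucsc_names, keep)

print(kept)     # the standard chromosomes that survive the restriction
print(dropped)  # mitochondrial and scaffold sequences that are removed
```

Restricting to named chromosomes this way also removes the mitochondrial sequence and unplaced scaffolds, which is worth keeping in mind if those regions matter downstream.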

9.4 Filter genes without a symbol

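# Flag genes without a symbol. ArchR is assumed here to label them "NA_<gene_id>";
# an unanchored "NA" pattern would also drop real symbols such as DNAH5 or NANOG.
sym <- geneAnnotation$genes$symbol
loci <- is.na(sym) | grepl("^NA(_|$)", sym)
gid <- geneAnnotation$genes$gene_id[!loci]
df <- AnnotationDbi::select(txdb, keys = gid, columns = "TXNAME", keytype = "GENEID")
## 'select()' returned 1:many mapping between keys and columns
genes <- geneAnnotation$genes[!loci]
exon_sym <- geneAnnotation$exons$symbol
exons <- geneAnnotation$exons[!(is.na(exon_sym) | grepl("^NA(_|$)", exon_sym))]
tss <- geneAnnotation$TSS[geneAnnotation$TSS$tx_name %in% df$TXNAME]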

geneAnnotationSubset <- createGeneAnnotation(genes = genes, 
                                             exons = exons, 
                                             TSS = tss)
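
Pattern-based filtering on gene symbols is easy to get subtly wrong, so it can help to check the pattern on a small example first. The sketch below uses base R only. The symbols and the gene ID are made up for illustration, and the "NA_<id>" placeholder format is an assumption about how unmapped genes are labelled.

```r
# Made-up symbols: one placeholder for an unmapped gene, three real-looking symbols.
symbols <- c("DNAH5", "NA_ENSMMUG00000000001", "NANOG", "TP53")

# Unanchored pattern: matches any symbol that merely contains "NA".
loose <- grepl("NA", symbols)

# Anchored pattern: matches only "NA" or "NA_<id>" placeholders, plus true NA values.
strict <- is.na(symbols) | grepl("^NA(_|$)", symbols)

print(symbols[!loose])   # genes kept by the unanchored pattern
print(symbols[!strict])  # genes kept by the anchored pattern

# Pitfall: negative indexing with an empty match vector drops everything.
clean <- c("TP53", "GAPDH")
no_match <- grep("NA", clean)
print(length(clean[-no_match]))
```

With these inputs the unanchored pattern keeps only TP53, while the anchored pattern keeps DNAH5, NANOG, and TP53. The last lines show why logical indexing is the safer choice: when nothing matches, a negative index built from grep() returns an empty result instead of the full vector.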

9.5 Create ArchR genome object

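dir.create("data/ArchR", recursive = TRUE, showWarnings = FALSE)  # ensure the output directory exists
save(genomeAnnotation, geneAnnotationSubset, file = "data/ArchR/Macaca_mulatta_genomeAnnotation_geneAnnotationSubset.RData")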
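
Objects written with save() are restored under their original names by load(). The round trip can be sketched as follows, using base R only; the two lists are placeholders standing in for the real annotation objects, and the temporary file path is illustrative.

```r
# Stand-in objects; in practice these would be the ArchR annotation objects.
genomeAnnotation <- list(note = "placeholder genome annotation")
geneAnnotationSubset <- list(note = "placeholder gene annotation")

# Write both objects to a single .RData file.
rdata_file <- tempfile(fileext = ".RData")
save(genomeAnnotation, geneAnnotationSubset, file = rdata_file)

# Load into a separate environment to avoid overwriting existing variables.
restored <- new.env()
loaded_names <- load(rdata_file, envir = restored)

# load() returns the names of the objects it restored.
print(loaded_names)
```

In a later session, the restored objects could then be supplied to ArchR functions that accept `genomeAnnotation` and `geneAnnotation` arguments, such as createArrowFiles().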