Chapter 18 Generate Custom ArchRGenome for Macaque Mmul10
18.1 Description
Generate a custom genome annotation and gene annotation for the macaque Mmul10 assembly from the following data:
- Genome annotation:
  - A BSgenome object, which contains the sequence information for a genome
- Gene annotation:
  - A TxDb object (transcript database) from Bioconductor, which contains gene and transcript coordinates
  - An OrgDb object (organism database) from Bioconductor, which provides a unified framework for mapping between gene names and various gene identifiers
18.2 Create genome annotation
library(ArchR)
library(BSgenome.Mmulatta.UCSC.rheMac10)
genomeAnnotation <- createGenomeAnnotation(genome = "BSgenome.Mmulatta.UCSC.rheMac10")
18.3 Create gene annotation
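The chromosome renaming used in this section can be previewed on plain character vectors before touching the TxDb. This is a minimal sketch: the sequence names in `ensembl_levels` are made up for illustration (they are not read from the real Ensembl database), and the variable names are hypothetical.

```r
# Hypothetical Ensembl-style sequence names (illustrative, not read from the real TxDb)
ensembl_levels <- c(as.character(1:20), "X", "Y", "MT", "scaffold_1")

# Step 1: add the UCSC-style "chr" prefix to every name
ucsc_levels <- paste0("chr", ensembl_levels)

# Step 2: the standard chromosomes to keep (20 autosomes plus X and Y)
standard_levels <- paste0("chr", c(seq(1, 20), "X", "Y"))

# Names that restricting to the standard chromosomes would drop
dropped_levels <- setdiff(ucsc_levels, standard_levels)
print(dropped_levels)
```

The same check on the real object, comparing `seqlevels(txdb)` with the names of the BSgenome, is a quick way to confirm that both annotations use one naming style.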
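library(GenomicFeatures)
library(org.Mmu.eg.db)
# Build the transcript database from Ensembl (needs an internet connection;
# newer Bioconductor releases provide makeTxDbFromEnsembl() in txdbmaker)
txdb <- makeTxDbFromEnsembl(organism = "Macaca mulatta")
# Ensembl names chromosomes "1", "2", ..., "X"; add the UCSC-style "chr" prefix used by the BSgenome
seqlevels(txdb) <- paste0("chr", seqlevels(txdb))
# Restrict to the 20 autosomes plus X and Y, dropping MT and unplaced scaffolds
seqlevels(txdb) <- paste0("chr", c(seq(1,20), "X", "Y"))
geneAnnotation <- createGeneAnnotation(TxDb = txdb, OrgDb = org.Mmu.eg.db)
18.4 Filter genes without a symbol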
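How a missing symbol is detected matters for this filter. The sketch below uses a made-up symbol vector (not taken from the real annotation) to contrast a substring match on "NA" with an exact test. Whether the real annotation stores missing symbols as a true `NA` or as the string "NA" is not stated here, so the exact test is written to cover both cases; all variable names are hypothetical.

```r
# Hypothetical symbol vector (illustrative); the second and fourth entries lack a symbol
symbols <- c("DNAJB1", NA, "GAPDH", "NA", "RNASE1")

# Substring match: flags every symbol that contains "NA" and never matches a true NA
substring_hits <- grep("NA", symbols)

# Exact test: flags only a true NA or the literal string "NA"
no_symbol <- is.na(symbols) | symbols == "NA"

# Symbols kept by the exact test
kept_exact <- symbols[!no_symbol]
print(kept_exact)
```

A logical mask also avoids a pitfall of negative indexing: if nothing matches, `x[-integer(0)]` returns an empty vector rather than all of `x`.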
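# Keep only features that have a gene symbol. Test for missing values exactly:
# grep("NA", ...) would also match symbols that merely contain "NA" (e.g. DNAJB1).
hasSymbol <- function(x) !is.na(x) & x != "NA"
genes <- geneAnnotation$genes[hasSymbol(geneAnnotation$genes$symbol)]
exons <- geneAnnotation$exons[hasSymbol(geneAnnotation$exons$symbol)]
# Keep only the TSSs of transcripts that belong to the retained genes
df <- AnnotationDbi::select(txdb, keys = genes$gene_id, columns = "TXNAME", keytype = "GENEID")
tss <- geneAnnotation$TSS[geneAnnotation$TSS$tx_name %in% df$TXNAME]
geneAnnotationSubset <- createGeneAnnotation(genes = genes,
                                             exons = exons,
                                             TSS = tss)
18.5 Create ArchR genome object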
save(genomeAnnotation, geneAnnotationSubset, file = "data/ArchR/Macaca_mulatta_genomeAnnotation_geneAnnotationSubset.RData")
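To check that both objects survive the round trip, the saved file can be loaded into a fresh environment. The following is a minimal sketch under stated assumptions: the two lists are placeholders standing in for the real annotation objects, and the output path is a hypothetical temporary location rather than the `data/ArchR` directory used above.

```r
# Placeholder objects standing in for the real annotations (illustrative only)
genomeAnnotation <- list(note = "placeholder genome annotation")
geneAnnotationSubset <- list(note = "placeholder gene annotation")

# Hypothetical output location; a temporary directory keeps the sketch self-contained
out_dir <- file.path(tempdir(), "data", "ArchR")
# save() does not create missing directories, so create the target first
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
out_file <- file.path(out_dir, "annotations.RData")

save(genomeAnnotation, geneAnnotationSubset, file = out_file)

# Load into a fresh environment; load() returns the names of the restored objects
restored <- new.env()
loaded_names <- load(out_file, envir = restored)
print(loaded_names)
```

In a later session, the restored objects would typically be passed to ArchR functions through their `genomeAnnotation` and `geneAnnotation` arguments, for example in `createArrowFiles()`.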