Chapter 3 Programming

3.1 Learning about SAM/BAM

  • SAM: Sequence Alignment/Map format; BAM is its binary, compressed equivalent
  • Tab-delimited text
  • Header lines (optional) start with @
  • Each alignment line has 11 mandatory fields
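The layout described above can be illustrated with a short SAM snippet. The following is a minimal sketch using only the Python standard library. The sample header, the read `read1` and the helper `split_sam` are hypothetical and exist only for illustration. Real pipelines would normally use a dedicated tool such as samtools or the pysam library, especially for binary BAM files.

```python
# Hypothetical SAM text: two header lines (starting with "@") and one alignment line.
SAM_TEXT = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:1000\n"
    "read1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\n"
)


def split_sam(text):
    """Return (header_lines, alignment_records); each record is a list of fields."""
    header, records = [], []
    for line in text.splitlines():
        if not line:
            continue  # skip blank lines
        if line.startswith("@"):
            header.append(line)
        else:
            fields = line.split("\t")  # SAM is tab-delimited
            # The first 11 columns are mandatory; optional TAG:TYPE:VALUE fields may follow.
            if len(fields) < 11:
                raise ValueError(f"expected at least 11 fields, got {len(fields)}")
            records.append(fields)
    return header, records


header, records = split_sam(SAM_TEXT)
print(len(header), len(records))  # 2 1
print(records[0][:3])             # ['read1', '0', 'chr1']
```

The check uses `< 11` rather than `!= 11` because optional tag fields may follow the mandatory ones on the same line.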

3.1.1 Terms

  • Template: A DNA/RNA sequence, part of which is sequenced on a sequencing machine or assembled from raw sequences
  • Segment: A contiguous sequence or subsequence
  • Read: A raw sequence that comes off a sequencing machine. A read may consist of multiple segments; reads are indexed by the order in which they are sequenced
  • Linear alignment: An alignment of a read to a single reference sequence without direction changes (the whole alignment lies on one strand)
  • Chimeric alignment: An alignment of a read that cannot be represented as a linear alignment. One linear alignment in a chimeric alignment is the “representative” alignment; the others are called supplementary
  • Read alignment: A linear or chimeric alignment that is the complete representation of the alignment of the read
  • Multiple mapping: Multiple read alignments for the same read; one of them is considered primary
  • 0-based coordinate system: Used by BAM; the first base of a sequence is 0 (SAM text uses 1-based coordinates, so there the first base is 1)
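Several of these terms are encoded as bits of the FLAG field (column 2). In the SAM specification, bit 0x100 marks a secondary alignment, which is a non-primary line of a multiple mapping. Bit 0x800 marks a supplementary alignment, which is a non-representative part of a chimeric alignment. The following minimal sketch decodes these two bits and converts a 1-based SAM position to the 0-based value that BAM stores. The helper names `alignment_role` and `sam_pos_to_zero_based` are hypothetical.

```python
SECONDARY = 0x100      # non-primary alignment in a multiple mapping
SUPPLEMENTARY = 0x800  # non-representative part of a chimeric alignment


def alignment_role(flag):
    """Classify an alignment line by its FLAG bits (secondary bit checked first)."""
    if flag & SECONDARY:
        return "secondary"
    if flag & SUPPLEMENTARY:
        return "supplementary"
    return "primary"


def sam_pos_to_zero_based(pos):
    """SAM POS is 1-based (0 = no position); BAM stores POS - 1 (so -1 = no position)."""
    return pos - 1


print(alignment_role(0))              # primary
print(alignment_role(256))            # secondary
print(alignment_role(2048 | 16))      # supplementary (16 = reverse strand, ignored here)
print(sam_pos_to_zero_based(100))     # 99
```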

3.1.2 Alignment section: Mandatory fields

Col  Field  Type    Description
1    QNAME  String  Query template name
2    FLAG   Int     Bitwise flag
3    RNAME  String  Reference sequence name
4    POS    Int     1-based leftmost mapping position
5    MAPQ   Int     Mapping quality
6    CIGAR  String  CIGAR string
7    RNEXT  String  Reference name of the mate/next read
8    PNEXT  Int     Position of the mate/next read
9    TLEN   Int     Observed template length
10   SEQ    String  Segment sequence
11   QUAL   String  ASCII of Phred-scaled base quality + 33
[Figure: bam.jpg.png]
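To connect the table to actual data, the following minimal sketch parses one alignment line into a typed record with one attribute per mandatory field. It also decodes two of the string fields. QUAL characters are converted to Phred scores by subtracting 33. The reference span is computed from the CIGAR string by summing the operations that consume reference bases (M, D, N, =, X). The class `SamRecord` and the example line are hypothetical. A library such as pysam provides equivalent, far more complete functionality.

```python
import re
from dataclasses import dataclass

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")  # length + operation pairs
REF_CONSUMING = set("MDN=X")                  # operations that advance along the reference


@dataclass
class SamRecord:
    qname: str
    flag: int
    rname: str
    pos: int      # 1-based leftmost position, as written in SAM text
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str

    @classmethod
    def from_line(cls, line):
        """Build a record from the 11 mandatory tab-separated fields of one line."""
        f = line.rstrip("\n").split("\t")
        return cls(f[0], int(f[1]), f[2], int(f[3]), int(f[4]), f[5],
                   f[6], int(f[7]), int(f[8]), f[9], f[10])

    def phred_scores(self):
        """Decode QUAL: each character is Phred quality + 33; '*' means no qualities."""
        if self.qual == "*":
            return []
        return [ord(c) - 33 for c in self.qual]

    def reference_span(self):
        """Number of reference bases covered by the alignment ('*' = no CIGAR)."""
        if self.cigar == "*":
            return 0
        return sum(int(n) for n, op in CIGAR_RE.findall(self.cigar) if op in REF_CONSUMING)


line = "read1\t0\tchr1\t100\t60\t3S5M1D2M\t*\t0\t0\tACGTACGTAC\tIIIII#####\n"
rec = SamRecord.from_line(line)
print(rec.pos, rec.reference_span())  # 100 8
print(rec.phred_scores()[:6])         # [40, 40, 40, 40, 40, 2]
```

In this example, the soft clip (3S) consumes read bases but not reference bases. The deletion (1D) consumes reference bases but not read bases. As a result, the 10-base SEQ covers 8 reference positions.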