Chapter 3 Programming
3.1 Learning about SAM/BAM
- SAM: Sequence Alignment/Map format
- Tab-delimited
- Header lines (optional) start with @
- Each alignment line has 11 mandatory fields
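
The layout above (optional `@` header lines followed by tab-delimited alignment lines) can be sketched in plain Python. This is a minimal illustration, not a full parser: the function name `split_sam` and the example records are assumptions made for this sketch, and real files are better read with a dedicated library.

```python
def split_sam(lines):
    """Separate header lines (starting with '@') from alignment lines.

    Returns (header, alignments), where each alignment is a list of
    tab-separated fields. Hypothetical helper, for illustration only.
    """
    header, alignments = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if line.startswith("@"):
            header.append(line)
        else:
            fields = line.split("\t")
            if len(fields) < 11:
                raise ValueError(
                    f"expected at least 11 fields, got {len(fields)}"
                )
            alignments.append(fields)
    return header, alignments


# Made-up example records, not taken from a real file
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:ref\tLN:45",
    "r001\t99\tref\t7\t30\t8M2I4M1D3M\t=\t37\t39\tTTAGATAAAGGATACTG\t*",
]
header, alignments = split_sam(example)
```

Fields beyond the 11th are optional tags, which is why the check is "at least 11" rather than "exactly 11".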
3.1.1 Terms
- Template: A DNA/RNA sequence, part of which is sequenced on a sequencing machine or assembled from raw sequences
- Segment: Contiguous sequence/subsequence
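- Read: Raw sequence that comes off a sequencing machine. May consist of multiple segments. Reads are indexed by the order in which they are sequenced
- Linear alignment: Alignment of a read to a single reference sequence without direction changes (insertions, deletions, skips, and clipping are allowed)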
- Chimeric alignment: Alignment of a read that cannot be represented as a linear alignment. One linear alignment in a chimeric alignment is the “representative” alignment.
- Read alignment: A linear or chimeric alignment that is the complete representation of the alignment of the read
- Multiple mapping: Multiple read alignments for the same read (one is considered primary)
- 0-based coordinate system: First base of a sequence is 0. Used by BAM. SAM uses a 1-based coordinate system, where the first base of a sequence is 1
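
Because SAM text records are 1-based while BAM stores positions 0-based, converting between the two is a common source of off-by-one errors. The sketch below shows the arithmetic. The function names are hypothetical, and the interval convention (1-based closed versus 0-based half-open) follows the SAM specification.

```python
def sam_pos_to_zero_based(pos):
    """Convert a 1-based SAM POS to the 0-based position stored in BAM.

    A SAM POS of 0 (no coordinate) becomes -1.
    """
    return pos - 1


def closed_to_half_open(start_1based, end_1based):
    """Convert a 1-based closed interval [start, end] to the
    equivalent 0-based half-open interval [start - 1, end)."""
    return start_1based - 1, end_1based


# The 3rd through 7th bases, inclusive
zero_based_region = closed_to_half_open(3, 7)  # (2, 7)
```

Both intervals cover the same five bases; only the start value changes, because the end of a half-open interval is exclusive.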
3.1.2 Alignment section: Mandatory fields
Col | Field | Type | Description |
---|---|---|---|
1 | QNAME | String | Query template name |
2 | FLAG | Int | Bitwise flag |
3 | RNAME | String | Reference sequence name |
4 | POS | Int | 1-based leftmost mapping position |
5 | MAPQ | Int | Mapping quality |
6 | CIGAR | String | CIGAR string |
7 | RNEXT | String | Reference name of the mate/next read |
8 | PNEXT | Int | Position of the mate/next read |
9 | TLEN | Int | Observed template length |
10 | SEQ | String | Segment sequence |
11 | QUAL | String | ASCII of Phred-scaled base quality + 33 |
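
As a rough sketch of how the 11 mandatory fields in the table might be read, the Python below splits one alignment line into typed fields, decodes the bitwise FLAG, and converts QUAL characters to Phred scores. The class and function names, the short flag labels, and the example record are assumptions for illustration. The bit values and the +33 quality offset come from the SAM specification.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    """The 11 mandatory SAM fields, in column order."""
    qname: str
    flag: int
    rname: str
    pos: int    # 1-based leftmost mapping position
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str


# FLAG bits from the SAM specification; labels are informal shorthand
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "last_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def parse_alignment(line):
    """Parse the mandatory fields of one alignment line (optional tags ignored)."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 11:
        raise ValueError(f"expected at least 11 fields, got {len(f)}")
    return Alignment(f[0], int(f[1]), f[2], int(f[3]), int(f[4]),
                     f[5], f[6], int(f[7]), int(f[8]), f[9], f[10])


def decode_flag(flag):
    """Return the labels of all bits set in FLAG, lowest bit first."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def decode_qual(qual):
    """Convert a QUAL string to Phred scores; '*' means no qualities stored."""
    if qual == "*":
        return []
    return [ord(c) - 33 for c in qual]


# Made-up example record
rec = parse_alignment(
    "r001\t99\tref\t7\t30\t8M2I4M1D3M\t=\t37\t39\tTTAGATAAAGGATACTG\t*"
)
flags = decode_flag(rec.flag)   # 99 = 0x1 + 0x2 + 0x20 + 0x40
scores = decode_qual("II5!")    # [40, 40, 20, 0]
```

For real data, an established library such as pysam handles BAM decoding, optional tags, and edge cases that this sketch leaves out.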