DNA data matrix 10bp

Sample DNA data set

The top panel shows a typical set of 10 bp DNA sequences from five individuals in a population. DNA sequence variants occur at three positions (2, 4, & 9), called Single Nucleotide Polymorphisms (SNPs) or Segregating Sites, flagged here by dots (.) below the panel. The middle panel re-codes the DNA sequences in binary form, in each case taking the state in Sequence #1 as 0 and any variant base as 1. We will assume for the moment that all SNP changes are from 0 → 1, and we call any 0 the ancestral state and any 1 the derived state. The bottom panel extracts the binary codes for the three SNPs for ease of comparison. Note that there are three alleles (haplotypes) among the five individuals: 000 in #1, 011 in ##2 & 3, and 101 in ##4 & 5. [Note that the three SNPs all involve transversions (exchanges between a purine and a pyrimidine base): t/g, c/a, & a/t].
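The re-coding described above can be sketched in a few lines of Python. The figure's actual sequences are not reproduced in this text, so the five sequences below are hypothetical: they are invented only to match the description (10 bp, variants t/g at position 2, c/a at position 4, and a/t at position 9), and the bases at all other positions are arbitrary.

```python
# Hypothetical sequences, invented to match the caption's description;
# they are NOT the sequences shown in the figure.
seqs = [
    "atgcagtcag",  # 1 (taken as the reference, all states coded 0)
    "atgaagtctg",  # 2
    "atgaagtctg",  # 3
    "aggcagtctg",  # 4
    "aggcagtctg",  # 5
]

reference = seqs[0]

# Middle panel: code each base 0 if it matches Sequence #1, else 1.
binary = [
    "".join("0" if base == ref else "1" for base, ref in zip(s, reference))
    for s in seqs
]

# Segregating sites: 1-based positions where at least one sequence carries a 1.
snp_positions = [
    i + 1 for i in range(len(reference)) if any(row[i] == "1" for row in binary)
]

# Bottom panel: keep only the SNP columns to obtain each haplotype.
haplotypes = ["".join(row[p - 1] for p in snp_positions) for row in binary]

for number, (row, hap) in enumerate(zip(binary, haplotypes), start=1):
    print(number, row, hap)
print("SNP positions:", snp_positions)
```

Under these assumed sequences the SNP positions come out as 2, 4, and 9, and the haplotypes as 000, 011, 011, 101, 101, matching the three alleles described in the text.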

Similar results can be obtained simply by considering the DNA sequence data directly and counting "1" for each difference between any pair of sequences. Thus for Sequences ##1 & 2, the difference d = 1 + 1 = 2, counting the c/a difference at position 4 and the a/t difference at position 9. This does not require re-coding of the data.
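As a minimal sketch of this direct count, the Python below compares two sequences position by position and adds 1 for each mismatch. The sequences are hypothetical stand-ins constructed to match the caption (variants at positions 2, 4, & 9 only), not the ones in the figure, and the function name `pairwise_difference` is likewise just an illustrative choice.

```python
from itertools import combinations

# Hypothetical sequences, invented to match the caption's description;
# they are NOT the sequences shown in the figure.
seqs = {
    1: "atgcagtcag",
    2: "atgaagtctg",
    3: "atgaagtctg",
    4: "aggcagtctg",
    5: "aggcagtctg",
}


def pairwise_difference(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


# d for every pair of individuals, computed on the raw bases (no re-coding).
differences = {
    (i, j): pairwise_difference(seqs[i], seqs[j])
    for i, j in combinations(sorted(seqs), 2)
}

for (i, j), d in differences.items():
    print(f"Sequences ##{i} & {j}: d = {d}")
```

With these assumed sequences, Sequences ##1 & 2 give d = 2, as in the text, and identical sequences such as ##2 & 3 give d = 0.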


Figure & Text material © 2022 by Steven M. Carr