Cluster Analysis: an example


    Given a matrix of pairwise distances among taxa, cluster analysis represents this information in a diagram called a phenogram that expresses the overall similarities among taxa. The Pair Group Method uses the following algorithm [a repetitive process for accomplishing a task]:

    (1) Identify the minimum distance in the matrix between any two taxa.
    (2) Combine these two taxa as a single pair.
    (3) Re-calculate the average distance between this pair and all other taxa to form a new matrix.
    (4) Return to Step 1: identify the closest pair in the new matrix.
    (5) Continue until the last two clusters are joined.
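
The five steps above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function name `pair_group`, the dictionary-of-pairs layout, and the label scheme are assumptions made for this sketch. The averaging rule is the simple mean of the two member distances, as in the worked example on this page, and the input is this example's five-taxon matrix.

```python
from itertools import combinations


def pair_group(dist):
    """Join the closest pair repeatedly; return the joins in order.

    dist maps frozenset({x, y}) to the distance between labels x and y.
    (Hypothetical helper written for this example.)
    """
    dist = dict(dist)  # work on a copy so the caller's matrix is untouched
    clusters = sorted({label for pair in dist for label in pair})
    joins = []
    while len(clusters) > 1:
        # Steps 1 and 4: find the minimum distance in the current matrix.
        a, b = min(combinations(clusters, 2), key=lambda p: dist[frozenset(p)])
        height = dist[frozenset((a, b))]
        joins.append((a, b, height))
        # Step 2: combine the two into one cluster, e.g. A and B -> (AB).
        merged = "(" + a.strip("()") + b.strip("()") + ")"
        clusters = [c for c in clusters if c not in (a, b)]
        # Step 3: the distance to the new cluster is the mean of the
        # distances to its two members, as in the worked example.
        for c in clusters:
            dist[frozenset((merged, c))] = (
                dist[frozenset((a, c))] + dist[frozenset((b, c))]
            ) / 2
        clusters.append(merged)
    return joins


# The five-taxon lower-triangle matrix from this example.
rows = {"B": [20], "C": [60, 50], "D": [100, 90, 40], "E": [90, 80, 50, 30]}
matrix = {
    frozenset((row, col)): value
    for row, values in rows.items()
    for col, value in zip("ABCD", values)
}
joins = pair_group(matrix)
for a, b, height in joins:
    print(a, b, height)
```

Ties between equally close pairs are broken by whichever pair is found first; the example matrix has no ties, so that choice does not affect the result here.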


Consider five taxa (A, B, C, D, E) with the following distance matrix:

         A     B     C     D     E
A        0     -     -     -     -
B       20     0     -     -     -
C       60    50     0     -     -
D      100    90    40     0     -
E       90    80    50    30     0

A & B are closest (20 units): join them into one cluster (AB) at a height of 20, and recalculate the average distance from each of C, D, and E to (AB). [For example, the distance from C to (AB) = (60 + 50)/2 = 55, the distance from D to (AB) = (100 + 90)/2 = 95, and the distance from E to (AB) = (90 + 80)/2 = 85]. This gives:

        (AB)     C     D     E
(AB)       0     -     -     -
C         55     0     -     -
D         95    40     0     -
E         85    50    30     0
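
The averages in the matrix above can be checked with a few lines of Python. This is only an arithmetic sketch: the dictionary `d` and the name `to_ab` are illustrative choices, and the values are copied from the original five-taxon matrix.

```python
# Distances from A and B to each remaining taxon (original matrix values).
d = {
    ("A", "C"): 60, ("B", "C"): 50,
    ("A", "D"): 100, ("B", "D"): 90,
    ("A", "E"): 90, ("B", "E"): 80,
}

# Distance from each taxon to the new cluster (AB): mean of the two distances.
to_ab = {t: (d[("A", t)] + d[("B", t)]) / 2 for t in "CDE"}
print(to_ab)
```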

D & E are closest (30 units): join them into one cluster (DE) at a height of 30, and recalculate the average distances among (AB), C, and (DE). [For example, the distance from (AB) to (DE) = (95 + 85)/2 = 90, and the distance from C to (DE) = (40 + 50)/2 = 45]. This gives:

        (AB)     C  (DE)
(AB)       0     -     -
C         55     0     -
(DE)      90    45     0

C & (DE) are closest (45 units): join them into one cluster (CDE) at a height of 45, and recalculate the average distance between (CDE) and (AB): (55 + 90)/2 = 72.5. This gives:

        (AB)  (CDE)
(AB)       0      -
(CDE)   72.5      0

The two clusters join at 72.5. This completes the analysis.
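
The final value depends on how the average is taken, which a short calculation makes visible. The sketch below reproduces the 72.5 used in this example by averaging the two cluster-level distances, so that C counts as much as the pair (DE). For comparison only, it also averages the six original taxon-to-taxon distances between {A, B} and {C, D, E}, which gives a different number. The variable names are illustrative, and nothing here is meant to say which convention a given program uses.

```python
# Cluster-level distances from the three-cluster matrix in the example.
ab_to_c = 55
ab_to_de = 90

# Simple average of the two cluster-level distances, as in the worked example.
simple_average = (ab_to_c + ab_to_de) / 2

# For comparison only: mean of all six original taxon-to-taxon distances
# between {A, B} and {C, D, E}.
original = [60, 50, 100, 90, 90, 80]  # A-C, B-C, A-D, B-D, A-E, B-E
all_pairs_average = sum(original) / len(original)

print(simple_average)               # 72.5
print(round(all_pairs_average, 2))  # 78.33
```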

These results may be presented as a phenogram with nodes at 20, 30, 45, and 72.5 units. The phenogram indicates that A & B are similar to each other, as are D & E, and that C is more similar to D & E than to A & B.
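
The branching order and node heights can also be written as a nested text string. The sketch below is one possible way to do that: the function name `nested` and the `@height` annotation are made up for this illustration and are not a standard tree format. The joins are the four reported in this example.

```python
# The four joins from the worked example: (left, right, node height).
joins = [("A", "B", 20), ("D", "E", 30), ("C", "DE", 45), ("AB", "CDE", 72.5)]


def nested(joins):
    """Build a nested string such as (A,B)@20 from an ordered list of joins."""
    trees = {}
    for left, right, height in joins:
        # Reuse an already-built subtree if there is one, else a bare taxon.
        l = trees.pop(left, left)
        r = trees.pop(right, right)
        trees[left + right] = f"({l},{r})@{height}"
    (root,) = trees.values()  # exactly one tree remains after the last join
    return root


print(nested(joins))  # ((A,B)@20,(C,(D,E)@30)@45)@72.5
```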


Text material © 2010 by Steven M. Carr