Seminar: Quantifying the Stability of Phylogenetic Trees
Dr. Huaichun Wang
Candidate for Faculty Position in Computer Science
Department of Computer Science
Thursday, October 30, 2014, 10:15 a.m., Room EN-2022
Contemporary phylogeneticists increasingly use large numbers of orthologous genes from many taxa to infer the evolutionary relationships among organisms. These phylogenomic approaches, based either on estimation from a concatenated alignment of hundreds of genes treated as a single data set or on supertree methods applied to hundreds of individual gene trees, can drastically reduce the stochastic errors associated with the small data sets of one or a few genes used in traditional phylogenetic studies, and they have led to substantial phylogenetic synthesis. However, they also often produce incongruence or conflict among phylogenetic hypotheses. Assessing the robustness of an inferred phylogeny is therefore an important element of phylogenetics. This is typically done with measures of stability at the internal and leaf nodes.
Bootstrap support for branches in maximum likelihood estimation and posterior probabilities in Bayesian inference measure the uncertainty about a branch that arises from sampling sites from genes or sampling genes from genomes. However, these measures do not reveal how taxon sampling affects branch support. A branch in a phylogenetic tree and its associated internal node can be viewed as a split that separates the taxa into two nonempty subsets. We propose several split-based measures of stability determined from the bootstrap support of the four-taxon statements corresponding to splits. These include BPtaxon (the average bootstrap proportion [BP] over all 4-taxon statements involving a taxon), BPsplit (the average BP over the 4-taxon statements associated with a split), and two split-specific measures: BPtaxon_split (the average BP over all 4-taxon statements involving a taxon within a split) and RBIC-taxon-split (the average BP for a split after removing a taxon).
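As a rough illustration of how split-based averages of this kind could be computed, the following Python sketch assumes that each bootstrap tree is given as a set of its nontrivial splits, and that the 4-taxon statements associated with a split A|B are all statements ab|cd with a, b in A and c, d in B. Both assumptions, the function names, and the toy data are hypothetical and are not taken from the talk; the exact definitions used by the speaker may differ.

```python
from itertools import combinations

def quartet_supported(tree_splits, taxa, pair1, pair2):
    """True if some split of the tree separates pair1 from pair2."""
    for side in tree_splits:
        other = taxa - side
        if (pair1 <= side and pair2 <= other) or (pair1 <= other and pair2 <= side):
            return True
    return False

def quartet_bp(boot_trees, taxa, pair1, pair2):
    """Fraction of bootstrap trees that display the statement pair1|pair2."""
    hits = sum(quartet_supported(t, taxa, pair1, pair2) for t in boot_trees)
    return hits / len(boot_trees)

def split_quartets(side, taxa):
    """All statements ab|cd with a, b on one side of the split and c, d on the other."""
    other = taxa - side
    for p1 in combinations(sorted(side), 2):
        for p2 in combinations(sorted(other), 2):
            yield frozenset(p1), frozenset(p2)

def bp_split(boot_trees, taxa, side):
    """Assumed BPsplit: mean BP over the statements associated with a split."""
    values = [quartet_bp(boot_trees, taxa, p1, p2)
              for p1, p2 in split_quartets(side, taxa)]
    return sum(values) / len(values)

def bp_taxon_split(boot_trees, taxa, side, taxon):
    """Assumed BPtaxon_split: as bp_split, restricted to statements containing taxon."""
    values = [quartet_bp(boot_trees, taxa, p1, p2)
              for p1, p2 in split_quartets(side, taxa)
              if taxon in p1 or taxon in p2]
    return sum(values) / len(values)

# Toy data (hypothetical): five taxa, three bootstrap trees.
# In the third tree, taxon E moves next to A and breaks the split AB|CDE.
taxa = frozenset("ABCDE")
boot_trees = [
    {frozenset("AB"), frozenset("DE")},
    {frozenset("AB"), frozenset("DE")},
    {frozenset("AE"), frozenset("CD")},
]
split_ab = frozenset("AB")

print("BPsplit(AB|CDE):", bp_split(boot_trees, taxa, split_ab))
for taxon in sorted(taxa):
    print(taxon, bp_taxon_split(boot_trees, taxa, split_ab, taxon))
```

In this toy data the split AB|CDE appears in two of the three bootstrap trees, yet the statement AB|CD is displayed by all three, so the quartet-based average is higher than the conventional bootstrap proportion, and the per-taxon values are lowest for the taxon that moves.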
We further propose a pruned-tree distance metric: the average distance between the maximum likelihood tree with a taxon pruned and the bootstrap trees with the same taxon pruned. We applied our measures to empirical and simulated data and compared the results with several existing methods for quantifying phylogenetic stability. Although BPsplit is highly correlated with the conventional bootstrap support score, we find that it more accurately reflects the true branch support in the presence of influential or rogue taxon groups. Moreover, we show that the split-specific measures are effective in determining which taxa or groups of taxa have low or high support for an internal node, thus providing a valuable diagnostic tool to guide taxon sampling in experimental phylogenetic design.
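The pruned-tree distance can be sketched in the same spirit. The abstract does not name the underlying tree distance, so this Python sketch assumes the Robinson-Foulds distance (the number of nontrivial splits found in exactly one of two trees); the tree representation, function names, and toy data are likewise hypothetical.

```python
def canonical(side, taxa):
    """Pick one fixed representative of the two sides of a split."""
    other = taxa - side
    return min(side, other, key=lambda s: (len(s), sorted(s)))

def prune(tree_splits, taxa, taxon):
    """Remove a taxon from every split and drop splits that become trivial."""
    remaining = taxa - {taxon}
    pruned = set()
    for side in tree_splits:
        s = frozenset(side) - {taxon}
        o = remaining - s
        if len(s) >= 2 and len(o) >= 2:
            pruned.add(canonical(s, remaining))
    return pruned

def rf_distance(splits_a, splits_b):
    """Robinson-Foulds distance: splits present in exactly one tree."""
    return len(splits_a ^ splits_b)

def pruned_tree_distance(ml_tree, boot_trees, taxa, taxon):
    """Mean distance between the pruned ML tree and the pruned bootstrap trees."""
    ml_pruned = prune(ml_tree, taxa, taxon)
    dists = [rf_distance(ml_pruned, prune(t, taxa, taxon)) for t in boot_trees]
    return sum(dists) / len(dists)

# Toy data (hypothetical): the ML tree is ((A,B),C,(D,E)); in the third
# bootstrap tree, taxon E moves next to A.
taxa = frozenset("ABCDE")
ml_tree = {frozenset("AB"), frozenset("DE")}
boot_trees = [
    {frozenset("AB"), frozenset("DE")},
    {frozenset("AB"), frozenset("DE")},
    {frozenset("AE"), frozenset("CD")},
]

for taxon in sorted(taxa):
    print(taxon, pruned_tree_distance(ml_tree, boot_trees, taxa, taxon))
```

In this toy data, pruning E makes every bootstrap tree agree with the pruned maximum likelihood tree, while pruning A or C leaves the conflict in place, which is the kind of signal that would single out E as the unstable taxon.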