"

Classification is a

**Inferring the nature of evolutionary relationship**

In a "*tree*," how do we describe the position of each '*twig tip*' with respect to any or all others?

**Distance**: amount of **evolutionary change** between twigs

Or: How **similar** (close) are they?

**phenetic**: distance measured **between tips**

*"As the crow flies"* from one twig to another

**patristic**: distance measured **along connecting branches**

*"As the ant runs"* from one twig to another

**Relationship**: **pattern of connection** between twigs

How **closely related** are species?

**cladistic relationship**: pattern of **branching** back to **Most Recent Common Ancestor** (**MRCA**)

How do twigs

**Phenetic** (**how similar** are taxa?)

Criteria agree **iff** rates of evolution are constant

If evolutionary rates differ, closely *related* organisms may appear *dissimilar*

*Ex*.: **Crocodiles** more closely *related* to **birds**, but more *similar* (?) to **lizards**

**Crocodilia** resemble **Squamata** more than **Aves** (hence, Class **Reptilia**)

because avian ancestor(s) rapidly evolved specializations for flight

**Theoretical & technical breakthroughs**, late 1960s ~ 1980s:

**Theory of Phylogenetic Systematics** formalized

**Molecular data** (**allozymes** & **DNA**) replace **morphology** as primary data for phylogenetic inference

**Computational **power increases

**DNA sequencing** capacity increases

**Patterns of evolutionary relationship** to be understood from **molecular data**;

**Patterns of organismal evolution** to be understood from

Simplest measures:

% sequence similarity
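Percent sequence similarity can be computed directly from a pair of aligned sequences. The sketch below assumes the sequences are already aligned and of equal length; the function names `percent_similarity` and `similarity_matrix` are illustrative and not part of the course material.

```python
def percent_similarity(seq1, seq2):
    """Percent of aligned positions at which two sequences are identical."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned, non-empty, and equal in length")
    matches = sum(x == y for x, y in zip(seq1, seq2))
    return 100.0 * matches / len(seq1)


def similarity_matrix(seqs):
    """Build an (n) x (n) matrix of pairwise percent similarity.

    seqs: dict mapping taxon name -> aligned sequence (hypothetical input).
    Returns a dict keyed by (name1, name2).
    """
    names = list(seqs)
    return {(a, b): percent_similarity(seqs[a], seqs[b])
            for a in names for b in names}


if __name__ == "__main__":
    # Made-up sequences, for illustration only
    print(percent_similarity("ACGTACGT", "ACGTACGA"))
```

The same pairwise loop, applied to five aligned sequences, would fill in a 5x5 matrix of the kind used in the homework.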

HOMEWORK: 5x5 Ape matrix

**Patterns of similarity inferred from UPGMA cluster analysis**

[**U**nweighted **P**air **G**roup **M**ethod, **A**rithmetic averaging],

**Sequential Agglomerative Hierarchical Nesting** (**SAHN**) algorithm

**algorithm**: set of instructions for repetitive task

In **(n) x (n)** matrix, join most *similar* pair:

re-calculate

& so on, until last pair joined

diagram of phenetic

**UPGMA** method *assumes* rates of evolution *equal*

so **branch tips "come out even"** (contemporaneous)
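The join-and-recalculate cycle of UPGMA can be sketched in a few lines. This is a minimal illustration working on pairwise distances (so the most similar pair is the one with the smallest distance); the function name `upgma`, the input layout, and the toy distances are assumptions, not data from the course.

```python
from itertools import combinations


def upgma(names, dist):
    """Minimal UPGMA sketch.

    names: list of taxon names.
    dist:  dict mapping (name1, name2) -> pairwise distance (hypothetical input).
    Returns (newick_string, root_height).
    """
    d = {frozenset(k): v for k, v in dist.items()}
    size = {n: 1 for n in names}        # number of tips in each cluster
    height = {n: 0.0 for n in names}    # node height; tips sit at 0
    active = list(names)
    while len(active) > 1:
        # Join the closest (most similar) pair of clusters
        a, b = min(combinations(active, 2), key=lambda p: d[frozenset(p)])
        h = d[frozenset((a, b))] / 2.0
        new = f"({a}:{h - height[a]:g},{b}:{h - height[b]:g})"
        size[new] = size[a] + size[b]
        height[new] = h
        # Re-calculate distances to the new cluster as a size-weighted average
        for c in active:
            if c not in (a, b):
                d[frozenset((new, c))] = (
                    size[a] * d[frozenset((a, c))]
                    + size[b] * d[frozenset((b, c))]
                ) / size[new]
        active = [c for c in active if c not in (a, b)] + [new]
    return active[0] + ";", height[active[0]]


if __name__ == "__main__":
    # Toy distances, for illustration only
    toy = {("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6}
    print(upgma(["A", "B", "C"], toy))
```

Because each new node is placed at half the distance between the clusters it joins, every tip ends up the same distance from the root, which is the "come out even" property noted above.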

DNA sequences evolve as

HOMEWORK: Practice problems for UPGMA phenogram calculations

**Alternative phenetic methods**

**Neighbor-Joining (NJ) analysis** does *not* assume rate equality

large rate differences lead to **incorrect trees**

NJ allows branch lengths proportional to change: **tips come out uneven**

algorithm joins **nodes**, rather than
More realistic, computationally harder
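The first joining step of Neighbor-Joining can be sketched as follows, using the standard Q criterion, Q(i,j) = (n - 2) d(i,j) - r(i) - r(j), where r(i) is the sum of row i of the distance matrix. The function name `nj_first_join` and the toy matrix are assumptions for illustration; a full NJ analysis repeats this step on a reduced matrix until the tree is resolved.

```python
def nj_first_join(names, D):
    """Pick the first pair joined by Neighbor-Joining and its two branch lengths.

    names: list of taxon names (n >= 3).
    D:     symmetric n x n distance matrix as a list of lists (hypothetical input).
    """
    n = len(names)
    r = [sum(row) for row in D]                      # row sums
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * D[i][j] - r[i] - r[j]      # Q criterion
            if best is None or q < best[0]:
                best = (q, i, j)
    _, i, j = best
    # Branch lengths from each joined tip to the new node need not be equal
    li = D[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
    lj = D[i][j] - li
    return (names[i], names[j]), (li, lj)


if __name__ == "__main__":
    # Toy matrix, for illustration only
    D = [
        [0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0],
    ]
    print(nj_first_join(["a", "b", "c", "d", "e"], D))
```

In this toy matrix the closest pair by raw distance is d and e, yet the Q criterion joins a and b first, and the two branches to the new node have different lengths (2 and 3): the tips come out uneven.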

**Differential weighting** of nucleotide substitutions

accord greater '*significance*' to '*important*' changes

*Ex*.: **Kimura 2-parameter (K2P) model** treats **Transitions** (**Ts**) & **Transversions** (**Tv**) differently

**K** = **Transition Bias** = **[Ts] / [Tv]**

**Twice** as many *kinds* of Tv as Ts: expect **K = 0.5**

*But*: **Tv** *rare* for *close* comparisons,

more common for *distant* relationships

**K variable** according to evolutionary problem under consideration:

**K > 10** for close comparisons, **K ~ 3** for moderate comparison

**Tv-only** for distant comparisons
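Counting transitions and transversions separately is straightforward once purines (A, G) and pyrimidines (C, T) are distinguished. The sketch below assumes aligned, upper-case sequences containing only A, C, G, T; the function names are illustrative. The distance formula is the standard Kimura two-parameter correction, d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), with P and Q the proportions of Ts and Tv differences; it is added here as background and does not appear in the notes above.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ts_tv(seq1, seq2):
    """Count transition (Ts) and transversion (Tv) differences between aligned sequences."""
    ts = tv = 0
    for x, y in zip(seq1, seq2):
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            ts += 1      # purine <-> purine or pyrimidine <-> pyrimidine
        else:
            tv += 1      # purine <-> pyrimidine
    return ts, tv


def transition_bias(seq1, seq2):
    """K = [Ts] / [Tv]; returns None when no transversions are observed."""
    ts, tv = count_ts_tv(seq1, seq2)
    return ts / tv if tv else None


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences."""
    n = len(seq1)
    ts, tv = count_ts_tv(seq1, seq2)
    p, q = ts / n, tv / n
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


if __name__ == "__main__":
    # Made-up sequences, for illustration only
    s1, s2 = "AAAAAAAAAA", "GAAAAAAAAC"
    print(count_ts_tv(s1, s2), transition_bias(s1, s2), k2p_distance(s1, s2))
```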

Rely

**Choice of preferred hypothesis made on Principle of Maximum Parsimony**

**Parsimony**: **simpler hypothesis** preferred

*Ex*.: If complex trait occurs in multiple species,

more *parsimonious* to hypothesize it evolved only *once*

**=>** Trait evolved in single common ancestor

*Ex.*: **Evolution of ice-breeding** in **Phocidae** ("*True*" seals),

from **ecological** & **molecular parsimony** perspectives

**Evolutionary parsimony**:

Hypothesis that requires **fewer character changes** preferred

In **molecular systematics**, count SNP changes

"**Four-Taxon Problem**" & "**Three-Taxon Statement**":

**Four taxa A, B, C,** & **D** have **three hypotheses of relationship**:

**A** most closely related to **B**, or **C**, or **D**

Alternative hypotheses shown as

**Count changes at informative SNPs** that favor each hypothesis

Hypothesis with fewest changes is **Maximum Parsimony** explanation:

AKA '**Minimum Length**' or 'Minimum Spanning' solution
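For four taxa, the counting described above can be sketched directly. A site is informative only when two taxa share one base and the other two share a different base; such a site costs one change on the tree that groups the matching pairs and two changes on either alternative, so the hypothesis favored by the most informative sites is also the one requiring the fewest changes. The function name and sequences below are illustrative assumptions.

```python
def four_taxon_support(a, b, c, d):
    """Count informative sites favoring each of the three four-taxon hypotheses.

    a, b, c, d: aligned sequences for taxa A, B, C, D (hypothetical input).
    """
    support = {"AB|CD": 0, "AC|BD": 0, "AD|BC": 0}
    for w, x, y, z in zip(a, b, c, d):
        if w == x and y == z and w != y:
            support["AB|CD"] += 1
        elif w == y and x == z and w != x:
            support["AC|BD"] += 1
        elif w == z and x == y and w != x:
            support["AD|BC"] += 1
        # all other site patterns are uninformative and are skipped
    return support


if __name__ == "__main__":
    # Made-up sequences, for illustration only
    counts = four_taxon_support("AAGCT", "AAGCA", "GGGTA", "GGACT")
    print(counts, "->", max(counts, key=counts.get))
```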

Cladistic analyses **weighted**: objective criteria exist for DNA data

Ex.: Count **Tv:Ts** as **3:1** => **Tv** are **3x** as 'informative'

*or*, count **Tv** *only* (**Transversion parsimony**) for "*deep*" analyses

*or*, count **1st & 2nd** position substitutions >> **3rd**: replacement substitutions

**HOMEWORK:** What *triplets* are exceptions & why?

Because: Computational effort

hyper-exponential wrt

# networks mounts up:

for

# bifurcating rooted trees for t taxa = $\frac{(2t-3)!}{2^{t-2}\,(t-2)!}$

ex.: if t = 10, # rooted trees = 34,459,425 (# unrooted trees = 2,027,025)

if t = 21, # rooted trees = 3.198 x 10^23
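The growth in the number of trees is easy to check numerically. The count of rooted bifurcating trees, (2t-3)! / [2^(t-2) (t-2)!], equals the product of the odd numbers up to 2t-3, and the count of unrooted trees is the product of the odd numbers up to 2t-5. The function names below are illustrative. Note that 2,027,025 is the unrooted count for t = 10 (equivalently, the rooted count for t = 9).

```python
def double_factorial(n):
    """Product n * (n - 2) * (n - 4) * ... down to 1 (returns 1 for n <= 1)."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def rooted_trees(t):
    """Number of rooted bifurcating trees for t >= 2 taxa."""
    return double_factorial(2 * t - 3)


def unrooted_trees(t):
    """Number of unrooted bifurcating trees for t >= 3 taxa."""
    return double_factorial(2 * t - 5)


if __name__ == "__main__":
    for t in (4, 10, 21):
        print(t, unrooted_trees(t), rooted_trees(t))
```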

Heuristic methods seek

for computationally difficult (impossible) problems

Parable of Near-Sighted Mountain Climber

Rooting a Tree:

**Evolutionary trees** are **networks** with **roots**

With


Thus

All equally parsimonious:

not all place

Some make

**Outgroup rooting**

Include taxon *known* to be less closely related to any **ingroup** taxon than they are to each other

Call this an **outgroup**

*Ex*.: **Use feliform** as **outgroup** to **caniform problem**

Note **cladistic tree **has same topology as NJ phenogram

Ex. Wolffish (*Anarhichas*): **Johnstone et al. (2007)**

Place root

HOMEWORK: Practice four-taxon cladistic problems

**Maximum Likelihood analysis**

Different approach to evolutionary trees based on **Bayes Theorem**

**Optimization criterion**: How to choose '*correct*' solution

**Phenetic** methods look for *shortest* tree

given

E.g., given estimates of all possible **SNP rates** among **A**, **C**, **G**, & **T** (**n = 12**)

Calculate **probability of simultaneous occurrence** of *all* events necessary to produce any particular tree

Any *particular* tree is (extremely) unlikely,

*but* *some* tree is *least unlikely* (= **maximally likely**)
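The "least unlikely" idea can be illustrated with a toy calculation. If the events required by a tree are treated as independent, the probability of their simultaneous occurrence is the product of the individual probabilities, which is conveniently handled as a sum of logarithms. The sketch below is only an illustration of that principle: the candidate names and probabilities are hypothetical, and real likelihood software evaluates substitution models over every site and branch rather than a short list of events.

```python
import math


def log_likelihood(event_probs):
    """Log of the joint probability of independent events (sum of log-probabilities)."""
    return sum(math.log(p) for p in event_probs)


def most_likely(candidates):
    """Return the candidate whose required events have the highest joint probability.

    candidates: dict mapping tree name -> list of event probabilities (hypothetical).
    """
    return max(candidates, key=lambda name: log_likelihood(candidates[name]))


if __name__ == "__main__":
    # Hypothetical probabilities, for illustration only
    trees = {
        "tree 1": [0.10, 0.20, 0.05],
        "tree 2": [0.10, 0.20, 0.01],
        "tree 3": [0.02, 0.20, 0.05],
    }
    for name, probs in trees.items():
        print(name, math.exp(log_likelihood(probs)))
    print("least unlikely:", most_likely(trees))
```

Every candidate here has a joint probability of 0.001 or less, yet one of them is still the maximum.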

*Heuristic example*: Consider game of

**Statistical tests determine confidence in branching order**

**Bootstrap Analysis**: a **re-sampling** technique

statistical tests usually involve **replication / repetition** of experiment:

this is (?) inconvenient with **DNA** data

Suppose *sample* data set of *n* bases accurately

repeat phylogenetic analysis on each 'new' set:

among all of these sets,

how

cf. 1,140bp vs 11,582bp data sets
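The re-sampling step of a bootstrap analysis can be sketched as follows: draw n alignment columns at random with replacement to build each 'new' data set, repeat the analysis on it, and report how often each result recurs. To keep the example self-contained, the analysis used here is a simple four-taxon count of informative sites; the function names, the fixed random seed, and the sequences are assumptions for illustration.

```python
import random


def best_grouping(a, b, c, d):
    """Four-taxon hypothesis favored by the most informative sites.

    Ties (including no informative sites) fall to the first hypothesis listed.
    """
    counts = {"AB|CD": 0, "AC|BD": 0, "AD|BC": 0}
    for w, x, y, z in zip(a, b, c, d):
        if w == x and y == z and w != y:
            counts["AB|CD"] += 1
        elif w == y and x == z and w != x:
            counts["AC|BD"] += 1
        elif w == z and x == y and w != x:
            counts["AD|BC"] += 1
    return max(counts, key=counts.get)


def bootstrap_support(seqs, replicates=1000, seed=0):
    """Fraction of bootstrap replicates supporting each hypothesis.

    seqs: list of four aligned sequences for taxa A, B, C, D (hypothetical input).
    """
    rng = random.Random(seed)           # fixed seed so results are repeatable
    n = len(seqs[0])
    tally = {}
    for _ in range(replicates):
        # Sample n column indices with replacement to make a 'new' data set
        cols = [rng.randrange(n) for _ in range(n)]
        resampled = ["".join(s[i] for i in cols) for s in seqs]
        winner = best_grouping(*resampled)
        tally[winner] = tally.get(winner, 0) + 1
    return {g: k / replicates for g, k in tally.items()}


if __name__ == "__main__":
    # Made-up sequences, for illustration only
    print(bootstrap_support(["AAGCTAAG", "AAGCAAAT", "GGGTAGGT", "GGACTGGG"]))
```

With more sites consistently favoring one hypothesis, a larger fraction of replicates recovers it, which is one way to read the comparison of a short data set against a long one.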

Download & install free **MEGA X** [**M**olecular **E**volutionary **G**enetics **A**nalysis] software

**Lab Exercise**: Are **Giant Pandas** (

15,582 bp

**HOMEWORK**: **Results** for the Panda Problem from **UPGMA, Neighbor Joining, Maximum Parsimony, & Maximum Likelihood** methods

**Evolutionary genetic analysis** of **Newfoundland Caribou** (*Rangifer tarandus terranovae*) (Wilkerson *et al*. 2018)

Phylogenetic analysis of **codfish & relatives** (Gadidae) (Coulson *et al*. 2006)

A molecular understanding of the **evolutionary history of birds** (Jarvis *et al*. 2014)

Applications to the **evolution of COVID-19 SARS** virus

Text material © 2021 by Steven M. Carr