Faith in parsimony

Fri Jul 22 10:02:50 CDT 2005

Thanks for this interesting quotation, Robert! (One needs to ask: if there are 180 equally likely trees with 64 steps, how many trees exist with, e.g., 65 steps? IMHO that is still an "acceptable level of non-optimality" that can be given to Nature as a fair chance.)

Best!
Zdenek Skala

-----Original Message-----
From: Taxacom Discussion List [mailto:TAXACOM at LISTSERV.NHM.KU.EDU]On Behalf Of Robert Mesibov
Sent: Friday, July 22, 2005 9:06 AM
To: TAXACOM at LISTSERV.NHM.KU.EDU
Subject: Faith in parsimony

If you think that parsimony is a great idea in cladistic classification, but
a highly suspect idea when using cladistics to hypothesise phylogenies from
character data, you might like to read:

Larget, B., Kadane, J.B. & Simon, D.L. 2005. A Bayesian approach to the
estimation of ancestral genome arrangements. Molecular Phylogenetics and
Evolution 36(2): 214-223.

The authors are looking here not at the relationships between taxa, but at
the relationships between gene positions within genomes. Over time, the
"same" genes or blocks of genes can get shuffled around on a chromosome,
like beads on a string. The history to be reconstructed is the history of
the shuffling. Because shuffles are believed to happen much less often than
changes in sequences, shuffle trees promise to be very useful as hypotheses
for deep evolutionary history.
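
To make the "beads on a string" picture concrete, here is a minimal sketch of a single inversion acting on a gene order. It assumes the common convention of writing a genome as a list of signed integers, where the sign records a gene's orientation; the function name `invert` and the five-gene example are hypothetical and are not taken from the paper.

```python
def invert(order, i, j):
    """Return a copy of `order` with the block order[i:j] inverted.

    An inversion reverses the block and flips the orientation (sign)
    of every gene inside it. Genes outside the block are untouched.
    """
    block = [-gene for gene in reversed(order[i:j])]
    return order[:i] + block + order[j:]


# Hypothetical ancestral arrangement of five genes, all in forward orientation.
ancestor = [1, 2, 3, 4, 5]

# One inversion event covering genes 2..4.
descendant = invert(ancestor, 1, 4)
print(descendant)  # [1, -4, -3, -2, 5]

# Applying the same inversion again restores the ancestral arrangement.
restored = invert(descendant, 1, 4)
print(restored)  # [1, 2, 3, 4, 5]
```

The number of "steps" in a shuffle tree is then the total count of such events along its branches.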

Larget et al. use a simple model for genome permutations, no molecular clock
and no polytomies. They assume a range of prior possibilities, run Markov
chain Monte Carlo simulations and estimate posterior probabilities for trees
of various lengths. For me their most interesting result comes from a data
set tracking the positions of 105 markers on the chloroplast genome of 13
genera of Campanulaceae. Using parsimony methods, other workers had
reported trees with 67 steps (total gene inversions), a single tree of 65
steps, and a number with 64 steps. The posterior probabilities modeled by
Larget et al. favour 64 steps. Remarkably (to me, anyway), the Bayesian
analysis found 180 trees with this number of steps.
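
As a rough illustration of how such a summary might be read off an MCMC run, the sketch below tallies sampled trees by length. It assumes each sample is a (topology, length) pair and that identical topologies are written as identical strings; the function name `summarise` and the toy samples are invented for illustration and are not the Campanulaceae results.

```python
from collections import Counter, defaultdict


def summarise(samples):
    """Summarise MCMC samples given as (topology, length) pairs.

    Returns {length: (estimated posterior probability of that length,
                      number of distinct topologies seen at that length)}.
    """
    total = len(samples)
    length_counts = Counter(length for _, length in samples)

    topologies = defaultdict(set)
    for tree, length in samples:
        topologies[length].add(tree)

    return {
        length: (count / total, len(topologies[length]))
        for length, count in sorted(length_counts.items())
    }


# Toy samples only: three taxa, lengths chosen arbitrarily.
toy = [
    ("((A,B),C)", 3),
    ("((A,C),B)", 3),
    ("((A,B),C)", 3),
    ("((B,C),A)", 4),
]

summary = summarise(toy)
print(summary)  # {3: (0.75, 2), 4: (0.25, 1)}
```

In this toy output the shortest length carries most of the posterior mass but is shared by two different trees, which is the situation described above on a much larger scale.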

The authors conclude (I've deleted bits here):

"A most parsimonious reconstruction must always be a lower bound on the
actual number of genome rearrangement events. The best case for maximum
parsimony methods is in the case in which the most parsimonious
reconstruction is very likely to be correct... But if individual most
parsimonious reconstructions are very unlikely, there is a high degree of
uncertainty about which reconstruction is correct. In the human, fruit fly,
and sea urchin example, there is considerable uncertainty in the ancestral
arrangement. To report a single ancestral arrangement in this case is highly
misleading. The real difficulty is that maximum parsimony methods provide no
warning when the single reconstruction selected has low probability of being
correct...By contrast, Bayesian methods report a full posterior distribution
on the space of possible trees and arrangements. If one of those is very
likely (whether it is most parsimonious or not), that fact will be evident
from the distribution. If there are many, roughly equally likely trees or
ancestral arrangements, that also will be evident...The Bayesian analyses
have other virtues as well. Because the Markov chain Monte Carlo sampler
typically spends the bulk of its time on trees of high probability, it
coincidentally can find better maximum parsimony trees than found by other
computational approaches for some data sets. For example, in the
Campanulaceae data set, we found 180 different trees with 64 inversions. We
expect that other researchers interested in finding most parsimonious
reconstructions may find stochastic search based on MCMC to be more
efficient than current heuristic optimization methods, at least as part of
an initial search strategy to find a good starting point for a heuristic
search."

Being an ignoramus in computational taxonomy, I hadn't previously
appreciated that there were things you could do AFTER a parsimony-based
analysis to judge whether your pretty little tree had some chance
(literally) of being correct. I now understand that given the vast amount of
data in DNA sequences, and the increasing confidence people have in how
those sequences vary (mechanism models), an approach like the Bayesian one
is necessary in molecular taxonomy to assess, statistically, the confidence
you can have in a tree.

My question for the list is: have there been attempts to model non-molecular
evolution in any group so that confidence in various parsimony-derived trees
could be estimated statistically?
---
Dr Robert Mesibov
Honorary Research Associate, Queen Victoria Museum and Art Gallery
and School of Zoology, University of Tasmania
Home contact: PO Box 101, Penguin, Tasmania, Australia 7316
(03) 6437 1195

Tasmanian Multipedes
http://www.qvmag.tas.gov.au/zoology/multipedes/mulintro.html
Spatial data basics for Tasmania
http://www.geog.utas.edu.au/censis/locations/index.html
---