normal parsimony
Jan Bosselaers
dochterland at VILLAGE.UUNET.BE
Sat Jul 31 12:39:14 CDT 1999
Dear all,
Jan De Laet and Victor Albert asked me to forward the information below
to the list, introducing their new form of parsimony analysis,
information-normalised or normal parsimony analysis.
Regards,
Jan
Here it follows:
Dear Colleagues:
Please see below a condensed version of a document now available on the
web at www.kuleuven.ac.be/phylogenetics. A new form of parsimony
analysis is
described and justified. We have evaluated the method using a number of
data sets, and it appears to show great promise. Of particular interest
is the new method's enhanced ability to discover large groups of taxa.
All comments are welcome; we will be at the International Botanical
Congress next week.
Sincerely,
Victor Albert & Jan De Laet.
-------------------------------
THE FOLLOWING IS A CONDENSED VERSION OF A DOCUMENT THAT MAY BE
RETRIEVED AT WWW.KULEUVEN.AC.BE/PHYLOGENETICS
Normal Parsimony: an Information-Normalized Approach to Phylogeny
Reconstruction
Jan De Laet (1, 2) (jdelaet at nybg.org or jan.delaet at bio.kuleuven.ac.be)
Victor A. Albert (1) (valbert at nybg.org)
(1) The New York Botanical Garden
200th Street & Southern Boulevard
Bronx, New York 10458-5126
United States
(2) Laboratorium voor Systematiek
Instituut voor Plantkunde en Microbiologie
Katholieke Universiteit Leuven
Kard. Mercierlaan 92
B-3001 Heverlee
Belgium
Suggested Citation:
De Laet, J. and V. A. Albert. 1999. Normal parsimony: an
information-normalized approach to phylogeny reconstruction. Draft
version 30 July 1999. http://www.kuleuven.ac.be/phylogenetics
Summary
A new approach to character weighting in phylogeny reconstruction,
normal parsimony, differs from standard parsimony by including prior
weights that reflect the inherent information content of each character.
These weights are derived from two matrix-constant and tree-independent
parameters, the maximum (g) and minimum (m) change a given character may
have on any tree. The difference between g and m measures those units of
character information (similarity) that can be attributed either to
synapomorphy (incurring no extra steps) or homoplasy (forcing extra
steps). In this way, a character with high (g-m) has a greater potential
to be either corroborated or contradicted by the data set as a whole.
Standard approaches to parsimony analysis assign equal weights to
characters with different values of (g-m), thereby ignoring inherent
differences in underlying information content. Use of (g-m) values as
character weights corrects this distortion.
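As a rough illustration of the weights described in the Summary, the
sketch below computes g, m, and (g-m) for each column of a toy matrix.
It assumes unordered (nonadditive) characters with no missing or
polymorphic entries, for which g is the number of taxa minus the count
of the most frequent state and m is the number of observed states minus
one. The function name and the six-taxon matrix are hypothetical and
are not taken from the original document, which may treat other
character types differently.

```python
from collections import Counter

def step_range(column):
    """Return (g, m) for one unordered character, given its states as a string."""
    counts = Counter(column)
    g = len(column) - max(counts.values())  # most steps possible on any tree
    m = len(counts) - 1                     # fewest steps possible on any tree
    return g, m

# Hypothetical toy matrix: six taxa (rows) scored for three characters (columns).
matrix = ["AAG", "AAG", "ATG", "TTG", "TTC", "TTG"]

# Transpose rows into character columns, then take (g-m) as the prior weight.
columns = ["".join(row[i] for row in matrix) for i in range(len(matrix[0]))]
weights = [g - m for g, m in (step_range(col) for col in columns)]

for col, w in zip(columns, weights):
    print(col, "weight =", w)
```

Under these assumptions the third column, in which a single taxon
differs from all others, receives weight 0: it cannot be either
corroborated or contradicted by any tree.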
An Example: Resolution of Large Groups at the Base of the Eudicots
The general lack of strong support for large groups (the "spines" of
trees) is a well-known phenomenon in molecular systematics. Sometimes
this lack of support for tree spines has been ascribed to rapid
radiations in the distant past that have left us with little evidence
for branching. The 500-sequence (499-species) rbcL data matrix of Chase
et al. (1993) was used to compare large-group jackknife support (Farris
et al. 1996) between the standard and normal parsimony
approaches. As an illustration, the eudicot angiosperms form a
monophyletic group using both methods. However, within eudicots many
more well-supported groups are found using normal parsimony, and those
groups that overlap with the standard parsimony result are generally
strengthened. Unlike the standard approach, normal parsimony supports the
Papaverales/Ranunculales as the sister group of all remaining eudicots.
Of the latter, Proteaceae/Platanaceae/Nelumbonaceae are distinguished as
the basalmost clade, followed by a trichotomy composed of
Sabiaceae/Buxaceae, Trochodendrales, and Gunneraceae plus the rest. This
increased structure is consistent with the classification proposed by
the APG (1998), which was largely based on well-supported groups found
by simultaneous analysis of three genes (rbcL, atpB, and 18S rDNA)
rather than one.
The enhanced discovery of large groups by normal parsimony is a natural
consequence of weighting characters by their (g-m) values. Consider two
characters that each start out as a sampling of four taxa. In the binary
case, this can be represented as AATT for each, with (g-m) = 1. Suppose
that for one character, taxon sampling increases to 100, but only
character state T is added in the process. Hence, the character is now
represented by AA(98*T). However, (g-m) remains 1. Now suppose for the
second character that as taxon
sampling increases, character states A and T are added in equal
proportions. Such a character can be represented by (50*A)(50*T), where
(g-m) = 49. In terms of information content, none was added to the first
character because there is no improvement over the four-taxon case in
our ability to determine whether two taxa with state A group together
(incurring one step) or not (incurring the maximum of two steps). There
are still only two classes (g = 2) of possible trees. In
contrast, the second character above permits g = 50 different classes of
trees. Here, trees that are supported by fewer steps are those that
group all or most taxa that have state A. Therefore, minimizing (g-m)s,
i.e., normal parsimony, will often allow characters with high (g-m) to
support larger groups than is possible with standard parsimony, which
holds (g-m) constant across all characters. Of course, normal parsimony
will also identify
well-supported, smaller groups, as shown by increased resolution at all
levels in the eudicot case (not shown).
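The arithmetic in the taxon-sampling example above can be checked with a
short sketch. It again assumes unordered binary characters, and it uses
a plain Fitch step count on rooted binary trees written as nested
tuples; both helper functions are illustrative names introduced here,
not code from the original document.

```python
from collections import Counter

def step_range(column):
    """Return (g, m) for one unordered character, given its states as a string."""
    counts = Counter(column)
    g = len(column) - max(counts.values())  # most steps possible on any tree
    m = len(counts) - 1                     # fewest steps possible on any tree
    return g, m

def fitch_steps(tree):
    """Count Fitch steps on a rooted binary tree of nested 2-tuples with state strings at the tips."""
    def visit(node):
        if isinstance(node, str):
            return {node}, 0
        left_set, left_steps = visit(node[0])
        right_set, right_steps = visit(node[1])
        shared = left_set & right_set
        if shared:
            return shared, left_steps + right_steps
        # Disjoint state sets below this node force one additional step.
        return left_set | right_set, left_steps + right_steps + 1
    return visit(tree)[1]

# The three sampling scenarios from the text.
scenarios = {
    "AATT": "AATT",
    "AA(98*T)": "AA" + "T" * 98,
    "(50*A)(50*T)": "A" * 50 + "T" * 50,
}
for label, column in scenarios.items():
    g, m = step_range(column)
    print(f"{label}: g={g} m={m} g-m={g - m}")

# Four-taxon case: grouping the two A taxa costs one step, splitting them costs two.
print(fitch_steps((("A", "A"), ("T", "T"))))
print(fitch_steps((("A", "T"), ("A", "T"))))
```

The first two scenarios both give (g-m) = 1, while the evenly sampled
character gives (g-m) = 49, matching the values quoted in the text.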
Acknowledgments
This research was supported by the Lewis B. and Dorothy Cullman
Foundation. JDL is a postdoctoral fellow of the F.W.O., the Fund for
Scientific Research - Flanders (Belgium). Comments from James S. Farris
are greatly appreciated; however, all opinions expressed are entirely
our own.
--------------------------------------------------------------
Jan Bosselaers
"Dochterland", R. novarumlaan 2
B-2340 Beerse, Belgium tel 32-14-615896
home: dochterland at village.uunet.be fax 32-14-610306
work: jbossela at janbe.jnj.com
web: http://gallery.uunet.be/Dochterland/
"The nice thing about plants is that they demonstrate that you
can be highly successful without a brain." Dan Janzen