Striking a balance, weighting and Cladistics

Richard Zander rzander at SCIENCEBUFF.ORG
Sat Feb 24 11:28:07 CST 2001


Well, not really, Tom.

Again a preference for an exact solution gets you something for nothing.
Even a bush is an exact solution that means nothing unless the lineages
below it are very well supported.

Example: You suspect a coin is loaded. You toss it 100 times. It comes up 50
tails and 50 heads (a bush). Your best answer is that it is not loaded. A
different coin that you check for loading comes up 51 tails and 49 heads, so
your best answer is that it is loaded. Yet . . . the answers are
scientifically, if statistics means anything, equivalent. The null
hypothesis of not being loaded cannot be rejected, and you are left with
nothing. With most cladograms, the null hypothesis of no phylogenetic
loading cannot be rejected by the data presented (when looking at an entire
tree of many taxa).
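A minimal sketch of the coin example, assuming an exact two-sided binomial test of the "not loaded" (p = 0.5) null hypothesis. The function name binomial_two_sided_p is hypothetical and not taken from any particular package. Neither 50:50 nor 51:49 comes anywhere near rejecting the null at the usual 0.05 level, which is the sense in which the two "best answers" are statistically equivalent.

```python
from math import comb

def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: sum the probabilities of every
    outcome that is no more likely than the observed count k."""
    probs = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so floating-point ties count as "as extreme".
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))

for tails in (50, 51):
    pval = binomial_two_sided_p(tails, 100)
    print(f"{tails} tails of 100: p = {pval:.3f}; "
          f"reject 'not loaded' at 0.05? {pval < 0.05}")
```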

But! What about support values? I believe that the decay index and bootstrap
values of the second shortest tree (of many taxa) are usually nearly as
impressive as those in the shortest tree. This is logical, right, assuming
both trees are resolved? Or maybe we are unsure because the software we use
doesn't give us those alternative support values easily, and we have to
think about them first.

On the other hand, maybe longer trees are much less well supported . . .
does this mean they can be ignored? Statistics is the spine of science.
Consider this example: you have a chicken yard. There is a big chicken and
50 little chicks (each one dyed a different Easter egg color) in the yard.
You toss a kernel of corn into the yard and glance away, and fttt it was
eaten. Which bird ate the kernel? You toss more kernels randomly and find
from the data set you compile that the big chicken is 50 times more likely
to eat a kernel than any chick, and each chick is about as likely as any
other chick to eat a kernel.

Maximum likelihood analysis would say that the big chicken ate the original
kernel with a likelihood ratio of 50 (i.e., comparing the likelihoods of the
maximum-likelihood hypothesis and the second most likely one). Wow! However
. . . all the birds contributed to the data set, and any bird that contributed
to the data set cannot be ignored, can it? Therefore the chance of the big
chicken eating the original kernel was 50%. (Maximum likelihood gives you
something for nothing if you trust in likelihood ratios and you have more
than two possible hypotheses.)
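The arithmetic of the chicken yard can be sketched as follows, assuming (as the text does) that each bird's chance of having eaten the original kernel is proportional to its observed eating rate. The big chicken is 50 times as likely as the next-best hypothesis, yet holds only half of the total probability, because the 50 chicks together hold the other half.

```python
# Relative chance of eating a kernel, per the hypothetical yard in the text:
# one big chicken 50 times as likely as any single chick, and 50 equal chicks.
weights = {"big chicken": 50.0}
weights.update({f"chick {i}": 1.0 for i in range(1, 51)})

ranked = sorted(weights.values(), reverse=True)
likelihood_ratio = ranked[0] / ranked[1]      # best vs. second-best hypothesis
total = sum(weights.values())
p_big = weights["big chicken"] / total        # big chicken's share of the total
p_any_chick = 1 - p_big                       # summed share of all 50 chicks

print(f"likelihood ratio, best vs. next: {likelihood_ratio:.0f}")
print(f"probability the big chicken ate it: {p_big:.0%}")
print(f"probability some chick ate it:      {p_any_chick:.0%}")
```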

But! Note that no one chick (alternative hypothesis) had a likelihood
anywhere near as high as that of the big chicken! What is the chance we can
eliminate the likelihoods of the chicks as irrelevant and just too small to
matter? We can't, because they all contributed to the data set, and only if
we can eliminate them from the data set can we eliminate their summed
probabilities (summing to 50%). And there is no empirically based theory
that will allow us to do so (or to eliminate long trees, which must also be
considered as contributing to a cladistic data set because any one of them
could have been solely responsible for it).

But! What is the chance that a 50:1:1:1... (fifty 1s) distribution of
probabilities would happen by chance alone? Well, the distribution of
likelihoods is not a data set of observations (not a sample), and we can't
do chi-squared or other non-parametric analyses on these. This probability
distribution would be approximately the same every time you created a data
set with these birds.
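A small simulation makes the last point concrete. It assumes the same hypothetical rates (50 for the big chicken, 1 for each of 50 chicks) and uses Python's random.choices to toss kernels; simulate_yard is an illustrative name only. Each repeated data set recovers roughly the same split, about half for the big chicken and half for the chicks together, so the shape of the distribution is a property of the yard rather than a sampling accident.

```python
import random
from collections import Counter

def simulate_yard(n_kernels: int, seed: int) -> dict:
    """Toss n_kernels into the hypothetical yard and return each bird's
    observed share of the kernels eaten."""
    rng = random.Random(seed)
    birds = ["big chicken"] + [f"chick {i}" for i in range(1, 51)]
    rates = [50.0] + [1.0] * 50          # assumed true relative rates
    eaten = Counter(rng.choices(birds, weights=rates, k=n_kernels))
    return {b: eaten[b] / n_kernels for b in birds}

# Build several independent data sets and compare the resulting splits.
for seed in range(3):
    shares = simulate_yard(10_000, seed)
    chick_total = 1 - shares["big chicken"]
    print(f"run {seed}: big chicken {shares['big chicken']:.3f}, "
          f"all chicks {chick_total:.3f}")
```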

The situation with cladograms is worse than this extreme example because
there is doubtless no sharp distinction between the likelihood of the
shortest tree and that of the second shortest tree, the third shortest,
the fourth shortest, etc. (unless we have very, very few taxa in the data
set).
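The same arithmetic applied to trees, under purely hypothetical numbers (no real data set is implied): suppose the k-th best tree has relative likelihood decay**k, where decay is an assumed parameter. With three candidate trees that drop off sharply (decay = 0.1), the best tree holds about 90% of the total; with a thousand trees that decline gently (decay = 0.99), the best tree is barely distinguishable from the runner-up and holds only about 1% of the total.

```python
def best_tree_summary(n_trees: int, decay: float) -> tuple[float, float]:
    """Hypothetical model: the k-th best tree (k = 0, 1, ...) has relative
    likelihood decay**k. Returns (ratio of best to second best,
    best tree's share of the summed likelihood)."""
    likes = [decay ** k for k in range(n_trees)]
    ratio = likes[0] / likes[1]
    share = likes[0] / sum(likes)
    return ratio, share

for n_trees, decay in [(3, 0.1), (1000, 0.99)]:
    ratio, share = best_tree_summary(n_trees, decay)
    print(f"{n_trees} trees, decay {decay}: best/second = {ratio:.2f}, "
          f"best tree's share = {share:.1%}")
```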

Therefore we really can get something for nothing, but not only chickens
will squawk. An exact solution is publishable through the magic of the
philosophy of parsimony, even though there are doubtless . . . doubtless
many almost as well supported alternative trees. I limit this comment to
trees of many taxa. Four-taxon trees are a special case and non-parametric
tests of support are possible.

Of what value are exact solutions? I will leave this blank, but see my
Deconstructing Reconstruction for a possible answer:
http://www.buffalomuseumofscience.org/BOTANYDECON/moweb.htm

Since cladistic and maximum likelihood analyses are optimizations, of course
they approximate general intuitive evaluations of phylogenetic relationships
(e.g., "uncontested groups"). However, what qualifies these methods of
phylogenetic analysis for special respect and attention is that they are
more exact than intuition. I submit that such greater precision is largely
an artifact of philosophy, rhetoric and statistical gobbledegook. I'm sure
that somewhere in published exact results there is real added precision,
and as such it is an advance in knowledge, but it is very hard to tell such
an advance from nonsense.

From the above discussion, you can estimate my opinion of efforts in
creating a Phylocode as a substitute or even as an alternative for the
flexible-though-imperfect standard codes we have now.

R.

----- Original Message -----
From: "Thomas DiBenedetto" <TDibenedetto at DCCMC.ORG>
To: "'Richard Zander'" <rzander>; <TAXACOM at USOBI.ORG>
Sent: Wednesday, February 21, 2001 9:40 AM
Subject: RE: Striking a balance, weighting and Cladistics


> Richard,
> When I use the terms "result" or "preference", I am simply referring
> to the output of the analysis, without any implication that the result is
> necessarily meaningful. I recognize that the terms can be qualified by
> saying (e.g.) "meaningless result" or "trivial result" or "weakly
> supported preference". As I stated previously, standard cladistic
> practices include
> some metric of "strength of support" along with the "result" - the set of
> equally most-parsimonious trees. If nodes exhibit a high decay index, or
> high bootstrap or jackknife numbers, and the analysis has included all
> available evidence relevant to the taxa, then one can have confidence that
> the result represents our best reconstruction of the phylogeny, and we can
> proceed to use that result in further evolutionary studies, and to make
> changes in classification, if appropriate.
snip
> -tom



---------
From:
Richard H. Zander
Curator of Botany
Buffalo Museum of Science
1020 Humboldt Pkwy
Buffalo, NY 14211 USA
email: rzander at sciencebuff.org
voice: 716-896-5200 x 351



