Striking a balance, weighting and Cladistics

Mon Feb 26 16:45:07 CST 2001

-----Original Message-----
From: Richard Zander [mailto:rzander at SCIENCEBUFF.ORG]
>Even a bush is an exact solution that means nothing unless the lineages
>below it are very well supported.

If there are a gazillion alternative trees for a set of taxa, then a bush is
merely a graphical representation of the statement: "the evidence does not
support a preference for any of the gazillion trees over any other". You
can't get a less exact solution than that. The question of whether the
branches themselves are supported or not is simply outside the scope of
the analysis; we assume we are working with valid species.
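
To make the "bush" point concrete, here is a minimal sketch of a strict
consensus, assuming each tree is written as the set of clades it contains.
The four taxa and four trees are hypothetical, and the function names are
mine, not those of any particular phylogenetics package. When the candidate
trees share no clade, the consensus is empty, which is all a bush says.

```python
def strict_consensus(trees):
    """Return the clades that appear in every tree.

    Each tree is a set of frozensets; each frozenset holds the taxa of one
    clade. Trivial clades (single taxa, the full taxon set) are left out.
    """
    trees = list(trees)
    if not trees:
        return set()
    shared = set(trees[0])
    for tree in trees[1:]:
        shared &= tree  # keep only clades also found in this tree
    return shared


def is_bush(consensus):
    """A consensus with no shared clades is a fully unresolved bush."""
    return len(consensus) == 0


# Hypothetical rooted trees for taxa A, B, C, D.
t1 = {frozenset("AB"), frozenset("ABC")}
t2 = {frozenset("AC"), frozenset("ABC")}
t3 = {frozenset("BC"), frozenset("ABC")}
t4 = {frozenset("AB"), frozenset("CD")}

partly_resolved = strict_consensus([t1, t2, t3])  # only clade ABC survives
bush = strict_consensus([t1, t2, t3, t4])         # nothing survives
```

The first three trees agree only on the clade ABC; adding the fourth leaves
no clade in common, so the evidence prefers no grouping over any other.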

>With most cladograms, the null hypothesis of no phylogenetic
>loading cannot be rejected by the data presented (when looking at an
>entire tree of many taxa).

Once again, there is no null hypothesis per se in play in the first place.
This is not a statistical analysis. It is simply a procedure to detect the
pattern in the data. The "significance" of the pattern is another matter
entirely.

>But! What about support values? I believe that the decay index and
>bootstrap values of the second shortest tree (of many taxa) are usually
>nearly as impressive as those in the shortest tree.

I don't understand what question you think cladistics is trying to answer. My
view is that we are asking this: which one of the finite set of alternative
phylogenetic patterns for a given set of taxa is best supported by the
evidence of character homologies? We obviously can, and do, ask the
subsequent question: how strong is that preference?

>Statistics is the spine of science.
No. Statistics is the spine of a certain type of science, dealing with the
induction of general causal principles through the analysis of samples of
the effects of those principles (as in your chicken example). This is not
the only type of science that one can do. The specific character
transformations that are the currency of phylogenetic research are not the
varied effects of a single causal principle or factor. Character
transformations do not form a homogeneous class of effects with a discernible
distribution and estimable parameters, not even when we restrict our
universe of character transformations to nucleotide substitutions. Character
transformations are inherently independent events, with unique starting
points and ending points, subject to a unique set of influences and
pressures. They cannot be meaningfully analogized to coin flips. Even
nucleotide transformations, which do have chemically identical starting and
ending points, are known to occur at different rates in different lineages
and at different sites. There is simply no justification for burdening a
phylogenetic analysis with statistically derived process parameters, and for
constraining one's results to be dependent on such assumptions. (Unless, of
course, your only goal is precisely that: to explore the consequences for
the phylogenetic pattern if your model and process parameters correspond to
how evolution actually occurred.)

>there is doubtless no sharp distinction between the likelihood of the
>shortest tree and that of the secondmost short tree and the thirdmost
>and the fourthmost, etc. (unless we have very, very few taxa in the data
>set).

I have seen an analysis of a large data set in which there was a small
identifiable set of trees that were not significantly different from the max
tree, with the other many millions significantly different (i.e., there was a
valid, statistically significant result set generated). The consensus of the
result set was a bush, however :)

> An exact solution is publishable through the magic of the
>philosophy of parsimony, even though there are doubtless . . . doubtless
>many almost as well supported alternative trees.

No magic needed. And I think you exaggerate the preponderance of well
supported alternatives, at least in some cases. But even if you were right,
so what? What is 854 + 239 equal to? 1093. 1092 and 1094 are almost correct;
does that diminish the validity of the correct answer? Is my analogy silly?
I think it fair to say that the parsimony algorithm simply tallies up the
evidence for various grouping hypotheses and reports the one(s) which have
the most support in the evidence. That is all that we ask of it, for that is
our simple question (see above): which set of congruent homologies has the
most support?
Perhaps you think that statistics is science and addition is not. But
statistics and addition are merely two logical tools that should be chosen
for use in relation to the needs of the question asked. Cladistics does not
seek to model evolution, and does not therefore make statistical estimates
of process parameters. It merely tallies up the results of the study of
organisms and the definitions of their characteristic similarities and
differences.
But once again I emphasize that the "strength of the preference" is an
interesting question, and one which is addressed in standard practice. I
think you will find that the enthusiasm with which cladists advocate
nomenclatural changes and encourage evolutionary studies based on their
results is directly proportional to the "strength" of the preference for
the shortest tree.
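
The tallying I describe above can be sketched in a few lines. This is only
an illustration, assuming Fitch's counting rule for unordered characters, a
made-up matrix of five binary characters for four taxa, and the three
possible groupings of those taxa; none of the names or data come from a real
study. It answers both questions: which tree needs the fewest steps, and by
how many steps it beats the runner-up.

```python
def fitch_steps(tree, states):
    """Return (state_set, steps) for one character on a rooted binary tree.

    A tree is either a taxon name (str) or a pair (left, right).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    lset, lsteps = fitch_steps(left, states)
    rset, rsteps = fitch_steps(right, states)
    common = lset & rset
    if common:
        # The two subtrees can agree on a state: no extra step needed.
        return common, lsteps + rsteps
    # They cannot agree: one transformation must be counted here.
    return lset | rset, lsteps + rsteps + 1


def tree_length(tree, matrix):
    """Add up the Fitch steps over every character in the matrix."""
    n_chars = len(next(iter(matrix.values())))
    total = 0
    for i in range(n_chars):
        states = {taxon: row[i] for taxon, row in matrix.items()}
        total += fitch_steps(tree, states)[1]
    return total


# Hypothetical data: rows are taxa, columns are binary characters.
matrix = {"A": "11111", "B": "11100", "C": "00010", "D": "00000"}

# The three alternative groupings of four taxa.
trees = {
    "AB|CD": (("A", "B"), ("C", "D")),
    "AC|BD": (("A", "C"), ("B", "D")),
    "AD|BC": (("A", "D"), ("B", "C")),
}

ranked = sorted((tree_length(t, matrix), name) for name, t in trees.items())
best_length, best_name = ranked[0]
margin = ranked[1][0] - best_length  # extra steps the runner-up requires
```

With this matrix the AB|CD tree needs 6 steps against 8 and 9 for the
others, a margin of 2. In a four-taxon case like this one, that margin is
the same quantity the decay index reports for the AB group; on larger data
sets the measure is computed clade by clade, not from a single subtraction.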

>Since cladistic and maximum likelihood analyses are optimizations, of
>course they approximate general intuitive evaluations of phylogenetic
>relationships ...I'm sure that somewhere in published exact results there
>is greater precision and as such it is an advance in knowledge, but it is
>very hard to tell such an advance from nonsense.

Intuitive evaluations fail to demand explicit, testable, and discussable
character definitions. In addition, in such evaluations characters are
weighted under a subjective, personal, and hence untestable standard. I find
it inconceivable that one would argue that demanding explicit rigor in
character definition and transparent methodologies for combining evidence
would not lead to an increase in the scientific value of the result.
Or that we should prefer less scientific classifications to more scientific
ones.
Unless, of course, you were arguing that phylogenetics is focused on an
inherently uninteresting or unanswerable question.
-tom