corroboration

John Trueman trueman.bioinfo.rsbs at RSBS.ANU.EDU.AU
Tue Aug 25 12:16:28 CDT 1998


Dear Richard,
Gee, that's a good question.  I think agreement will be hard to find, and
this is because some take the view that corroboration has to do with an
increase in probability while others use corroboration in the sense Popper
used it.  Popper took great pains to explain that corroboration cannot be
equated with probability -- or in his words with a 'probability calculus'.
Ch X of 'The Logic of Scientific Discovery' begins with a discussion of
this contrast between those who would "describe theories as being neither
true nor false, but instead more or less probable" and those who seek
corroboration.  "In my view", says Popper, "the whole problem of the
probability of hypotheses is misconceived.  Instead of discussing the
'probability' of a hypothesis we should try to assess what tests, what
trials, it has withstood ... In brief, we should try to assess how far it
has been 'corroborated'."

In my (JT's) few forays into the philosophy (if you like to call it that)
of phylogenetic reconstruction I have always used 'corroboration' in
Popper's sense.  To my mind, we can never say that a tree hypothesis is
'true'.  We can only say:
1. that it is the best we could come up with using some specified data:

By 'best' I mean according to some tree-comparison criterion, eg, parsimony
or likelihood.  Both these are reasonable goals: other things equal, we
would prefer the most parsimonious tree; other things equal, we would
prefer the tree which is most likely.


2.  that *after* the hypothesis was constructed we subjected it to
such-&-such *critical* tests, and in so far as it did not fail those tests
it is corroborated.

Here, the key words are 'after' and 'critical'.

Re After:   A hypothesis cannot be corroborated using the very data from
which it was constructed.  In this I disagree with certain well-known
cladists who see 'corroboration' of a parsimonious tree merely in the fact
it is parsimonious. These cladists argue that "the most parsimonious tree
is the least falsified tree", meaning it is the tree least falsified by
homoplasy.  Out of the set of all possible trees it is the tree least in
need of protection from falsification by the addition of ad-hoc hypotheses.
From this true statement they derive a false conclusion, "the most
parsimonious tree, being least falsified, is the best corroborated".  They
ignore that on a scale of corroboration-refutation every other tree stands
refuted by having had insufficient ad-hoc assumptions assigned to it but
the parsimonious tree has had assigned only just sufficient ad-hoc
assumptions to ensure its bare survival.  We might say figuratively it is
more corroborated than the other trees, but the absolute level of its
corroboration is precisely zero.

My view is this:  First, by our analysis of the data we create a hypothesis
that the taxa are related in some specified way.  From that point on this
tree  hypothesis becomes open to corroboration or refutation.  Let us now
genuinely seek to refute our hypothesis, and to the extent we have tried to
do this but have failed, the hypothesis has gained corroboration.

Re Critical:   The least critical test of a tree hypothesis which I can
imagine is to demonstrate that yes, this tree is the 'best' tree by our
prescribed tree selection criterion and using our original data.  At best
this would corroborate the hypothesis "that we indeed found the most
parsimonious (or most likely) tree".  More critical tests can be designed
by adding new data and showing that the estimate is not changed.  These
'more data' can be more characters, more taxa, or something in the nature
of a logical or probabilistic consequence of this tree but not of others.
For example, if the hypothesised tree would imply 'this species occurs in
Australia' and the rival tree(s) would imply 'this species does not occur
in Australia', and we look for the species in Australia and we find it, the
first tree is corroborated and the others are refuted by this observation
of the predicted consequence.  Of course, in practice all we can expect
most of the time are probabilistic statements: the probability of the
species occurring in Australia will differ depending on whether tree 1 or
tree 2 is true.  Our test will offer corroboration according to the different
probabilities of observing the given consequence if one or the other tree
is true but will not offer corroboration in any absolute sense.

Other critical tests of a tree might involve perturbing the tree estimation
process.  I am more impressed with a tree that is robust to changes in the
way it was estimated than one that is not.  Do different yet not
unrealistic likelihood models lead to the same tree estimate?  Do different
yet not unrealistic assumptions of state-change costs, character types,
character weights, etc, in a parsimony framework give the same tree?  The
initial estimate gains corroboration if it is shown to be robust but loses
corroboration if it is shown to be fragile.

Do changes to the taxon set change the tree?  Darwin's hypothesis tells us
the 'true' tree should be impervious to the addition or deletion of taxa; a
false tree may not be.  We cannot always add new taxa to our analysis but we
can at least drop taxa sequentially using a taxon-jackknife technique.  If
the estimated tree is the historically correct tree all that should happen
is each branch gets pruned then put back.  We might summarise our results
into a jackknife consensus tree to show which parts of our tree survive
this attempt at corroboration.
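The taxon-jackknife idea above can be sketched in a few lines of Python.  This
is only an illustration under stated assumptions: the data matrix is a dict
from taxon name to a string of character states, a tree is summarised as a set
of clades (frozensets of taxon names), and `build_clades` stands for whatever
tree-estimation procedure is actually in use.  The names `taxon_jackknife`,
`shared_state_groups` and `TOY_MATRIX` are hypothetical, and
`shared_state_groups` is a deliberately crude stand-in, not a real
tree-building method.

```python
def taxon_jackknife(matrix, build_clades):
    """For each clade of the full-data tree, return the fraction of
    single-taxon deletions after which the pruned clade is still found."""
    full = build_clades(matrix)
    survived = {clade: 0 for clade in full}
    tested = {clade: 0 for clade in full}
    for dropped in matrix:
        reduced = {t: s for t, s in matrix.items() if t != dropped}
        found = build_clades(reduced)
        for clade in full:
            pruned = clade - {dropped}
            # Skip clades that become trivial once the taxon is removed.
            if len(pruned) < 2 or len(pruned) == len(reduced):
                continue
            tested[clade] += 1
            if pruned in found:
                survived[clade] += 1
    return {c: survived[c] / tested[c] for c in full if tested[c]}


def shared_state_groups(matrix):
    """TOY stand-in for tree estimation (an assumption for illustration):
    every group of 2..n-1 taxa that share state '1' at some character."""
    taxa = list(matrix)
    n_chars = len(matrix[taxa[0]])
    groups = set()
    for i in range(n_chars):
        group = frozenset(t for t in taxa if matrix[t][i] == "1")
        if 2 <= len(group) < len(taxa):
            groups.add(group)
    return groups


# Hypothetical, perfectly hierarchical data: ((A,B),((C,D),E)).
TOY_MATRIX = {
    "A": "110001",
    "B": "110001",
    "C": "001110",
    "D": "001110",
    "E": "000010",
}

if __name__ == "__main__":
    for clade, frac in taxon_jackknife(TOY_MATRIX, shared_state_groups).items():
        print(sorted(clade), frac)
```

A clade whose fraction falls well below 1 is a part of the tree that did not
survive the deletion of taxa.  In real use `build_clades` would wrap a
parsimony or likelihood search.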

Do changes to our character set change the tree?  We might try a character
jackknife.  We might try resampling from the available characters as in a
nonparametric bootstrap: which of the nodes are supported and which have no
support (which are corroborated and which are refuted) by this test?  If we
have an explicit model we may try a parametric bootstrap technique.
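The resampling step of the nonparametric bootstrap can be sketched as below.
Again this is a sketch under assumptions, not a definitive implementation: the
matrix is a dict from taxon name to a string of states, `build_clades` is a
placeholder for the tree-estimation method in use (any function returning a
set of frozensets of taxon names), and the name `bootstrap_support` is
hypothetical.

```python
import random
from collections import Counter


def bootstrap_support(matrix, build_clades, n_reps=100, seed=0):
    """Resample characters with replacement and report, for every clade
    seen, the fraction of replicates in which it appeared."""
    rng = random.Random(seed)
    taxa = list(matrix)
    n_chars = len(matrix[taxa[0]])
    counts = Counter()
    for _ in range(n_reps):
        # Draw as many character columns as the original matrix has.
        cols = [rng.randrange(n_chars) for _ in range(n_chars)]
        resampled = {t: "".join(matrix[t][i] for i in cols) for t in taxa}
        # build_clades returns a set, so each clade counts once per replicate.
        counts.update(build_clades(resampled))
    return {clade: n / n_reps for clade, n in counts.items()}
```

A character jackknife differs only in the sampling line: columns would be
drawn without replacement, leaving some out of each replicate.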

Instead, or as well, what if we were to deliberately destroy the 'real'
hierarchic signal by jumbling our data within each character as in a PTP or
TPTP test?  Would a tree this impressive as measured by our tree-selection
criterion (ie, this short, this likely) or with this specified node
(branch, clade) *still* have appeared?  If we can get a tree this short by
chance alone, or if our favored clade still would have appeared (with
probability above, say, 1%) after such a major manipulation of the data,
then we should be very wary.  Our original estimate, or something as good
as it and indistinguishable from it by our criterion, could have been
obtained without us using any historical hierarchic information whatsoever.
When this has happened our tree (or node) has failed our attempt at
corroboration and it stands refuted. If our tree survived the test it has
gained some (though possibly minor) corroboration to the extent we can now
say "there is something more to this tree than chance alone".
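A minimal sketch of the permutation test described above, for whole-tree
length under parsimony, might look like this.  It assumes unordered,
equally weighted characters scored with Fitch's algorithm, and it finds the
shortest tree by exhaustive enumeration, which is only feasible for a handful
of taxa.  All taxa are permuted (no outgroup is held fixed).  The function
names and the toy matrix are hypothetical.

```python
import random


def all_trees(taxa):
    """Every rooted binary tree on `taxa`, as nested 2-tuples of names."""
    trees = [taxa[0]]
    for taxon in taxa[1:]:
        trees = [new for tree in trees for new in _insert(tree, taxon)]
    return trees


def _insert(tree, taxon):
    """Yield each tree formed by attaching `taxon` to one edge of `tree`."""
    yield (tree, taxon)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in _insert(left, taxon):
            yield (new_left, right)
        for new_right in _insert(right, taxon):
            yield (left, new_right)


def fitch_steps(tree, states):
    """Return (candidate root states, minimum changes) for one character."""
    if not isinstance(tree, tuple):
        return {states[tree]}, 0
    left_set, left_steps = fitch_steps(tree[0], states)
    right_set, right_steps = fitch_steps(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1


def shortest_length(matrix):
    """Length of the most parsimonious tree, by exhaustive search."""
    taxa = sorted(matrix)
    n_chars = len(matrix[taxa[0]])
    columns = [{t: matrix[t][i] for t in taxa} for i in range(n_chars)]
    return min(
        sum(fitch_steps(tree, col)[1] for col in columns)
        for tree in all_trees(taxa)
    )


def ptp(matrix, n_reps=99, seed=0):
    """Fraction of data sets (original included) whose shortest tree is
    as short as, or shorter than, the shortest tree for the real data."""
    rng = random.Random(seed)
    taxa = sorted(matrix)
    n_chars = len(matrix[taxa[0]])
    observed = shortest_length(matrix)
    hits = 0
    for _ in range(n_reps):
        columns = []
        for i in range(n_chars):
            col = [matrix[t][i] for t in taxa]
            rng.shuffle(col)  # jumble states among taxa within one character
            columns.append(col)
        permuted = {t: "".join(c[j] for c in columns) for j, t in enumerate(taxa)}
        if shortest_length(permuted) <= observed:
            hits += 1
    return (hits + 1) / (n_reps + 1)


# Hypothetical toy data, fully compatible with ((A,B),((C,D),E)).
TOY_MATRIX = {"A": "110001", "B": "110001", "C": "001110",
              "D": "001110", "E": "000010"}

if __name__ == "__main__":
    print(shortest_length(TOY_MATRIX), ptp(TOY_MATRIX))
```

A large value from `ptp` means a tree this short turns up readily once the
hierarchic signal has been destroyed, which is the situation to be wary of.
A clade-specific (TPTP-style) version would track the presence of a chosen
clade rather than overall length.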

There are many ways of corroborating trees.  None of them make the tree any
more or any less probable.

Regards

John Trueman

(PS: Re your comment about Bremer support:  The difficulty with using raw
Bremer support as an index of corroboration is the same as using raw branch
length.  How do we know, for a given case, whether a Bremer support of
x-steps is impressive?  We must have some null model against which to
compare.)




More information about the Taxacom mailing list