bootstrapping
James Francis Lyons-Weiler
weiler at ERS.UNR.EDU
Mon Dec 8 13:38:26 CST 1997
_______________________________________________________________________________
On Mon, 8 Dec 1997, sylvia hope wrote:
> JL-W:
> As James Whitfield indicated, the bootstrap has indeed been variously
> interpreted. From my perspective, perhaps the most important assumption
> of the boostrap as usually applied in phylogenetics is the assumption of
> external validity, i.e., the variability found in the total of N
> pseudoreplicate resamplings of the matrix is supposed to reflect the type
> of variability that would be found if N additional true samples were
> taken. The boostrap cannot be used to test this assumption, obviously,
> and therefore the distinction between precision and accuracy is clear.
>
> The problem in my mind is that people who use the boostrap
> [sic] don't usually note in their presentation that it does
> not indicate external validity.
You are quite right.
>
> A high bootstrap value for a particular clade can be due to
> biased sampling of characters that are functionally related,
> maybe by a focus on a limited anatomical complex, or a
> single gene.
It is also possible to get disturbingly high bootstrap values
(bv's) by chance alone.
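
To make the point about functionally related characters concrete,
here is a minimal Python sketch, not anyone's published method. It
assumes a toy four-taxon case in which each character is coded only
by the grouping it supports, and the "best tree" is simply the
grouping backed by the most characters. The function names and both
character lists are hypothetical, invented for illustration.

```python
import random

# The three possible unrooted groupings of four taxa A, B, C, D.
SPLITS = ("AB|CD", "AC|BD", "AD|BC")


def best_split(characters):
    """Return the split supported by the most characters, or None on a tie."""
    counts = {s: characters.count(s) for s in SPLITS}
    top = max(counts.values())
    winners = [s for s, c in counts.items() if c == top]
    return winners[0] if len(winners) == 1 else None


def bootstrap_support(characters, split, n_reps=1000, seed=0):
    """Proportion of pseudoreplicates in which `split` is the unique winner."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_reps):
        # Resample characters (matrix columns) with replacement,
        # keeping the pseudoreplicate the same size as the original.
        pseudo = rng.choices(characters, k=len(characters))
        if best_split(pseudo) == split:
            hits += 1
    return hits / n_reps


# Hypothetical matrix: weak, conflicting signal among 14 characters.
independent = ["AB|CD"] * 6 + ["AC|BD"] * 4 + ["AD|BC"] * 4

# Same matrix, but the six AB|CD characters are each scored three
# times, as might happen when one anatomical complex is over-sampled.
redundant = ["AB|CD"] * 18 + ["AC|BD"] * 4 + ["AD|BC"] * 4

print("independent:", bootstrap_support(independent, "AB|CD"))
print("redundant:  ", bootstrap_support(redundant, "AB|CD"))
```

The redundant matrix contains no new independent evidence, yet its
support for AB|CD comes out higher. The resampling only measures
how consistently the matrix in hand points one way, which is the
precision/accuracy distinction again.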
>
> Even if characters are many and varied in the matrix the
> bootstrap can be misleading. Again I belabor the parallel with
> psych tests - usually they have a panorama of questions that
> *seem* on the face of it to be relevant to some
> characteristic they want to assess - and there may be high
> reliability but low validity by some outside criterion (for
> example, a personality test with psychiatric diagnosis or
> school grades with IQ test).
I think the analogy is on target.
>
> This is the reason that I don't like the idea of combining
> disparate data into a single set - if you keep different
> kinds of data separate each data matrix provides an external
> criterion of validity.
At some level, every individual character can be thought of as
a single data set. Sampling theory is just now being seriously
constructed (by myself and others) for phylogenetic inference.
The cladist's insistence on congruence as evidence is
well-placed, whether it is congruence among or within data sets.
What remains contentious is how best to determine (a) whether
congruence exists in a data set, and (b) how much there is. I prefer to
focus on explicit measures of signal instead of tree-based
interpretations of the data, whereas some others insist that
pattern portrayed in trees is signal. I suppose that would be a
bit like letting the subjects of a psych survey create their
own questions to answer.
James L-W.