bootstrapping

James Francis Lyons-Weiler weiler at ERS.UNR.EDU
Mon Dec 8 13:38:26 CST 1997


_______________________________________________________________________________

On Mon, 8 Dec 1997, sylvia hope wrote:

> JL-W:

> As James Whitfield indicated, the bootstrap has indeed been variously
> interpreted.  From my perspective, perhaps the most important assumption
> of the boostrap as usually applied in phylogenetics is the assumption of
> external validity, i.e., the variability found in the total of N
> pseudoreplicate resamplings of the matrix is supposed to reflect the type
> of variability that would be found if N additional true samples were
> taken.  The boostrap cannot be used to test this assumption, obviously,
> and therefore the distinction between precision and accuracy is clear.
>
>           The problem in my mind is that people who use the boostrap
>           [sic] don't usually note in their presentation that it does
>           not indicate external validity.

        You are quite right.
>
>           A high bootstrap value for a particular clade can be due to
>           biased sampling of characters that are functionally related,
>           maybe by a focus on a limited anatomical complex, or a
>           single gene.

        It is also possible to get disturbingly high bootstrap values
        (bv's) by chance alone.
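
        For readers who have not run one, the resampling step under
        discussion can be sketched roughly as below.  This is a toy
        illustration only, not anyone's published method: it assumes
        four hypothetical taxa (A, B, C, D) with binary characters, and
        it stands in for a real tree search by simply picking the split
        supported by the most characters.  The names SPLITS,
        supported_split, best_split and bootstrap_support are made up
        for the example.

```python
import random
from collections import Counter

# Hypothetical toy setup: 4 taxa in the order A, B, C, D, binary characters.
# Each character is a 4-tuple of states; each split pairs taxon A with one other.
SPLITS = {"AB|CD": (0, 1), "AC|BD": (0, 2), "AD|BC": (0, 3)}


def supported_split(char):
    """Return the split a binary character supports, or None if uninformative."""
    for name, (i, j) in SPLITS.items():
        k, m = [t for t in range(4) if t not in (i, j)]
        if char[i] == char[j] and char[k] == char[m] and char[i] != char[k]:
            return name
    return None


def best_split(chars):
    """Pick the split supported by the most characters (ties broken by name)."""
    counts = Counter(s for s in map(supported_split, chars) if s is not None)
    if not counts:
        return None
    top = max(counts.values())
    return min(name for name, c in counts.items() if c == top)


def bootstrap_support(chars, n_reps=1000, seed=0):
    """Fraction of pseudoreplicates in which each split comes out best."""
    rng = random.Random(seed)
    wins = Counter()
    for _ in range(n_reps):
        # Resample characters (matrix columns) with replacement,
        # drawing as many as the original matrix contains.
        pseudo = rng.choices(chars, k=len(chars))
        winner = best_split(pseudo)
        if winner is not None:
            wins[winner] += 1
    return {name: wins[name] / n_reps for name in SPLITS}
```

        Calling bootstrap_support on a list of character tuples gives
        one proportion per split.  Note that every pseudoreplicate is
        drawn from the characters already in hand, so the proportions
        describe repeatability within that sample and nothing more.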
>
>           Even if characters are many and varied in the matrix the
>           bootstrap can be misleading.  Again I belabor the parallel with
>           psych tests - usually they have a panorama of questions that
>           *seem* on the face of it to be relevant to some
>           characteristic they want to assess - and there may be high
>           reliability but low validity by some outside criterion (for
>           example, a personality test with psychiatric diagnosis or
>           school grades with IQ test).

        I think the analogy is on target.
>
>           This is the reason that I don't like the idea of combining
>           disparate data into a single set - if you keep different
>           kinds of data separate each data matrix provides an external
>           criterion of validity.

        At some level, every individual character can be thought of as
        a single data set.  Sampling theory is just now being seriously
        constructed (by myself and others) for phylogenetic inference.
        The cladist's insistence on congruence as evidence is
        well-placed, whether it is congruence among or within data sets.
        How best to determine (a) whether congruence exists in a data
        set, and (b) how much there is, remains contentious.  I prefer to
        focus on explicit measures of signal instead of tree-based
        interpretations of the data, whereas some others insist that
        the pattern portrayed in trees is signal.  I suppose that would
        be a bit like letting the subjects of a psych survey create their
        own questions to answer.

        James L-W.



