Bootstrapping in PHYLIP vs PAUP

Wed Oct 6 14:37:45 CDT 1999

I'm forwarding this response for Joe Felsenstein, who does not
subscribe to TAXACOM.

Kent

Kent E. Holsinger                Kent at Darwin.EEB.UConn.Edu
                                 http://darwin.eeb.uconn.edu
-- Department of Ecology & Evolutionary Biology
-- University of Connecticut, U-43
-- Storrs, CT   06269-3043

------- Start of forwarded message -------

Two readers of TAXACOM called my attention to Derek Sikes's inquiries:

> A coauthor and I have been analysing a dataset, he using PHYLIP, I using
> PAUP 4.0b2.  Our bootstrap runs produce very different results:
>
> Only two branches over 50% (70-90%) are found with PHYLIP repeatedly,
> whereas PAUP finds only one branch over 50% (51%) and this branch is
> neither of the two found by PHYLIP.  These are parsimony searches
> (PHYLIP: DNA parsimony algorithm, version 3.572c).
>
> I was wondering if anyone knew any particulars about why PHYLIP and PAUP
> would produce such different bootstrap results?
>
I am a bit mystified.  The only reason I can think of would be some
ties between alternative resolutions of parts of the tree.  Each program
might resolve them in different ways, ones that are arbitrary and depend
only on the input order of species.  This could give rise to artificially
high bootstrap values.  Does the result change if one turns on the J (Jumble)
option in PHYLIP, jumbling just once?  That would abolish any such arbitrary
input-order-related support.
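
The effect of input-order tie-breaking can be illustrated with a toy
sketch.  It does not model a parsimony search at all: it simply assumes
(hypothetically) that three resolutions tie in every bootstrap replicate,
and compares always taking the first one in input order with shuffling
the order first, which is only loosely analogous to the Jumble option.
The names RESOLUTIONS, pick_resolution and bootstrap_support are invented
for this illustration.

```python
import random

# Hypothetical toy setup: these three resolutions are assumed to be
# equally parsimonious in every bootstrap replicate.
RESOLUTIONS = ["(A,B)", "(A,C)", "(B,C)"]


def pick_resolution(input_order, rng=None):
    """Return the first tied resolution in input order.

    If rng is given, the order is shuffled first, which stands in for
    jumbling the input order of species.
    """
    order = list(input_order)
    if rng is not None:
        rng.shuffle(order)
    return order[0]


def bootstrap_support(n_reps, jumble, seed=1):
    """Fraction of replicates in which each resolution is chosen."""
    rng = random.Random(seed)
    counts = {r: 0 for r in RESOLUTIONS}
    for _ in range(n_reps):
        chosen = pick_resolution(RESOLUTIONS, rng if jumble else None)
        counts[chosen] += 1
    return {r: c / n_reps for r, c in counts.items()}


fixed = bootstrap_support(1000, jumble=False)
jumbled = bootstrap_support(1000, jumble=True)
print("fixed input order:", fixed)
print("jumbled order:    ", jumbled)
```

With a fixed order the first-listed resolution is chosen in every
replicate, so it appears fully supported even though the data do not
prefer it.  With shuffling, each resolution is chosen in roughly a third
of the replicates and none reaches 50%.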

If in PAUP* you use the same rearrangement options PHYLIP does (nearest
neighbor rearrangements followed by final rounds of Subtree Pruning and
Regrafting after all species have been added), you should get similar
answers (though a lot more quickly with PAUP*).

> > secondarily I was curious if anyone knew about why PHYLIP generates a
> > bootstrap tree (using majority rule & strict consensus) with branches
> > supported less than 50%- PAUP collapses these and I would think that if
> > they are uncollapsed they must be randomly chosen from a shortest tree
> > topology ..

PHYLIP adds branches to the majority-rule consensus tree until it comes to
branches that have 50% or less support.  Then it continues, but adds only
those that are compatible with those already on the tree.  This may cause it
to make arbitrary "decisions" among equally-well-supported alternatives.
If you prefer, just ignore the ones that have less than 50% support.  I
made it do this to squeeze a bit more information out of the result,
though it may to some extent be arbitrary information.
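
The rule just described can be sketched as follows.  This is a minimal
illustration, not PHYLIP's actual code.  It assumes each bootstrap tree
is given as a set of clusters (frozensets of taxon names), and it breaks
ties in support alphabetically, which is an arbitrary choice made for
this example.  The names compatible and extended_majority_rule are
invented here.

```python
from collections import Counter


def compatible(a, b):
    """Two clusters can share a tree if they are disjoint or nested."""
    return a.isdisjoint(b) or a <= b or b <= a


def extended_majority_rule(replicate_trees):
    """Accept groups in order of decreasing support.

    Groups found in more than half of the trees are always mutually
    compatible, so the check below only ever rejects groups with 50%
    support or less.  Returns (cluster, support) pairs in the order
    in which they were accepted.
    """
    n = len(replicate_trees)
    counts = Counter(c for tree in replicate_trees for c in tree)
    # Assumption: ties in support are broken by sorted taxon names.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
    accepted = []
    for cluster, count in ranked:
        if all(compatible(cluster, other) for other, _ in accepted):
            accepted.append((cluster, count / n))
    return accepted


# Four hypothetical bootstrap trees on taxa A-E.
replicates = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("CD")},
    {frozenset("AB"), frozenset("CD")},
    {frozenset("AC"), frozenset("ABC")},
]
result = extended_majority_rule(replicates)
for cluster, support in result:
    print(sorted(cluster), support)
```

In this example {A,B} is accepted with 75% support.  {A,B,C} and {C,D}
both have 50% support but conflict with each other, so only the one
ranked first is kept; a different tie-breaking order would keep the
other.  That is the kind of arbitrary "decision" mentioned above.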

I am not sure why Sikes thinks that these must be on the shortest tree.
They need not be.

----
Joe Felsenstein         joe at genetics.washington.edu
Dept. of Genetics, Univ. of Washington, Box 357360, Seattle, WA 98195-7360 USA

------- End of forwarded message -------