Invariant sites and reliability

Wed Jul 20 10:20:58 CDT 2005

Interesting business, the interaction between likelihood methods (maximum
likelihood and Bayesian) and uninformative characters.

I've used MrBayes to analyze a contrived data set like this:

1 CCCCCCCCCCCCCCCCCCC etc.
2 CACCCCCCCCCCCCCCCCC
3 CACCCCCCCCCCCCCCCCC
4 CCCCCCCCCCCCCCCCCCC

(You can also use 4 sequences of totally random data, except for one site.)
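
To make the makeup of this data set concrete, here is a minimal Python sketch
that sorts the columns into constant, parsimony-informative, and
parsimony-uninformative sites. It assumes the alignment stops at the 19 sites
shown (the original continues with "etc."), and the names `alignment` and
`classify_site` are made up for illustration.

```python
from collections import Counter

# Contrived alignment from the post, truncated to the 19 sites shown
# (an assumption; the real data set continues).
alignment = {
    "1": "C" * 19,
    "2": "CA" + "C" * 17,
    "3": "CA" + "C" * 17,
    "4": "C" * 19,
}

def classify_site(column):
    """Return 'constant', 'informative', or 'uninformative' for one column."""
    counts = Counter(column)
    if len(counts) == 1:
        return "constant"
    # Parsimony-informative: at least two states, each present in >= 2 taxa.
    shared = [state for state, n in counts.items() if n >= 2]
    return "informative" if len(shared) >= 2 else "uninformative"

# Transpose the alignment so that each tuple is one site (column).
columns = list(zip(*alignment.values()))
summary = Counter(classify_site(col) for col in columns)
print(dict(summary))
```

With the 19-site version, this reports 18 constant sites and 1 informative
site: the single "A" column is the only one that can favor a grouping.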

If you set the model to treat all parsimony-uninformative sites as
invariable, you get a low (33%) Bayesian posterior probability for the
tree ((2,3),4,1), which is no better than an even split among the three
possible unrooted topologies for four taxa.

If you set the model to treat all uninformative sites as variable, you get
a BPP of 100% for the same tree.

Presumably this last result arises because two independent changes to "A"
at the same site in taxa 2 and 3 would be very improbable when no other site
shows a mutation to "A" at all, so a single shared change is strongly favored.

This shows a very clear difference in what characters and character states
really mean in morphological versus molecular analyses, though it's hard to
put into words.

Also, whether or not the model recognizes invariant sites makes a big
difference, doesn't it? How might this affect measures of uncertainty?
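
For reference, here is a minimal sketch of how a proportion-of-invariable-sites
("+I") mixture is usually written: a constant site can be explained either as
truly invariable or as a variable site that happened not to change, while a
site showing any variation can only come from the variable class. The function
name, the equal base frequency of 0.25, and the numbers in the usage lines are
hypothetical; this is not MrBayes code.

```python
def site_likelihood_plus_i(column, variable_lik, p_inv, base_freq=0.25):
    """Site likelihood under a +I mixture.

    column       -- tip states at one site, e.g. "CCCC" or "CAAC"
    variable_lik -- likelihood of the site under the variable-rate class
    p_inv        -- assumed proportion of invariable sites (0 <= p_inv < 1)
    base_freq    -- frequency of the observed base (equal frequencies assumed)
    """
    is_constant = len(set(column)) == 1
    # Only a constant column can belong to the invariable class.
    invariable_part = p_inv * base_freq if is_constant else 0.0
    return invariable_part + (1.0 - p_inv) * variable_lik

# Hypothetical inputs, for illustration only.
lik_constant = site_likelihood_plus_i("CCCC", variable_lik=0.01, p_inv=0.5)
lik_variable = site_likelihood_plus_i("CAAC", variable_lik=0.01, p_inv=0.5)
print(lik_constant, lik_variable)
```

When `p_inv` is large, the constant sites are absorbed by the invariable class
and say little about rates in the variable class, which is one way the choice
of model can change how much weight the one variable site carries.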

______________________
Richard H. Zander
Bryology Group, Missouri Botanical Garden
PO Box 299, St. Louis, MO 63166-0299 USA
richard.zander at mobot.org
Voice: 314-577-5180;  Fax: 314-577-9595
Websites
Bryophyte Volumes of Flora of North America:
http://www.mobot.org/plantscience/bfna/bfnamenu.htm
Res Botanica:
http://www.mobot.org/plantscience/resbot/index.htm
Shipping address for UPS, etc.:
Missouri Botanical Garden
4344 Shaw Blvd.
St. Louis, MO 63110 USA

-----Original Message-----
From: A.P. Jason de Koning [mailto:apjdk at ALBANY.EDU]
Sent: Tuesday, July 19, 2005 9:31 PM
To: TAXACOM at LISTSERV.NHM.KU.EDU
Subject: Re: [TAXACOM] Molecular taxonomy: on way out?

> Why bother purging the data matrix of non-informative
> characters when the algorithm doesn't take them into account?
>

I agree that throwing out data here is unnecessary, and in general
can introduce bias.

An unmentioned issue to consider with respect to probabilistic
analyses (of molecular or phenotypic characters): the inclusion of
characters with phylogenetically non-informative character state
distributions can be beneficial because they may contribute
information to the estimation of model parameters (such as relative
rates of character state transformation), which can have an
"indirect" effect on the selection of an optimal topology (though
perhaps rarely, and not strongly).  Why?  Because the effect on
relative rate estimates by 'non-informative' characters can alter the
relative probabilities of trees for other 'informative' characters!
For a very large dataset, inclusion of such characters could matter
for any given phylogeny problem. It certainly matters in molecular
evolutionary analyses where phylogeny is not the desired endpoint.
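
A simple pairwise analogue of this point, as a hedged sketch: the
Jukes-Cantor distance between two sequences depends on the proportion of
differing sites, so dropping constant sites changes the estimate. The counts
below are illustrative assumptions (two 19-site sequences differing at one
site), and the pairwise distance only stands in for the tree-based parameter
estimation described above.

```python
import math

def jc69_distance(n_diff, n_sites):
    """Jukes-Cantor distance from the count of differing sites.

    Returns math.inf when the proportion of differences reaches 0.75,
    where the JC69 correction is undefined (saturation).
    """
    p = n_diff / n_sites
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# All 19 sites kept: 1 difference among 19 sites.
with_constant = jc69_distance(1, 19)
# Constant sites discarded: the 1 remaining site differs.
without_constant = jc69_distance(1, 1)
print(with_constant, without_constant)
```

With the constant sites included, the estimate is a small finite distance;
with them removed, the correction breaks down entirely, because the constant
sites were the only evidence that change is rare.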

Cheers,
- Jason