corroboration
Stuart G. Poss
sgposs at SEAHORSE.IMS.USM.EDU
Wed Aug 26 16:52:40 CDT 1998
Dr. Mike Sharkey wrote:
> > I seem to recall that compatibility analysis *can* be used to evaluate
> > probabilities; i.e., the probability that these characters support the
> > same tree.
> >
> > Meacham wrote several papers on this in the 80's, but I don't have the
> > references at hand.
>
> Compatibility attempts to measure the probability that a character is
> random, viz. given the distribution of a character, with respect to other
> characters, what is the probability that it is composed of random elements
> (each taxon with a particular character state obtained it convergently for
> example)?
> A low probability of being random infers a high probability of being
> informative...according to proponents. Of course, others suggest that a low
> probability of being random suggests that the probability of the character
> being random is low but nothing more.
>
This characterization of Meacham's probability model is not
factually correct. The Meacham probability model measures
the probability that two or more characters will be compatible
by chance. This probability is the sum of the probabilities of two
mutually exclusive events, corresponding to the two types of
compatibility: the characters are compatible either through a disjoint
mapping or through a nested mapping. The model of
randomness proposed is that all characters that are nesting equivalent
are equally likely. Cladistic characters that have the same number
of elements (taxa) in their corresponding states are nesting equivalent.

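As a rough illustration of the two types of compatibility, here is a
minimal Python sketch. It assumes each undirected binary character is
represented by the set of taxa showing one of its two states; the function
name and the labels it returns are my own and are not taken from Meacham's
papers.

```python
def compatibility_type(a, b, taxa):
    """Classify a pair of undirected binary characters.

    a, b  -- sets of taxa showing the chosen state of each character
             (an assumed representation, for illustration only)
    taxa  -- the full study set
    """
    a, b, taxa = set(a), set(b), set(taxa)
    if not a & b:
        return "disjoint"              # the two states share no taxa
    if a <= b or b <= a:
        return "nested"                # one state lies inside the other
    if a | b == taxa:
        # The complementary states share no taxa, so the pair is still
        # compatible when the characters are undirected.
        return "complements disjoint"
    return "incompatible"              # all four state combinations occur


taxa = range(10)
f1 = {7, 8, 9}                                   # a (7|3) character
print(compatibility_type(f1, {8, 9}, taxa))      # nested
print(compatibility_type(f1, {0, 1}, taxa))      # disjoint
print(compatibility_type(f1, {0, 9}, taxa))      # incompatible
```

The last case is the familiar situation in which all four combinations of
states occur among the taxa, so no single tree can carry both characters
without homoplasy.
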
In the example given by Meacham of a study set S consisting of ten taxa,
two undirected binary cladistic characters, F1 (whose taxa are partitioned
into two states (7|3)) and F2 (whose taxa are partitioned into two
states (8|2)), can be compatible in two ways, represented by the ordered
pairs (0,0) and (0,1). The probability of nested compatibility is
Palpha = P[(0,1)] = (3 choose 2)/(10 choose 2) = 3/45 = 1/15, and the
probability of disjoint compatibility is
Pbeta = P[(0,0)] = (7 choose 2)/(10 choose 2) = 21/45 = 7/15,
for a joint [total] probability Ptotal = Palpha + Pbeta = 8/15. If these
characters were directed, the probability of compatibility would usually
be lower, because their ordering could constrain the number of possible
2-tuple (ordered pair) representations of subsets that may be regarded as
equiprobable.

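The arithmetic in the example can be checked with a short Python sketch.
This is only my reading of the null model, not Meacham's algorithm: F1 is
held fixed, and the smaller state of F2 is treated as a uniformly random
subset of the taxa of the same size. The function names and the
restrictions k2 <= k1 and k1 + k2 < n are assumptions made to keep the
sketch simple.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def chance_compatibility(n, k1, k2):
    """Return (P_nested, P_disjoint, P_total) for two undirected binary
    characters on n taxa whose smaller states hold k1 and k2 taxa.

    Assumes k2 <= k1 and k1 + k2 < n (restrictions chosen for this sketch).
    """
    total = comb(n, k2)                      # equally likely placements of F2
    p_nested = Fraction(comb(k1, k2), total)        # F2 state inside F1 state
    p_disjoint = Fraction(comb(n - k1, k2), total)  # F2 state outside it
    return p_nested, p_disjoint, p_nested + p_disjoint


def chance_compatibility_by_enumeration(n, k1, k2):
    """Same quantities, obtained by listing every placement of F2's state."""
    f1 = set(range(k1))                      # fix F1's smaller state
    nested = disjoint = 0
    for subset in combinations(range(n), k2):
        s = set(subset)
        if s <= f1:
            nested += 1
        elif not s & f1:
            disjoint += 1
    total = comb(n, k2)
    return (Fraction(nested, total), Fraction(disjoint, total),
            Fraction(nested + disjoint, total))


p_alpha, p_beta, p_total = chance_compatibility(10, 3, 2)
print(p_alpha, p_beta, p_total)              # 1/15 7/15 8/15
```

Both functions give the same three fractions for the (7|3) and (8|2)
characters, which is a useful sanity check on the closed-form counts.
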
There exists a modified version of the compatibility analysis algorithm that
computes these probabilities.

Empirically, at least for the fish data sets that I have evaluated,
what this means is that large cliques of characters are unlikely to
arise by chance (i.e., you can't easily make randomly chosen characters
agree with each other [retain compatibility]) and that some cliques
are less likely [often very much less] than others to be explainable
by chance distribution of taxa among states alone. Although one must
still assess whether such improbable cliques result from true phylogenetic
history or from other non-random effects, such as those recently
discussed by Naylor in one of the more recent issues of Systematic
Biology, the method is useful in assessing the relative importance of
characters to particular phylogenetic hypotheses. It is also useful in
assessing whether a tree derived from a set of characters differs from a
result that could be obtained "at random", given a reasonable model of
randomness. The magnitude of the probabilities may not be as significant to
phylogenetic reconstruction as the finding that characters may differ widely
in what we can expect them to tell us about other characters.

Regrettably, I too cannot find my copy of Meacham's paper (only the preprint)
and cannot provide the exact reference at this writing, although I believe it
was in an issue of Mathematical Biosciences.
_____________________________________________________________________
Stuart G. Poss E-mail: sgposs at whale.st.usm.edu
Senior Research Scientist & Curator Tel: (228)872-4238
Gulf Coast Research Laboratory FAX: (228)872-4204
P.O. Box 7000
Ocean Springs, MS 39566-7000
More information about the Taxacom mailing list