corroboration

Stuart G. Poss sgposs at SEAHORSE.IMS.USM.EDU
Wed Aug 26 16:52:40 CDT 1998



Dr. Mike Sharkey wrote:

> > I seem to recall that compatibility analysis *can* be used to evaluate
> > probabilities; i.e., the probability that these characters support the
> > same tree.
> >
> > Meacham wrote several papers on this in the 80's, but I don't have the
> > references at hand.
>
>         Compatibility attempts to measure the probability that a character is
> random, viz. given the distribution of a character, with respect to other
> characters, what is the probability that it is composed of random elements
> (each taxon with a particular character state obtained it convergently for
> example)?
>         A low probability of being random infers a high probability of being
> informative...according to proponents. Of course, others suggest that a low
> probability of being random suggests that the probability of the character
> being random is low but nothing more.
>

This characterization of Meacham's probability model is not factually
correct.  The Meacham probability model measures the probability that
two or more characters will be compatible by chance.  This probability
is the sum of the probabilities of two mutually exclusive cases,
corresponding to the two types of compatibility: the characters are
compatible through a disjoint mapping, or they are compatible through a
nested mapping.  The model of randomness proposed is that all characters
that are nesting equivalent are equally likely.  Cladistic characters
that have the same number of elements (taxa) in their corresponding
states are nesting equivalent.
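To make the two types of compatibility concrete, here is a minimal
sketch in Python.  It is only an illustration, not Meacham's algorithm:
the function name classify_pair and the taxon labels are hypothetical,
and it assumes each character is represented by the set of taxa in its
smaller state, with the two sets together not covering every taxon.

```python
def classify_pair(state_a, state_b):
    """Classify two undirected binary characters by their smaller states.

    Assumption (not from the original post): each argument is the set of
    taxa showing the less common state of one character.
    """
    a, b = set(state_a), set(state_b)
    if a <= b or b <= a:
        return "nested"        # one state set lies inside the other
    if a.isdisjoint(b):
        return "disjoint"      # the two state sets share no taxon
    return "incompatible"      # partial overlap: no tree fits both


# Hypothetical taxa t1..t10; the first character has 3 taxa in its
# smaller state, the second has 2.
f1 = {"t8", "t9", "t10"}
print(classify_pair(f1, {"t9", "t10"}))   # nested
print(classify_pair(f1, {"t1", "t2"}))    # disjoint
print(classify_pair(f1, {"t7", "t8"}))    # incompatible
```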

In the example given by Meacham of a study set S consisting of ten taxa,
two undirected binary cladistic characters, F1 (whose taxa are
partitioned into two states (7|3)) and F2 (whose taxa are partitioned
into two states (8|2)), can be compatible in two ways, represented by
the ordered pairs (0,0) and (0,1).  The probability of nested
compatibility is Palpha = P[(0,1)] = (3 choose 2)/(10 choose 2) = 3/45 =
1/15, and the probability of disjoint compatibility is Pbeta = P[(0,0)]
= (7 choose 2)/(10 choose 2) = 21/45 = 7/15, for a joint [total]
probability Ptotal = Palpha + Pbeta = 8/15.  If these characters were
directed, the probability of compatibility would usually be lower,
because their ordering could constrain the number of possible 2-tuple
(ordered pair) representations of subsets that may be regarded as
equiprobable.
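The arithmetic in this example can be checked with a short Python
sketch.  This is not the modified compatibility program itself, only an
illustration under stated assumptions: the function names are
hypothetical, k1 and k2 are the sizes of the smaller states (k2 <= k1,
k1 + k2 <= number of taxa), and the second character's smaller state is
taken to be equally likely to be any subset of that size.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def chance_compatibility(n_taxa, k1, k2):
    """Return (P_nested, P_disjoint, P_total) as exact fractions."""
    if not (0 < k2 <= k1 and k1 + k2 <= n_taxa):
        raise ValueError("need 0 < k2 <= k1 and k1 + k2 <= n_taxa")
    total = comb(n_taxa, k2)                          # all placements of the k2 taxa
    p_nested = Fraction(comb(k1, k2), total)          # all k2 inside the k1 set
    p_disjoint = Fraction(comb(n_taxa - k1, k2), total)  # all k2 outside it
    return p_nested, p_disjoint, p_nested + p_disjoint


def chance_compatibility_by_enumeration(n_taxa, k1, k2):
    """Same quantities, obtained by listing every placement (a cross-check)."""
    first = set(range(k1))                            # fix the first character
    nested = disjoint = count = 0
    for subset in combinations(range(n_taxa), k2):
        second = set(subset)
        count += 1
        if second <= first:
            nested += 1
        elif second.isdisjoint(first):
            disjoint += 1
    return (Fraction(nested, count), Fraction(disjoint, count),
            Fraction(nested + disjoint, count))


# Ten taxa, F1 = (7|3), F2 = (8|2), as in the example above.
print(chance_compatibility(10, 3, 2))
print(chance_compatibility_by_enumeration(10, 3, 2))
```

Both calls give 1/15, 7/15 and 8/15, in agreement with the figures
quoted above.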

There exists a modified version of the compatibility analysis algorithm
that computes these probabilities.

Empirically, at least for the fish data sets that I have evaluated,
what this means is that large cliques of characters are unlikely to
arise by chance (i.e., you can't easily make randomly chosen characters
agree with each other [retain compatibility]) and that some cliques are
less likely [often very much less] than others to be explainable by
chance distribution of taxa among states alone.  One must still assess
whether such improbable cliques result from true phylogenetic history
or possibly from other non-random effects, such as those discussed by
Naylor in a recent issue of Systematic Biology.  Even so, the method is
useful in assessing the relative importance of characters to particular
phylogenetic hypotheses.  It is also useful in assessing whether a tree
derived from a set of characters differs from a result that could be
obtained "at random", given a reasonable model of randomness.  The
magnitude of the probabilities may not be as significant to phylogenetic
reconstruction as the finding that characters may differ widely in what
we can expect them to tell us about other characters.

Regrettably, I too cannot find my copy of Meacham's paper (only the
preprint) and cannot provide the exact reference at this writing,
although I believe it was in an issue of Mathematical Biosciences.

_____________________________________________________________________
Stuart G. Poss                       E-mail: sgposs at whale.st.usm.edu
Senior Research Scientist & Curator  Tel: (228)872-4238
Gulf Coast Research Laboratory       FAX: (228)872-4204
P.O. Box 7000
Ocean Springs, MS  39566-7000






