More on the 'cladistics' of sequences

Mon Jun 14 19:25:07 CDT 2004

At 09:58 14/06/2004 -0400, John Grehan wrote:
>My reference to 'latter' was to the cladistic analysis - not to the
>algorithm, although even though the same data matrix can be analysed by
>a cladistic or phenetic algorithm I do think it makes a difference
>whether the data are phenetic (combination of unrecognized plesiomorphic
>and apomorphic characters) or cladistic (restricted to apomorphic
>states). Apologies for the lack of clarity here.

Well, John, this statement means that, after all, you still have understood 
nothing of what has been explained on this list about phenetics, cladistics 
and characters.

Your recent statement that you could have misunderstood cladistics courses 
gave me some hope, but alas...

- characters are not phenetic or cladistic in themselves (clear enough?)
- "cladistic characters" are not restricted to apomorphic states: the 
putatively informative """cladistic character""" has at least two states, 
putatively plesiomorphic and putatively apomorphic. And you cannot put the 
putatively apomorphic states close together (on the topology) without, 
by the same token, putting the plesiomorphic states close together: they 
are complementary in their distribution on the unrooted topology. This is 
why, in the computation, you can logically dissociate the inference of the 
optimal topology (cladistically) from the rooting of this topology 
(cladistically too). There is cladistic optimization of homology, and 
cladistic rooting, and you can perform both separately in the order you 
choose. It seemed to me that you agreed...
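
For those who prefer to see it run, here is a minimal sketch in Python of 
the point above. It is my own toy illustration, not what any particular 
program does internally: the function name fitch_steps, the nested-tuple 
tree encoding and the four taxa a, b, c, d are all assumptions made for 
the example. It counts Fitch parsimony steps for one binary character on 
the same unrooted topology (split ab|cd) rooted in three different places.

```python
def fitch_steps(tree, states):
    """Fitch downpass for one character on a rooted binary tree.

    tree   : a taxon name (str) or a 2-tuple of subtrees
    states : dict mapping taxon name -> observed state
    Returns (candidate state set at this node, number of steps below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    lset, lsteps = fitch_steps(left, states)
    rset, rsteps = fitch_steps(right, states)
    common = lset & rset
    if common:
        # The two subtrees agree on at least one state: no extra step.
        return common, lsteps + rsteps
    # Disagreement: one change is needed somewhere at this node.
    return lset | rset, lsteps + rsteps + 1


# One character: state 1 in a and b, state 0 in c and d.
states = {"a": 1, "b": 1, "c": 0, "d": 0}

# Three rootings of the SAME unrooted topology (split ab|cd).
rootings = [
    (("a", "b"), ("c", "d")),    # root on the internal branch
    ("a", ("b", ("c", "d"))),    # root on the branch leading to a
    ("c", ("d", ("a", "b"))),    # root on the branch leading to c
]

lengths = [fitch_steps(tree, states)[1] for tree in rootings]
print(lengths)
```

The length is the same wherever the root is placed, and the single split 
that brings the 1s together (a with b) is the very same split that brings 
the 0s together (c with d). Which of the two states is apomorphic is 
decided only afterwards, by the rooting.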

>I do not see how a cladistic algorithm can sort out primitive and
>derived characters if they are not identified in the first place.

They are identified in the first place by the user himself pointing at the 
outgroup. The programs do this on that basis. This is certainly the last 
time I will repeat this point. It seems you simply don't care a bit about 
what is explained on this list in this respect.

>For
>example, a character

not a character, a character state

>  may be said to be shared between taxa a and b
>because it is not in the outgroup,

to be a putative synapomorphy between a and b, not simply "shared",
and putative synapomorphy has two independent components:
- putative homology (being close to one another on the unrooted topology)
- and putative polarity (plesio-apomorphic states in the right order along 
the branches of the rooted tree). By the way, you need the two states 
(plesio / apomorphic) in order to polarize, hence the """cladistic 
character""" cannot be restricted to one state. Unless you would root the 
apomorphy in nothing. You have to root apomorphy in plesiomorphy, not in 
a vacuum.

>but if the feature actually happens
>to be represented in the outgroup
>  in some way, the algorithm cannot know
>this without being told

But you tell it: you tell the program what the outgroups are. The program 
will not invent this. And the program will deal with possible ambiguity in 
the outgroups just like it deals with ambiguity inside the ingroup: the 
cladistic way (see below).
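
A deliberately naive sketch in Python of what "telling the program" 
amounts to for a single character. The function polarize and the taxon 
names are hypothetical, invented for this example. Real programs do not 
polarize character by character like this: they root the whole tree on 
the designated outgroups and polarity follows from the optimization. The 
sketch only shows that the polarity information comes from the user's 
outgroup statement, and that both states are needed.

```python
def polarize(states, outgroups):
    """Naive outgroup comparison for one character.

    states    : dict mapping taxon name -> observed state
    outgroups : taxa the USER has designated as outgroups
    Returns (plesiomorphic state, set of apomorphic states), or None
    when the outgroups disagree and this character alone cannot decide.
    """
    out_states = {states[t] for t in outgroups}
    if len(out_states) != 1:
        return None  # ambiguous: left to the other characters to settle
    plesio = next(iter(out_states))
    in_states = {s for t, s in states.items() if t not in outgroups}
    return plesio, in_states - {plesio}


# Unambiguous case: both designated outgroups show state 0.
clear = {"out1": 0, "out2": 0, "a": 1, "b": 1, "c": 0}
print(polarize(clear, ["out1", "out2"]))

# Ambiguous case: the feature is also present in one of the outgroups.
mixed = {"out1": 0, "out2": 1, "a": 1, "b": 1, "c": 0}
print(polarize(mixed, ["out1", "out2"]))
```

In the second case the function declines to answer rather than guess, 
which is the point where the other characters come in.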

>and if it is not told then it cannot come to
>that conclusion.

But it is told: it is told that these and those groups are putative 
outgroups (multiple-outgroup analysis), and the possible ambiguity in one 
character is resolved by other characters. This is the congruence 
criterion, implementing Hennig's auxiliary principle of preferring homology 
over homoplasy in phylogeny inference.

How do you cope with ambiguity in the ingroup? Congruence criterion in the 
standard cladistic tradition?
Well, just do the same in the outgroup (multiple-outgroup analysis, see 
the PAUP manual and Farris 72).
All this is explained in any basic lecture on cladistics.
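
A toy sketch in Python of the congruence criterion, under stated 
assumptions: a made-up matrix of three binary characters, a single 
outgroup terminal called "out" (the multiple-outgroup case works the same 
way, with more terminals), and hypothetical helper names fitch_steps and 
tree_length. It scores the three possible resolutions of the ingroup by 
total Fitch parsimony length. This is only an illustration of the 
principle, not the search strategy of PAUP or of any other program.

```python
def fitch_steps(tree, states):
    """Fitch downpass: (state set at node, steps) for one character."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    lset, lsteps = fitch_steps(left, states)
    rset, rsteps = fitch_steps(right, states)
    common = lset & rset
    if common:
        return common, lsteps + rsteps
    return lset | rset, lsteps + rsteps + 1


# Hypothetical data: characters 1 and 2 group a with b,
# character 3 conflicts with them and groups a with c.
matrix = {"out": "000", "a": "111", "b": "110", "c": "001"}

# The three resolutions of the ingroup, each rooted on the outgroup.
trees = {
    "ab": ("out", ("c", ("a", "b"))),
    "ac": ("out", ("b", ("a", "c"))),
    "bc": ("out", ("a", ("b", "c"))),
}


def tree_length(tree):
    """Total number of steps over all characters of the matrix."""
    n_chars = len(matrix["out"])
    return sum(
        fitch_steps(tree, {t: row[i] for t, row in matrix.items()})[1]
        for i in range(n_chars)
    )


lengths = {name: tree_length(tree) for name, tree in trees.items()}
best = min(lengths, key=lengths.get)
print(lengths, best)
```

The conflicting character 3 is not thrown away: it is simply outweighed 
by characters 1 and 2, and on the preferred tree it is explained as 
homoplasy (two steps instead of one).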

>John Grehan:
> > 1. The DNA sequence data only represents an overall similarity of DNA
> > sequences and is therefore not a necessary match for phylogeny;

The sequence data cannot "represent an overall similarity": overall 
similarity is a property of the analysis, it is the criterion for phenetic 
grouping of taxa, and thus it qualifies the analysis, not the sequence data.
The aligned sequence data are just like your morphological data: a priori 
statements of putative homology. And the outgroup criterion is just like 
for your morphological data: a priori statements of putative polarity. 
Cladistic analysis optimizes both, not overall similarity between taxa. And 
the programs do perform cladistic analysis, which has nothing to do with 
overall similarity between taxa (phenetic analysis).
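
To show that "phenetic" and "cladistic" qualify the analysis and not the 
data, here is a small Python sketch that runs both kinds of analysis on 
one and the same matrix. Everything in it is an assumption made for the 
illustration: the four-character matrix, the taxon names, and the helper 
names hamming, fitch_steps and tree_length. Overall similarity is taken 
here as the raw number of differing characters, and the cladistic 
analysis as Fitch parsimony over the three outgroup-rooted resolutions.

```python
from itertools import combinations


def fitch_steps(tree, states):
    """Fitch downpass: (state set at node, steps) for one character."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    lset, lsteps = fitch_steps(left, states)
    rset, rsteps = fitch_steps(right, states)
    common = lset & rset
    if common:
        return common, lsteps + rsteps
    return lset | rset, lsteps + rsteps + 1


# Hypothetical matrix: character 1 is shared by a and b only;
# characters 2 to 4 are states unique to a.
matrix = {"out": "0000", "a": "1111", "b": "1000", "c": "0000"}
ingroup = ["a", "b", "c"]


def hamming(x, y):
    """Overall dissimilarity: number of characters that differ."""
    return sum(p != q for p, q in zip(matrix[x], matrix[y]))


# Phenetic analysis: group the two most similar ingroup taxa.
phenetic_pair = min(combinations(ingroup, 2), key=lambda pair: hamming(*pair))

# Cladistic analysis: keep the resolution with the fewest steps.
trees = {
    ("a", "b"): ("out", ("c", ("a", "b"))),
    ("a", "c"): ("out", ("b", ("a", "c"))),
    ("b", "c"): ("out", ("a", ("b", "c"))),
}


def tree_length(tree):
    """Total number of steps over all characters of the matrix."""
    n_chars = len(matrix["out"])
    return sum(
        fitch_steps(tree, {t: row[i] for t, row in matrix.items()})[1]
        for i in range(n_chars)
    )


cladistic_pair = min(trees, key=lambda pair: tree_length(trees[pair]))
print(phenetic_pair, cladistic_pair)
```

Same matrix, two answers: the similarity criterion groups b with c on 
shared 0 states, while parsimony groups a with b on character 1 and 
treats the three states unique to a as changes on its own branch.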

Once more, this statement of yours demonstrates that you don't get the 
point of phenetic versus cladistic analysis, and persist in qualifying the 
data themselves as being cladistic or phenetic. You have now shifted from 
single characters to sequences, which changes nothing about your 
misunderstanding; if anything, it reveals it.

Seems to me that all this has already been explained on this list. Ad 
nauseam and beyond.
Is it possible that the comprehension problem is rooted in the congruence 
criterion itself??? I'm now wondering...

I apologize to people familiar with the basics of cladistic and phenetic 
analysis (for cluttering their mailboxes with trivialities).

Pierre

Pierre Deleporte
CNRS UMR 6552 - Station Biologique de Paimpont
F-35380 Paimpont   FRANCE
Telephone: 02 99 61 81 66
Fax: 02 99 61 81 88