More on the 'cladistics' of sequences

Thu Jun 10 16:10:37 CDT 2004

At 21:51 09/06/2004 -1100, John Grehan wrote:
>I am not sure about the need for several outgroups

You don't need several outgroups. It's simply better, because if the 
outgroups don't all root in the same place on your ingroup, that is an 
indication that you have made a mistake: your analysis is telling you that 
your putative outgroups cannot all be "out" together. See PAUP's manual, and 
also Farris 1972, as hinted off list by Jan de Laet.
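
The "same place" test can be sketched in a few lines of Python. This is only 
an illustration under stated assumptions, not what PAUP does internally: the 
tree is a nested tuple drawn with an arbitrary root, and the taxon names 
(in1, out1, ...) and the function names leaf_sets and 
single_branch_separates are hypothetical. The ingroup and the outgroups are 
separated by a single branch exactly when one of the two sets shows up as a 
clade in such a drawing.

```python
def leaf_sets(tree):
    """Return (all leaves under tree, leaf set of every subtree)."""
    if isinstance(tree, str):
        leaf = frozenset([tree])
        return leaf, [leaf]
    all_leaves = frozenset()
    clades = []
    for child in tree:
        leaves, sub_clades = leaf_sets(child)
        all_leaves |= leaves
        clades.extend(sub_clades)
    clades.append(all_leaves)
    return all_leaves, clades


def single_branch_separates(tree, ingroup):
    """True if one branch splits the tree into ingroup vs. everything else."""
    ingroup = frozenset(ingroup)
    all_leaves, clades = leaf_sets(tree)
    if not ingroup <= all_leaves:
        raise ValueError("ingroup contains taxa that are not in the tree")
    outgroup = all_leaves - ingroup
    # A split of the unrooted tree is a clade of the drawing or its complement.
    return ingroup in clades or outgroup in clades


# Hypothetical taxa: three ingroup members and three putative outgroups.
ingroup = {"in1", "in2", "in3"}
tree_ok = ((("in1", "in2"), "in3"), ("out1", ("out2", "out3")))
tree_bad = ((("in1", "out1"), "in2"), ("in3", ("out2", "out3")))

print(single_branch_separates(tree_ok, ingroup))
print(single_branch_separates(tree_bad, ingroup))
```

In tree_bad, out1 sits inside the ingroup, so no single branch separates the 
two sets. That is the warning sign described above.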

>  if one chooses a sufficiently broad single outgroup.

By "single", you mean a monophyletic group, or a possibly paraphyletic 
arrangement of taxa?

>Thus for the orangutan-human synapomorphies the context I am looking at

A "context", or a "single group", and what do you mean by "group"?

>is ALL other primate species collectively and that is quite a lot of species.

Indeed, but likely not a monophyletic group, rather a paraphyletic 
arrangement, thus simply a series of "out" species or groups of species. 
So what you have is not a single monophyletic outgroup but many primate 
groups putatively outside your ingroup.

But I won't blame you for using as many taxa as possible, in and out. The 
bigger the better.
Note that you can do the same with molecules: always use as much relevant 
evidence as is available.

>  Most of the characters stand up pretty well in that regard,

"Stand up"???! Trying to figure out your method from your other posts I now 
presume that you mean "their state in the outgroups is uniform"?

>and even those of lesser distribution

Ha-ha! Interesting... Hence, some of your characters do not have a uniform 
character state in all putative outgroup taxa? Hence your method would, 
in the end, be the standard cladistic one as implemented in current programs? 
But in that case, why do you reject these programs by calling them 
"phenetic"?... Very, very puzzling indeed...

>  may be supportable (e.g. lack of ischial callosities which is unique to 
> orangutans and humans among Old World monkeys and the apes could be 
> reasonably treated as an apomorphy rather than as a plesiomorphy 
> inherited all the way from the split with New World monkeys which lack 
> the callosities).

You are now describing what all cladistic programs do!!! Astonishing... Do 
you really know what the programs do? I must assume that you simply don't 
know (or that you inadvertently forgot). And yet you reject them?

>  This is just an observation, not necessarily a criticism of using 
> several outgroups.

Your "observation" consists in describing your method, and I must 
acknowledge that your method is the one implemented by the programs. I can't 
figure out the slightest reason why you reject these programs...

I can take it another way:
your "callosities" character is a "phenetic" character according to your 
highly personal use of the term "phenetic" (see your previous posts): it 
doesn't have a uniform character state in all outgroups. It's sometimes 
present, sometimes absent, so how can you know the plesiomorphic state for 
sure? But you decide to use it anyway, and optimize the whole topology for 
all characters. Certainly guided by information gathered from other 
characters, you thus accept two changes for this particular character 
instead of the minimum possible of one change, following the slightly 
less parsimonious scenario for it (and the corresponding topologies): 
absent --> present --> absent.
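
The "two changes instead of one" count can be checked with a small sketch. 
It uses Fitch's counting procedure for a single unordered character on a 
fixed binary tree, which is an assumption of this example and not 
necessarily the exact optimization used by any given program. The character 
coding follows the quoted message (0 = callosities absent, 1 = present). The 
two topologies, the group labels and the function name fitch are 
hypothetical and serve only to show the arithmetic.

```python
def fitch(tree, states):
    """Return (possible states at this node, minimum changes below it)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    common = left_set & right_set
    if common:
        # The children can agree on a state: no extra change is needed here.
        return common, left_changes + right_changes
    # The children disagree: one more change is required.
    return left_set | right_set, left_changes + right_changes + 1


# Coding as described in the quoted message: 0 = absent, 1 = present.
callosities = {
    "NWM": 0,          # New World monkeys
    "OWM": 1,          # Old World monkeys
    "other_apes": 1,
    "orangutan": 0,
    "human": 0,
}

# Hypothetical topology A: orangutan + human nested among the taxa coded 1.
tree_a = ("NWM", ("OWM", ("other_apes", ("orangutan", "human"))))
# Hypothetical topology B: every taxon coded 0 grouped together.
tree_b = (("NWM", ("orangutan", "human")), ("OWM", "other_apes"))

print(fitch(tree_a, callosities)[1])  # 2 changes: absent -> present -> absent
print(fitch(tree_b, callosities)[1])  # 1 change: the minimum for this character
```

Topology B is better for this one character, but a topology like A is kept 
whenever the other characters, summed over the whole matrix, favour it.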

The method you are implementing is exactly what the programs do. They do it 
for you. They root on outgroups, and prefer the optimal overall topology in 
case of ambiguity.
The fact that you are apparently "computing" everything in your brain 
instead of using these very convenient programs changes nothing about the 
logic.

And you persist in rejecting these programs...  Fascinating...

>Similarly, in examining the phylogenetic relationships of a single genus 
>of ghost moths comprising about 12 species I am using the entire family as 
>the outgroup.

Hence, once again, plenty of outgroups indeed (species, monophyletic groups 
of species, including their possible internal polymorphism for some 
characters I presume... don't tell me you overlook this possible complexity).

>The family comprises 500 species and while I have not looked at every one 
>I have at least endeavored to look at most, and eventually all, genera.

This makes a lot of outgroups. Do they all fit unambiguously outside your 
ingroup, i.e. are they connected to your ingroup by a single branch? If not, 
you have likely made a mistake, and some outgroup species may in fact be 
members of the ingroup. This is the point of the "multiple outgroup" approach.
Unless you force the analysis to give you only one rooting, i.e. 
you yourself boil down, by hand (...by brain...), all these outgroups to a 
single ideal taxon with a unique series of character states for all 
characters. If so, then you are implementing the "hypothetical ancestor" 
approach.
Nothing new, this is a classic, but long abandoned because of its excessive 
burden of arbitrariness (you "invent" an ideal taxon fitting your guesses 
instead of simply dealing with the taxa at hand), but still possible with 
the programs (just introduce this fictitious taxon as "the" outgroup).
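
To make the arbitrariness concrete, here is a minimal sketch of boiling 
several outgroups down to one fictitious taxon. Everything in it is 
hypothetical: the function name collapse_outgroups, the taxon names and the 
small 0/1 matrix. It keeps a state only where all outgroups agree, and marks 
with "?" every character where somebody would have to guess.

```python
def collapse_outgroups(outgroup_matrix):
    """Collapse several outgroup rows into one hypothetical-ancestor row.

    outgroup_matrix maps a taxon name to its list of character states.
    Returns (ancestor, conflicts): ancestor holds the shared state of each
    character, or "?" where the outgroups disagree; conflicts lists the
    indices of those characters.
    """
    rows = list(outgroup_matrix.values())
    n_chars = len(rows[0])
    if any(len(row) != n_chars for row in rows):
        raise ValueError("all taxa must be scored for the same characters")
    ancestor, conflicts = [], []
    for i in range(n_chars):
        observed = {row[i] for row in rows}
        if len(observed) == 1:
            ancestor.append(observed.pop())
        else:
            # No uniform state: any single value here is a guess.
            ancestor.append("?")
            conflicts.append(i)
    return ancestor, conflicts


# Hypothetical outgroup taxa scored for three binary characters.
outgroups = {
    "out1": [0, 1, 0],
    "out2": [0, 1, 1],
    "out3": [0, 0, 1],
}
ancestor, conflicts = collapse_outgroups(outgroups)
print(ancestor)
print(conflicts)
```

Only the first character is uniform across these outgroups. For the other 
two, filling in the "?" by hand is exactly the guess that the multiple 
outgroup approach avoids.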

>One thing I have noticed said about morphological synapomorphies is that 
>they are either difficult to determine and/or that there is a lot of 
>parallelism.

The latter is certainly more the case for molecular sequence data (limited 
range of possible states).

>  I wonder whether the former is a product of degree of familiarity and/or 
> the ability to generalize a structure (something that I have found can be 
> a real challenge to understanding or recognizing comparability),

Homology decisions are easier for sequences when the alignment is 
unambiguous (roughly: few and sparse changes in the sequences, so that 
changing sites are embedded in an unambiguous context of homologous 
features). This is like a change in a bone when the contiguous skeleton is 
identical... the classic "connection criterion", for molecules as for 
morphology... once again. Molecules have form, you know...

>and the latter to the use of too many marginal characters (in the quest 
>for large numbers the inclusion of features that might be assumed to be 
>comparable rather than demonstrated).

Quest for "large numbers", or simply quest for using all relevant evidence?
Now, reliability is not "written on the data". Think twice before throwing 
away "garbage". Particularly when you throw away all molecular data.

>Again these are just observations from a personal point of view and I am 
>entirely open to thinking about these matters quite differently from 
>people who have undoubtedly many more years of detailed experience than I.

This is great news. But not at all a question of experience in my view. I'd 
say rather a question of logic, and of really going and fetching a couple 
of nice elementary notions about what the programs really do, and possibly 
of being eager to try and refute one's personal views, rather than only eager 
to protect them. Basic cladistic courses are free, and chewing on them is 
free, too.

Best,
Pierre

Pierre Deleporte
CNRS UMR 6552 - Station Biologique de Paimpont
F-35380 Paimpont   FRANCE
Telephone: 02 99 61 81 66
Fax: 02 99 61 81 88