[Taxacom] DNA homologies

Wed Sep 27 10:26:12 CDT 2006

At 15:13 26/09/2006 -0400, John Grehan wrote:
>If anyone has any comments on the validity or
>otherwise of my representation of the issue I would be most interested
>as it will help me finalize my discussion of the problem - even if we
>may need to agree to disagree on some aspects.

a balanced reflection is needed for both molecular and morphological 
investigations,
and molecular analyses should not be presented in a distorted way,
hence a few suggestions below:

>This correlation between similarity and
>relationship fails to recognize that the two concepts are not
>necessarily identical.

= classic notion of the "pitfall of clustering on the basis of 
symplesiomorphies", an objection addressed to phenetic analyses, which 
applies to molecular and morphological data analysis alike,
hence this assertion is WRONG if applied to ALL molecular analyses, because 
parsimony analyses of molecular data avoid this problem
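
a minimal sketch of this pitfall, using an invented 0/1 matrix rather than 
real data: the taxa A, B, C and all states below are hypothetical, and 
polarity (0 = primitive, 1 = derived) is simply assumed to be known. 
Counting all matches stands in for phenetic "overall similarity"; counting 
only shared derived states stands in for the grouping logic that parsimony 
applies (it is not a full parsimony search).

```python
from itertools import combinations

# Hypothetical toy matrix: 0 = primitive state, 1 = derived state.
# Character 1 is derived in A and B only; characters 2-5 are derived in B only.
taxa = {
    "A": [1, 0, 0, 0, 0],
    "B": [1, 1, 1, 1, 1],
    "C": [0, 0, 0, 0, 0],
}

def overall_similarity(x, y):
    """Phenetic score: count every matching state, primitive or derived."""
    return sum(a == b for a, b in zip(x, y))

def shared_derived(x, y):
    """Count only the characters where both taxa show the derived state."""
    return sum(a == 1 and b == 1 for a, b in zip(x, y))

def closest_pair(score):
    """Return the pair of taxa with the highest score."""
    pairs = combinations(sorted(taxa), 2)
    return max(pairs, key=lambda p: score(taxa[p[0]], taxa[p[1]]))

phenetic_pair = closest_pair(overall_similarity)
derived_pair = closest_pair(shared_derived)
print("grouped by overall similarity:", phenetic_pair)
print("grouped by shared derived states:", derived_pair)
```

in this toy case B carries many unique derived states, so A looks most like 
C overall only because both retain primitive states, while the single 
shared derived state still groups A with B.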

>Unlike morphological characters, the homologies between DNA bases of 
>different taxa is a theoretical model rather than empirical observation 
>since comparisons require matching bases between
>different lengths of DNA.

This is a particularly dangerous illusion. There is nothing like "pure 
empirical observation" in comparative morphology - there is simply no 
"theory-free empirical observation" in science. A morphological homology 
statement is necessarily loaded with theory (see below).

>This match is accomplished by shuffling the DNA

NOT for nuclear DNA: no gaps, no 'shuffleable' data - there are many more 
"gaps" in morphological data than in nuclear DNA sequences, see below

>to produce the best overall match by creating artificial DNA 'gaps' (ref).

yes but, to be fair: when "aligning" tetrapod skeletons, morphologists 
introduce "artificial skeletal gaps" in exactly the same way, e.g. in the 
form of "phantom fingers", in order to account for the existence of 
five-toed, four-toed, three-toed, two-toed, one-toed, and even toeless, if 
not simply legless, tetrapods.
nobody can decently ignore such "skeleton shuffling and morphological gap 
coding", unless ignorant, or ideologically biased against molecular analyses

>The result is a data set representing overall

[overall similarity I presume?]

>of DNA

if one considers the "alignment" in the data matrix, be it molecular or 
morphological, one must face the same kind of "problems of alignment" (= 
primary homology, see above)
[recall: "overall similarity" in data analysis (phenetics) is another question]

it is strange that you don't mention obvious differences in the 
structure of the data, like the limited number of states in molecular 
sequence characters (increased statistical risk of "saturation" compared 
with morphological traits with more numerous states or slower evolution)
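
the saturation risk can be sketched with a simple formula. Assumption (not 
part of the original exchange): a symmetric k-state model in which every 
change is equally likely, which is the Jukes-Cantor model when k = 4; the 
function name and the values of k and d are illustrative only. Under that 
assumption, two lineages separated by d changes per character are expected 
to differ at a proportion p = ((k-1)/k) * (1 - exp(-k*d/(k-1))) of their 
characters.

```python
import math

def expected_difference(d, k):
    """Expected proportion of differing characters under a symmetric
    k-state model (assumed here), after d changes per character."""
    return (k - 1) / k * (1.0 - math.exp(-k * d / (k - 1)))

# Compare a 4-state character (DNA bases) with a hypothetical 20-state one.
for d in (0.1, 0.5, 1.0, 2.0, 5.0):
    p4 = expected_difference(d, 4)
    p20 = expected_difference(d, 20)
    print(f"d = {d:3.1f}   4 states: {p4:.3f}   20 states: {p20:.3f}")
```

with four states the observed difference levels off at 75%, so further 
divergence leaves no additional trace; a character with more states levels 
off later and at a higher ceiling.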

otherwise you keep on reiterating indefinitely the same misconceptions 
about how morphological and molecular data are analysed (= obviously the 
same way in fact, if you apply the same methods, e.g. maximum parsimony)

>rather than uniquely derived character states.

you'll be better understood if you talk of "compatibility analysis", a 
variant of maximum parsimony analysis - otherwise perfectly applicable to 
molecular data of course [even if generally considered obsolete for 
morphological data as well, but this is not my point]

your whole argument would be reinforced if you actually analysed 
molecular data your own preferred way: compatibility analysis of molecular 
data is just one click away from standard parsimony analysis, and if you 
don't try such a simple approach, it really looks like you are reluctant to 
analyse any molecular data set at all
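
to show how little is involved, here is a minimal sketch of a pairwise 
compatibility (clique) search on two-state characters. Everything in it is 
hypothetical: the site names s1-s4, the five-taxon columns, and the 
assumption that aligned sites have already been recoded as 0/1 (for 
instance purine versus pyrimidine). It is a brute-force illustration, not 
the procedure of any particular program.

```python
from itertools import combinations

def compatible(c1, c2):
    """Two binary characters are pairwise compatible unless all four
    state combinations (00, 01, 10, 11) occur among the taxa."""
    return len(set(zip(c1, c2))) < 4

def largest_clique(chars):
    """Brute-force search for a largest set of mutually compatible characters."""
    names = sorted(chars)
    for size in range(len(names), 0, -1):
        for subset in combinations(names, size):
            if all(compatible(chars[a], chars[b])
                   for a, b in combinations(subset, 2)):
                return subset
    return ()

# Hypothetical binary-coded sites; each list holds the states of five taxa.
sites = {
    "s1": [0, 0, 1, 1, 1],
    "s2": [0, 0, 0, 1, 1],
    "s3": [0, 1, 1, 0, 1],  # conflicts with s1 and with s2
    "s4": [0, 0, 0, 0, 1],
}

clique = largest_clique(sites)
print("largest compatible set of sites:", clique)
```

the same two functions work unchanged on a morphological 0/1 matrix, which 
is the point: the method does not care where the characters come from.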

>Other theoretical assumptions include a continuous clock like divergence 
>of DNA,

this is utterly WRONG for maximum parsimony analysis of molecular data, 
i.e. under the "NO COMMON MECHANISM" model of character evolution - it could 
not be farther from molecular clocks (strictly no regular common 
evolutionary process is assumed - there are potential problems with long 
branch attraction / repulsion, but this - strangely enough - is not your point)

>  the retention of primitive sequences in primitive groups for cladistic
>analysis despite the clock theory,

what do you mean? "primitive group" is hardly a "cladistic" (parsimony) 
notion; on the contrary, "cladists" severely criticised the misleading 
notion of "primitive taxon",
while the classical cladistic "outgroup rooting" involves no inference of a 
"primitive sequence" at all: just place the root on the branch connecting 
outgroups with ingroup (see any basic course)
hence standard "cladistic" (parsimony) analysis of molecular data is 
obviously immune to such a criticism

>and random mutations in non-coding regions that somehow retain a 
>non-random pattern correlated with
>speciation..

again, can you explain what you mean? is it you, or is it still 
"Schwarz talking"? but "Schwarz wrote" is not an argument in itself... 
(also farther below "Schwarz suggests"...); not giving your own opinion 
and supporting arguments weakens your paper (is it really intended as a mere 
"Schwarz dixit" paper? hardly worth publication at all, readers and editors 
are expecting at least some original reflection or data...)

>There may be theoretical explanations as to why these
>assumptions can be accepted, but that is the point, the explanations are
>theoretical, and any theoretical model is open to question.

of course, provided that you include the generally implicit (but necessary) 
theoretical assumptions underlying morphological phylogenetic analyses

"relevant data" are not facts, they are theory-loaded hypotheses (a classic 
in modern epistemology)

this illusion of theory-free scientific observations when it comes to 
morphology seems to be underlying your strongly biased charge against 
molecular analyses - you cannot change your philosophy when you change data 
sets

>The often stated claim that DNA and other molecular studies get the same
>answer is also problematic. Such studies may 'consistently' support the
>chimpanzee relationship while also placing gibbons closer to humans than
>orangutans. Even the often cited similarity of human and African ape
>albumens confounds the chimpanzee theory by showing greater similarity
>between humans and gorillas.

the general answer to inconsistencies (inside molecular analyses as well as 
between molecular and other approaches) is indeed: combined analysis 
of all relevant evidence
if this is your point I couldn't agree more, but you'd better rid your 
argument of a series of utter misconceptions, like your recurrent charge 
that there are no "cladistic" (parsimony) molecular analyses - because 
there ARE, you know...

>He argues that where molecular and morphological
>data disagree, both must be re-examined carefully.

no objection - just apply this to morphological analyses as well, including 
tracking implicit "models" / assumptions, i.e. rejecting the illusion of 
"theory-free" morphological analyses

>DNA sequences in different parts of the genome
>may be combined together in the formation of novel biological features
>that would not be detectable when chopping and matching bases in their
>sequential linear positions.

possibly, and this could be an interesting argument in favor of some 
independence between morphological data and molecular sequence data, hence 
in favor of a combined "both-and" rather than "either-or" approach
- but this can hardly prevent sequence data from carrying some 
phylogenetic signal, which is a completely different matter

you're also talking of "passing the censors"; I suggest that your first 
concern should be "passing the pertinent reviewers" you will likely 
encounter in any good journal
I must suggest a lot of corrections and complementary work on this first 
draft, a key point being that you actually analyse molecular data your 
pet way, i.e. compatibility parsimony, exactly as you analyse morphology

hope I'm not "censoring" in any way - I really don't care a bit about the 
relation Homo-Pongo-Pan in itself, just discussing concepts and methods

Best,
Pierre

Pierre Deleporte
CNRS UMR 6552 - Station Biologique de Paimpont
F-35380 Paimpont   FRANCE
Telephone: 02 99 61 81 63
Fax: 02 99 61 81 88