Fwd: Re: [TAXACOM] genetic vs morphological trace of phylogeny

Mon Apr 12 18:08:35 CDT 2004

Richard,

> The Batzler study is really, really impressive, but there are some lacunae:
>
> 1. Sequence alignment is a consensus. Parsimony is often used during
> sequencing when alignment by eye is difficult. Gap costs are often involved
> and they are rather arbitrary. Is the final cladogram robust to different,
> less parsimonious alignments, or different but reasonable gap costs? No
> info.

Okay, maybe I should clarify that the Batzer paper's inferences on phylogeny
have nothing to do with sequences.  Their cladograms are built on binary
character data, where the presence/absence of each studied Alu element is
scored as 1/0.  There is nothing to align: using primers designed against
flanking sequence from known insertion sites means that each experiment
generates one--and one only--observation on each character state, for each
separate character.  There may, on the other hand, be some conflation of
'absence' with what is essentially a failed experiment if primer sites vary
among species (i.e., an 'absent' result could mean that the experiment
'failed', in which case the character state should more appropriately be
labelled 'missing'.  This is the problem that having complete genomes would
avoid.)  Nevertheless, I don't see how that problem could 'load the coin' in
one direction or the other (with respect to the closest relative question,
at least).
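To make the scoring concrete, here is a minimal Python sketch (not Batzer's actual pipeline) of a presence/absence matrix in which a failed assay is recorded as missing (None) rather than absent. The taxa, loci and values are invented for illustration, and the sketch assumes that presence of an insertion is the derived state. For three taxa it counts the loci where an insertion is shared by exactly two of them; the hypothetical `missing_as_absent` flag shows how scoring a failed assay as 'absent' can add a spurious grouping.

```python
from collections import Counter

# Hypothetical presence/absence matrix: one row per taxon, one column per Alu locus.
# 1 = insertion present, 0 = absent, None = missing (e.g. a failed PCR assay).
# These values are invented for illustration; they are not data from any study.
MATRIX = {
    "human":   [1, 1, 0, 1, None, 1],
    "chimp":   [1, 1, 0, 0, 1, 1],
    "gorilla": [0, 0, 1, 1, 1, 1],
}


def pair_support(matrix, taxa, missing_as_absent=False):
    """Count loci whose insertion is shared by exactly two of three taxa.

    Assumes presence is the derived state, so an insertion found in two taxa
    and absent from the third is read as grouping those two. By default, loci
    with a missing state in any of the three taxa are skipped; with
    missing_as_absent=True they are (wrongly) scored as absences instead.
    """
    support = Counter()
    n_loci = len(matrix[taxa[0]])
    for i in range(n_loci):
        states = {t: matrix[t][i] for t in taxa}
        if missing_as_absent:
            states = {t: 0 if s is None else s for t, s in states.items()}
        elif None in states.values():
            continue  # missing is not evidence of absence
        carriers = tuple(sorted(t for t in taxa if states[t] == 1))
        if len(carriers) == 2:
            support[carriers] += 1
    return support


if __name__ == "__main__":
    taxa = ["human", "chimp", "gorilla"]
    print(pair_support(MATRIX, taxa))
    print(pair_support(MATRIX, taxa, missing_as_absent=True))
```

With these made-up data, the only difference between the two calls is one extra chimp+gorilla count from the locus where the human assay failed; whether such errors favour any particular pairing depends on where the failures fall.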

Secondly, I'm not sure what you mean by "parsimony is used during sequencing
when alignment by eye is difficult".  Sequencing and alignment of homologous
sequences are separate steps, corresponding to data generation and
preparation for analysis respectively.  I agree that alignment can be
arbitrary because of indels--especially in highly divergent comparisons--and
that phylogenetic robustness should be tested against alternative
alignments, unless one uses the statistically preferable procedure, which
infers alignment and phylogeny simultaneously in a Bayesian framework.
That approach has the advantage of explicitly modelling insertion and
deletion as a Markovian process, and of using that model to sample
likelihoods under many possible alignments.
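A toy illustration of why gap costs matter: the sketch below is a plain Needleman-Wunsch global alignment with a linear gap penalty (my choice for illustration; real studies use richer gap models and dedicated software). For the made-up pair ACGT/AGCT, a gap cost of -3 yields the ungapped alignment, while a gap cost of -1 lets a two-gap alignment with one more match win, so a downstream tree could inherit different homology statements from the two settings.

```python
def needleman_wunsch(x, y, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty; returns (score, aligned_x, aligned_y)."""
    n, m = len(x), len(y)
    # F[i][j] holds the best score for aligning x[:i] with y[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + gap, F[i][j - 1] + gap)
    # Trace back from the bottom-right cell, preferring diagonal, then up, then left.
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if x[i - 1] == y[j - 1] else mismatch
            if F[i][j] == F[i - 1][j - 1] + s:
                ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
                continue
        if i > 0 and F[i][j] == F[i - 1][j] + gap:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))


def score_alignment(ax, ay, match=1, mismatch=-1, gap=-2):
    """Re-score an existing alignment column by column with the same scheme."""
    total = 0
    for a, b in zip(ax, ay):
        if a == "-" or b == "-":
            total += gap
        elif a == b:
            total += match
        else:
            total += mismatch
    return total


if __name__ == "__main__":
    for g in (-3, -1):
        print(g, needleman_wunsch("ACGT", "AGCT", gap=g))
```

At a gap cost of -1 there are two equally good gapped alignments (one matching the G, one matching the C); the traceback simply returns one of them, which is itself a small example of the arbitrariness the thread is about.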

> 2. Dollo parsimony is reasonable given the sequence over time of insertions.
> But that is somewhat circular and is allowable only if a different sequence
> over time is unreasonable. How to demonstrate that?
> to non-Dollo parsimony? Are Alu insertions definitely random in place of
> insertion in a sequence and can they be lost and regained in place?
> 3. Are Alu traits independent? Certainly they can be identified with unique
SNIP

I agree that these questions are fundamental, and not completely answered.
We need to know more about the molecular biological basis for insertion (and
loss), which will only partially be resolved by full genome sequences.  We
need experiments, too.
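For reference, here is a minimal sketch of how a single presence/absence character is scored under strict Dollo parsimony, assuming each insertion arises exactly once (placed at the most recent common ancestor of the taxa that carry it) and can only be lost afterwards. The nested-tuple tree, the taxon names and the function names are hypothetical, and missing data are not handled. Under this rule the loss count equals the number of maximal clades below the gain that lack the insertion.

```python
def leaves(tree):
    """Return the set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def dollo_losses(tree, present):
    """Minimum number of losses for one insertion under strict Dollo parsimony.

    `present` is the set of taxa scored 1; every other leaf is scored 0.
    The single gain is placed at the most recent common ancestor (MRCA) of
    the carriers, which is the placement requiring the fewest losses.
    """
    if not present:
        return 0  # the insertion never arose on this tree
    # Walk down from the root while one child still contains every carrier.
    node = tree
    while not isinstance(node, str):
        inside = [child for child in node if present <= leaves(child)]
        if not inside:
            break
        node = inside[0]
    return _count_losses(node, present)


def _count_losses(node, present):
    # A clade with no carriers needs exactly one loss on its stem.
    if not (leaves(node) & present):
        return 1
    if isinstance(node, str):
        return 0
    return sum(_count_losses(child, present) for child in node)


# Hypothetical rooted tree: (((human, chimp), gorilla), orangutan).
TREE = ((("human", "chimp"), "gorilla"), "orangutan")

if __name__ == "__main__":
    for carriers in ({"human", "chimp"}, {"human", "gorilla"}, {"human", "orangutan"}):
        print(sorted(carriers), dollo_losses(TREE, carriers))
```

An insertion in human and gorilla but not chimp costs one loss on this tree, whereas an insertion shared by human and chimp costs none; whether real Alu data justify forbidding re-insertion at the same site is exactly the open question above.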

> The phylogenetic coin is doubtless loaded, but would pay $5.00 or bet your
> science on the results? Remember that to attain a 95% or 99% confidence
> interval, you only have 5% or 1% wriggle room for doubt.

I don't think the coin is doubtless loaded--biased data and loaded data are
not the same thing.  All data are biased by the way they are collected, from
measurement error, chance events, and so on, but those errors are not
necessarily misleading.

- Jason