[Taxacom] was contamination

Mon Apr 4 16:43:28 CDT 2011

I am rather bad at explaining myself so bear with me:

JG wrote: My understanding is that one is not dealing with character states, but
four different characters that replace each other so one really does not
know what came before. Even with coding genes the genes are not
necessarily the same length so one has to invent the homologies even
before theorizing the replacement. And in practice the outgroup is often
so limited in taxonomic representation as to almost be meaningless. 

To simplify the argument, let us discuss only protein-coding sequences. The location of a nucleotide is the character, and the state is the particular nucleotide you find there. The same holds for proteins and practically any character you can think of (segment 3 is defined in relation to segments 2 and 4, unless you have developmental information, which is quite rare, and even then the homology is relational). Length differences can occur either due to insertions (often non-coding) or, more rarely, actual deletions. I say rarely because the encoded protein still has to function, and the mutation may be, and most times is, deleterious and therefore selected against. But the rest of the gene/protein characters are still there, so we can determine which positions and which characters have been deleted or inserted. One need not "invent" anything.
Regarding the outgroup, I agree: some phylogenies have better outgroup sampling than others, regardless of the kind of data. Nevertheless, the outgroup will often be just a (small) subset of the immediate group or groups that are assumed to be sister to the ingroup, unless you work with a taxonomically impoverished lineage, in which case you can (and should) add everything you can get.
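
To make the position-as-character point concrete, here is a minimal sketch in Python. The taxon names and sequences are invented purely for illustration, and the helper functions are hypothetical, not taken from any phylogenetics package. It assumes the sequences are already aligned, with "-" marking positions where a taxon has an indel.

```python
# Hypothetical toy alignment: taxon names and sequences are invented for illustration.
alignment = {
    "taxon_A": "ATGGCC---AAA",
    "taxon_B": "ATGGCCGTTAAA",
    "taxon_C": "ATGACCGTTAAG",
}

def columns(aln):
    """Return one dict per alignment position: the character and each taxon's state."""
    names = list(aln)
    length = len(aln[names[0]])
    # Aligned sequences must all have the same length.
    assert all(len(seq) == length for seq in aln.values())
    return [{name: aln[name][i] for name in names} for i in range(length)]

def indel_columns(aln):
    """Positions where at least one taxon has a gap (an insertion or deletion)."""
    return [i for i, col in enumerate(columns(aln)) if "-" in col.values()]

def variable_columns(aln):
    """Positions where the taxa show more than one nucleotide state (gaps ignored)."""
    out = []
    for i, col in enumerate(columns(aln)):
        states = set(col.values()) - {"-"}
        if len(states) > 1:
            out.append(i)
    return out

print(indel_columns(alignment))     # positions missing in taxon_A
print(variable_columns(alignment))  # positions with differing nucleotides
```

The flanking positions that all three taxa share are what let one say which positions are missing in taxon_A, which is the sense in which the homology is relational.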

Phylogenetics maps just about every scientific phenomenon associated
with evolution onto a phylogenetic tree, the latter treated as a
fundamental pattern in nature. What the pattern actually is is the
relationships of present-day exemplars as a result of evolution. These
are facts (the equivalent of distance measures) just as the relative
motions of heavenly bodies are facts. More exactitude in determining
present-day patterns of present-day exemplars tells us no more about the
past evolutionary processes involved in determining those relationships,
just as learning more about exactly when people were born lets us
determine their fate better through astrology. 

The phylogeny is the hypothesis (best guess) used to explain the facts, the "taxa". The taxa contain and are the data, for every feature in an organism contains traces of its evolution and can therefore be used to infer its kinship. Thus I find the notion that employing extant taxa is problematic rather strange. Extant taxa contain snippets of data (extinction having eliminated the vast majority), with which we try to explain the taxa via the simplest hypothesis. You can argue that extant taxa are limiting, but they are the data you have to work with.

ZANDER
Molecular data support ALL trees, just some more than others. They cannot falsify morphological results. If morphological results are so certain they falsify the probabilistic molecular tree, then a suboptimal molecular tree is required so both fit theory (unless you are theory-free, in which case you intuit that DNA has more manna).
GREHAN
Since falsification is dependent upon some pre-determined criterion, the support for molecular data cannot falsify morphological results simply through incongruence. In this respect I am in agreement.
ZANDER
Yes, I opined that molecular results cannot falsify morphological results. Yeah, I made that up. But, I don't hear anyone contradicting me with a good example of a falsification, so I must be right. : ) 

Any data can support all trees; we just care about the tree that is supported by the most characters (or the hypothesis that best explains the distribution of character states). As for falsification, the only criterion would be knowledge of the truth, and we very rarely have that, so the falsification you talk about is impossible. Phylogenies are just best estimates, educated guesses, etc. One hypothesis may be better at explaining the available facts and become the de facto preferred hypothesis, but this applies to any data source, and additional data may result in a new preferred hypothesis.
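
As a rough illustration of how the same data fit every tree, only some better than others, here is a small Python sketch. It uses Fitch parsimony (counting the minimum number of state changes each tree requires) as a stand-in optimality criterion. That choice is my assumption for the example, and the four taxa, their sequences, and the function names are all invented.

```python
# Hypothetical data matrix: four invented taxa, four aligned characters each.
matrix = {
    "A": "AAGT",
    "B": "AAGC",
    "C": "GTGC",
    "D": "GTAC",
}

def fitch(tree, states):
    """Fitch pass for one character on a rooted binary tree of nested tuples.

    Returns (possible ancestral states, minimum number of changes).
    """
    if isinstance(tree, str):  # a leaf: its observed state, no changes
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:  # children agree on at least one state: no extra change
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1

def tree_length(tree, data):
    """Total number of changes the tree requires over all characters."""
    n_chars = len(next(iter(data.values())))
    return sum(
        fitch(tree, {taxon: seq[i] for taxon, seq in data.items()})[1]
        for i in range(n_chars)
    )

# The three possible groupings of four taxa into two pairs.
trees = {
    "(A,B),(C,D)": (("A", "B"), ("C", "D")),
    "(A,C),(B,D)": (("A", "C"), ("B", "D")),
    "(A,D),(B,C)": (("A", "D"), ("B", "C")),
}

for label, tree in trees.items():
    print(label, tree_length(tree, matrix))
```

Every tree can accommodate the data; the preferred one is simply the one that needs the fewest changes. Add more characters to the matrix and the preference may shift, which is the sense in which the result is a best estimate and not a falsification.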