[Taxacom] Morphology vs Molecular

Bob Mesibov mesibov at southcom.com.au
Tue Aug 18 18:22:48 CDT 2009


Another thing to be clear about is the meaning of a molecular character when looking at raw sequence data, as opposed to looking at well-understood fragments, whole genes or other higher-category entities.

If you have a widespread sequence of, say, 20 bases with no known indels, you can be very confident that the characters are the *positions* in that sequence, 1-20. That is, the *positions* are homologous. At each position the character state is a base, so for DNA it will be A, T, G or C.
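
As a rough illustration of "positions are characters, bases are states", here is a minimal Python sketch. The taxon names and the 20-base sequences are invented for the example, and the helper names `character_matrix` and `variable_positions` are hypothetical, not taken from any phylogenetics package.

```python
# Hypothetical gap-free sequences, 20 bases each; taxa and bases are invented.
seqs = {
    "taxon_A": "ACGTACGTACGTACGTACGT",
    "taxon_B": "ACGTACGAACGTACGTACGT",
    "taxon_C": "ACGTACGAACGTACGTTCGT",
}

def character_matrix(seqs):
    """Return {position: {taxon: base}}, with positions numbered from 1."""
    length = len(next(iter(seqs.values())))
    if any(len(s) != length for s in seqs.values()):
        raise ValueError("gap-free sequences must all be the same length")
    return {
        pos + 1: {taxon: s[pos] for taxon, s in seqs.items()}
        for pos in range(length)
    }

def variable_positions(matrix):
    """Positions (characters) at which more than one base (state) occurs."""
    return [pos for pos, states in matrix.items()
            if len(set(states.values())) > 1]

matrix = character_matrix(seqs)
print(len(matrix))                 # number of characters
print(variable_positions(matrix))  # characters that vary among the taxa
```

With no indels, the position number alone identifies the character, so no alignment step is needed before building the matrix.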

If you have indels, which are very common in most of the widely used, longer sequences, two issues arise wrt identifying characters. The first is how you align sequences from different sources, because different multiple sequence alignment procedures (whether carried out first, or as part of direct optimisation) can give you different positional homologies. [It was interesting to see in that staphylinid paper recommended by Stephen Thorpe that the authors did separate analyses based on Clustal and MAFFT alignments. AFAIK this kind of catholic approach to alignment is rare. Most labs seem to pick their MSA method and stick with it.]
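
To make the alignment issue concrete, the sketch below compares two alternative alignments of the same pair of sequences. Both alignments are written by hand for illustration; they are not output from Clustal, MAFFT or any other program, and `homologous_pairs` is a hypothetical helper.

```python
def homologous_pairs(row_a, row_b, gap="-"):
    """Return the set of (i, j) pairs where base i of sequence A and
    base j of sequence B share an alignment column (1-based indices)."""
    pairs = set()
    i = j = 0
    for a, b in zip(row_a, row_b):
        if a != gap:
            i += 1
        if b != gap:
            j += 1
        if a != gap and b != gap:
            pairs.add((i, j))
    return pairs

# Same raw sequences (A = ACGTT, B = ACGT), two hand-made alignments.
alignment_1 = ("ACGTT", "ACGT-")
alignment_2 = ("ACGTT", "ACG-T")

pairs_1 = homologous_pairs(*alignment_1)
pairs_2 = homologous_pairs(*alignment_2)

# Pairs that appear in one alignment but not the other: these are the
# positional homologies on which the two alignments disagree.
print(sorted(pairs_1 ^ pairs_2))
```

Here the last T of sequence B is treated as homologous with base 4 of sequence A under one alignment and with base 5 under the other, so the two alignments imply different characters.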

The second issue is how you treat gaps in your analysis after alignment. You can ignore them entirely, and this amounts to character weighting because an indel is an evolutionary novelty. Alternatively, you can treat a gap as a fifth character state. Someone more familiar with the molecular phylogeny literature than I am may be able to say how often analyses are done both ways, and the results compared.
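
The two gap treatments can be compared on a toy alignment. This is only a sketch: the alignment is invented, and `variable_columns` with its `gap_mode` argument is a hypothetical helper, not the interface of any real analysis program.

```python
# Hypothetical alignment containing a two-base indel (columns 5-6).
aligned = {
    "taxon_A": "ACGT--ACGT",
    "taxon_B": "ACGTTTACGT",
    "taxon_C": "ACGA--ACGT",
}

def variable_columns(aligned, gap_mode):
    """Return 1-based columns showing more than one character state.

    gap_mode="ignore": gaps are dropped before states are counted.
    gap_mode="fifth":  '-' counts as a state alongside A, C, G and T.
    """
    if gap_mode not in ("ignore", "fifth"):
        raise ValueError("gap_mode must be 'ignore' or 'fifth'")
    rows = list(aligned.values())
    result = []
    for col in range(len(rows[0])):
        states = {row[col] for row in rows}
        if gap_mode == "ignore":
            states.discard("-")
        if len(states) > 1:
            result.append(col + 1)
    return result

print(variable_columns(aligned, "ignore"))
print(variable_columns(aligned, "fifth"))
```

With gaps ignored, only the substitution in column 4 counts as variation; with the gap as a fifth state, columns 5 and 6 also become variable characters. Note that the fifth-state coding scores the two-base gap as two characters, although it may reflect a single indel event.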

'Ignore third codon position' weighting for coding sequences can be avoided by doing an analysis of the amino acid sequence in its entirety. I'm not sure whether enough proteins are known yet to allow AA analyses to be useful at all taxonomic levels. There are also wonderful surprises lurking in the 'proteome'. I used to think (as a non-molecular taxonomist) that histone H3 was a very highly conserved nuclear protein with wonderful base-level variety. A few weeks ago I learned that H3 paralogy is... um... a problem.
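
A small sketch of why an amino acid analysis sidesteps third-position weighting: synonymous third-position changes disappear on translation. The sequences are invented, and the codon table below is a deliberately partial subset of the standard genetic code covering only the codons used here.

```python
# Partial codon table (standard genetic code); only the codons needed below.
CODON_TABLE = {
    "ATG": "M",
    "GCT": "A", "GCC": "A",
    "AAA": "K", "AAG": "K",
    "GAT": "D", "GAA": "E",
}

def translate(dna):
    """Translate an in-frame coding sequence using the partial table."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))

def differing_positions(x, y):
    """1-based positions at which two equal-length sequences differ."""
    return [i + 1 for i, (a, b) in enumerate(zip(x, y)) if a != b]

# Hypothetical coding sequences, invented for illustration.
coding = {
    "taxon_A": "ATGGCTAAAGAT",
    "taxon_B": "ATGGCCAAGGAT",  # two synonymous third-position changes
    "taxon_C": "ATGGCTAAAGAA",  # one third-position change that alters the AA
}
protein = {taxon: translate(seq) for taxon, seq in coding.items()}

print(differing_positions(coding["taxon_A"], coding["taxon_B"]))
print(differing_positions(protein["taxon_A"], protein["taxon_B"]))
print(differing_positions(protein["taxon_A"], protein["taxon_C"]))
```

At the base level taxon_A and taxon_B differ at two third positions, but their translated sequences are identical; the third-position change in taxon_C survives translation because it changes the amino acid.
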
-- 
Dr Robert Mesibov
Honorary Research Associate
Queen Victoria Museum and Art Gallery, and
School of Zoology, University of Tasmania
Home contact: PO Box 101, Penguin, Tasmania, Australia 7316
(03) 64371195; 61 3 64371195
Website: http://www.qvmag.tas.gov.au/mesibov.html



