[Taxacom] Morphology vs Molecular

Thu Aug 20 00:32:38 CDT 2009

Actually, one of the "classical" alignment programmes, Clustal, started
off using UPGMA to build the guide tree for its progressive alignment,
while more recent versions were "improved":

Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994) CLUSTAL W: improving the
sensitivity of progressive multiple sequence alignment through sequence
weighting, position-specific gap penalties and weight matrix choice.  Nucleic
Acids Research, 22:4673-4680.

Looks like weighting has entered the arena.
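For anyone who has not met UPGMA: it is simple average-linkage clustering, and in a progressive aligner the resulting tree only fixes the order in which sequences are joined. Below is a minimal sketch under stated assumptions: the function name `upgma` and the toy distances are hypothetical, and this is not Clustal's actual code.

```python
# Minimal UPGMA sketch (hypothetical helper, toy data; not Clustal's code).
# Repeatedly join the closest pair of clusters; the merged cluster's distance
# to every other cluster is the size-weighted average of its members' distances.

def upgma(labels, dist):
    """Return a UPGMA tree as nested tuples.

    labels: list of taxon names
    dist:   dict mapping frozenset({a, b}) -> pairwise distance
    """
    size = {name: 1 for name in labels}  # number of taxa in each cluster
    d = dict(dist)                       # working copy, extended as clusters merge
    clusters = list(labels)
    while len(clusters) > 1:
        # Find the closest pair among the current clusters.
        a, b = min(
            ((x, y) for i, x in enumerate(clusters) for y in clusters[i + 1:]),
            key=lambda pair: d[frozenset(pair)],
        )
        new = (a, b)
        size[new] = size[a] + size[b]
        for c in clusters:
            if c == a or c == b:
                continue
            # Size-weighted average distance from the merged cluster to c.
            d[frozenset((new, c))] = (
                size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
            ) / size[new]
        clusters = [c for c in clusters if c != a and c != b] + [new]
    return clusters[0]


# Hypothetical pairwise distances between four sequences.
dist = {
    frozenset(("A", "B")): 2, frozenset(("A", "C")): 6, frozenset(("B", "C")): 6,
    frozenset(("A", "D")): 10, frozenset(("B", "D")): 10, frozenset(("C", "D")): 10,
}
print(upgma(["A", "B", "C", "D"], dist))  # ('D', ('C', ('A', 'B')))
```

A and B join first, then C, then D, so that is the order in which a progressive aligner driven by this tree would bring the sequences together.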

Brian

Quoting John Grehan <jgrehan at sciencebuff.org>:

>
> Alignment seems to be a procedure that imposes overall similarity where
> the homology of an individual base is determined as a byproduct of the
> overall compromise between the theorized significance of the number of
> gaps vs number of substitutions. I think this is one aspect of molecular
> analysis where primitive retention cannot be empirically excluded from
> the data.
>
> John Grehan
>
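The gaps-versus-substitutions trade-off can be made concrete with a minimal Needleman-Wunsch global alignment sketch. It assumes a linear gap penalty and flat match/mismatch scores chosen purely for illustration (real aligners such as Clustal use substitution matrices and separate gap-opening and gap-extension penalties); the function name and parameters are hypothetical. Changing only the gap penalty changes which bases end up in the same column, i.e. which are treated as homologous.

```python
def needleman_wunsch(s, t, match=1, mismatch=-1, gap=-1):
    """Global alignment of s and t with a linear gap penalty (illustrative scores).

    Returns (score, aligned_s, aligned_t) for one optimal alignment.
    """
    n, m = len(s), len(t)

    def pair(i, j):
        # Score for placing s[i-1] and t[j-1] in the same column.
        return match if s[i - 1] == t[j - 1] else mismatch

    # score[i][j] = best score for aligning s[:i] with t[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # match or substitution
                score[i - 1][j] + gap,             # gap in t
                score[i][j - 1] + gap,             # gap in s
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            a.append(s[i - 1])
            b.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            a.append(s[i - 1])
            b.append("-")
            i -= 1
        else:
            a.append("-")
            b.append(t[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(a)), "".join(reversed(b))


# Same two sequences, two gap penalties, two different positional homologies.
print(needleman_wunsch("ACGT", "CGTA", gap=-1))  # (1, 'ACGT-', '-CGTA')
print(needleman_wunsch("ACGT", "CGTA", gap=-5))  # (-4, 'ACGT', 'CGTA')
```

With cheap gaps the C, G and T of the two sequences are homologized; with expensive gaps every position is instead scored as a substitution.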
>> -----Original Message-----
>> From: taxacom-bounces at mailman.nhm.ku.edu [mailto:taxacom-
>> bounces at mailman.nhm.ku.edu] On Behalf Of Bob Mesibov
>> Sent: Tuesday, August 18, 2009 7:23 PM
>> To: TAXACOM
>> Cc: Richard Zander
>> Subject: Re: [Taxacom] Morphology vs Molecular
>>
>> Another thing to be clear about is the meaning of a molecular character
>> when looking at raw sequence data, as opposed to looking at
>> well-understood fragments, whole genes or other higher-category entities.
>>
>> If you have a widespread sequence of, say, 20 bases with no known indels,
>> you can be very confident that the characters are the *positions* in that
>> sequence, 1-20. That is, the *positions* are homologous. At each position
>> the character state is a base, so for DNA it will be A, T, G or C.
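A minimal sketch of the "positions as characters" view for an indel-free alignment: each column becomes one character and the base in it is that character's state. The helper names, the toy five-base sequences and the parsimony-informative filter (at least two states, each shared by at least two taxa) are illustrative assumptions, not part of the original post.

```python
from collections import Counter


def character_matrix(aligned):
    """Treat each alignment position as a character whose state is the base.

    aligned: dict mapping taxon name -> sequence (all the same length, no indels)
    Returns a list where item k maps taxon -> state at position k + 1.
    """
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must all have the same length")
    (length,) = lengths
    return [{taxon: seq[k] for taxon, seq in aligned.items()} for k in range(length)]


def informative_positions(aligned):
    """1-based positions with at least two states, each shared by two or more taxa."""
    result = []
    for pos, column in enumerate(character_matrix(aligned), start=1):
        counts = Counter(column.values())
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            result.append(pos)
    return result


# Hypothetical indel-free sequences (5 bases rather than 20, to keep it short).
aligned = {"t1": "ACGTA", "t2": "ACGTT", "t3": "ATGCA", "t4": "ATGCG"}
print(character_matrix(aligned)[1])    # {'t1': 'C', 't2': 'C', 't3': 'T', 't4': 'T'}
print(informative_positions(aligned))  # [2, 4]
```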
>>
>> If you have indels, which are very common in most of the widely used,
>> longer sequences, two issues arise wrt identifying characters. The first
>> is how you align sequences from different sources, because different
>> multiple sequence alignment procedures (whether carried out first, or as
>> part of direct optimisation) can give you different positional homologies.
>> [It was interesting to see in that staphylinid paper recommended by
>> Stephen Thorpe that the authors did separate analyses based on Clustal and
>> MAFFT alignments. AFAIK this kind of catholic approach to alignment is
>> rare. Most labs seem to pick their MSA method and stick with it.]
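One way to see how far two alignments disagree is to list the positional homology statements each makes ("residue i of one sequence sits in the same column as residue j of another") and measure the overlap. The sketch below uses a plain Jaccard overlap of those residue pairs; the function names and toy alignments are hypothetical, and the measure is only loosely similar to, not the same as, the pair-based scores used to benchmark aligners. The two toy alignments score identically under any match/gap scheme but place the G of x against different G's in y.

```python
from itertools import combinations


def homology_pairs(alignment):
    """Set of residue-pair homology statements implied by an alignment.

    alignment: dict mapping name -> aligned sequence, '-' marks a gap.
    Residues are numbered from 0 within each ungapped sequence.
    """
    names = sorted(alignment)
    lengths = {len(alignment[name]) for name in names}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must have equal length")
    (length,) = lengths
    next_residue = {name: 0 for name in names}
    pairs = set()
    for col in range(length):
        # (name, residue index) for every sequence with a residue in this column
        present = []
        for name in names:
            if alignment[name][col] != "-":
                present.append((name, next_residue[name]))
                next_residue[name] += 1
        pairs.update(combinations(present, 2))
    return pairs


def shared_fraction(aln1, aln2):
    """Jaccard overlap of the homology statements made by two alignments."""
    p1, p2 = homology_pairs(aln1), homology_pairs(aln2)
    union = p1 | p2
    return len(p1 & p2) / len(union) if union else 1.0


# Two hypothetical, equally scoring alignments of the same pair of sequences.
aln_a = {"x": "AC-GT", "y": "ACGGT"}
aln_b = {"x": "ACG-T", "y": "ACGGT"}
print(shared_fraction(aln_a, aln_b))  # 0.6
```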
>>
>> The second issue is how you treat gaps in your analysis after alignment.
>> You can ignore them entirely, and this amounts to character weighting
>> because an indel is an evolutionary novelty. Alternatively, you can treat
>> a gap as a fifth character state. Someone more familiar with the molecular
>> phylogeny literature than I am may be able to say how often analyses are
>> done both ways, and the results compared.
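The two gap treatments can give different tree lengths for the same column. A minimal sketch, assuming Fitch parsimony on a fixed four-taxon tree, with a gap coded either as missing data (any base allowed, one common way of "ignoring" it) or as a fifth state. The tree, the column and the function names are hypothetical. Note that coding position by position like this counts a multi-base indel once for every gapped position.

```python
BASES = frozenset("ACGT")


def fitch(tree, states):
    """Fitch parsimony on a rooted binary tree given as nested tuples.

    states: dict mapping leaf name -> frozenset of allowed states.
    Returns (state set at this node, number of changes below it).
    """
    if isinstance(tree, str):
        return states[tree], 0
    left, right = tree
    ls, lc = fitch(left, states)
    rs, rc = fitch(right, states)
    common = ls & rs
    if common:
        return common, lc + rc
    return ls | rs, lc + rc + 1  # empty intersection costs one change


def column_length(tree, column, gap_mode):
    """Parsimony length of one alignment column under a chosen gap coding."""
    if gap_mode not in ("missing", "fifth"):
        raise ValueError("gap_mode must be 'missing' or 'fifth'")
    states = {}
    for taxon, base in column.items():
        if base == "-":
            # Missing data: any base fits. Fifth state: '-' is its own state.
            states[taxon] = BASES if gap_mode == "missing" else frozenset("-")
        else:
            states[taxon] = frozenset(base)
    return fitch(tree, states)[1]


# Hypothetical tree and column: t3 and t4 share a gap.
tree = (("t1", "t2"), ("t3", "t4"))
column = {"t1": "A", "t2": "A", "t3": "-", "t4": "-"}
print(column_length(tree, column, "missing"))  # 0
print(column_length(tree, column, "fifth"))    # 1
```

Treated as missing, the shared gap contributes nothing; treated as a fifth state, it costs one step and can support grouping t3 with t4.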
>>
>> 'Ignore third codon' weighting for coding sequences can be avoided by
>> doing an analysis of the amino acid sequence in its entirety. I'm not sure
>> whether enough proteins are known yet to allow AA analyses to be useful at
>> all taxonomic levels. There are also wonderful surprises lurking in the
>> 'proteome'. I used to think (as a non-molecular taxonomist) that histone
>> H3 was a very highly conserved nuclear protein with wonderful base-level
>> variety. A few weeks ago I learned that H3 paralogy is ...um ... a
>> problem.
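Both routes for coding sequences can be sketched briefly: drop every third codon position, or translate and analyse the amino acids instead. This minimal sketch assumes the standard genetic code (mitochondrial genes, for example, use different code tables) and a sequence already in reading frame. The helper names are hypothetical, and unrecognised codons become "X".

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def drop_third_positions(seq):
    """Keep codon positions 1 and 2, discard position 3."""
    return "".join(base for i, base in enumerate(seq) if i % 3 != 2)


def translate(seq):
    """Translate an in-frame coding sequence; '*' marks stop codons."""
    if len(seq) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq), 3))


# Two hypothetical sequences differing only at third positions.
s1, s2 = "ATGCTTGGA", "ATGCTCGGG"
print(drop_third_positions(s1), drop_third_positions(s2))  # ATCTGG ATCTGG
print(translate(s1), translate(s2))                        # MLG MLG
```

Here both routes hide the same two third-position differences, because each is synonymous.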
>> --
>> Dr Robert Mesibov
>> Honorary Research Associate
>> Queen Victoria Museum and Art Gallery, and
>> School of Zoology, University of Tasmania
>> Home contact: PO Box 101, Penguin, Tasmania, Australia 7316
>> (03) 64371195; 61 3 64371195
>> Website: http://www.qvmag.tas.gov.au/mesibov.html
>>
>> _______________________________________________
>>
>> Taxacom Mailing List
>> Taxacom at mailman.nhm.ku.edu
>> http://mailman.nhm.ku.edu/mailman/listinfo/taxacom
>>
>> The Taxacom archive going back to 1992 may be searched with either of
>> these methods:
>>
>> (1) http://taxacom.markmail.org
>>
>> Or (2) a Google search specified as:
>> site:mailman.nhm.ku.edu/pipermail/taxacom  your search terms here
>

Dr.B.J.Tindall
DSMZ-Deutsche Sammlung von Mikro-
organismen und Zellkulturen GmbH
Inhoffenstraße 7B
38124 Braunschweig
Germany
Tel. ++49 531-2616-224
Fax  ++49 531-2616-418
http://www.dsmz.de
Director: Prof. Dr. Erko Stackebrandt
Local court: Braunschweig HRB 2570
Chairman of the management board: MR Dr. Axel Kollatschny

DSMZ - A member of the Leibniz Association (WGL)