[Taxacom] cladistics

John Grehan jgrehan at sciencebuff.org
Tue Aug 23 07:11:14 CDT 2011


> Maximum Likelihood (ML) phylogenetic reconstructions
>  are not clustering algorithms in this sense.

I note the qualification "in this sense".

I found it interesting that, having been castigated over my supposed ignorance of the term 'clustering', I then saw disagreement over the term among those who are experts on phylogenetic algorithms.

John Grehan

-----Original Message-----
From: taxacom-bounces at mailman.nhm.ku.edu [mailto:taxacom-bounces at mailman.nhm.ku.edu] On Behalf Of Sergio Vargas
Sent: Tuesday, August 23, 2011 5:22 AM
To: Herbert Jacobson
Cc: taxacom at mailman.nhm.ku.edu
Subject: Re: [Taxacom] cladistics

pffff,

I guess this will be another never-ending debate ;-)

From the email thread on parsimony, orangs et al., I think we were
using clustering in the sense of "hierarchical clustering", as when one
calculates a pairwise distance matrix and uses this matrix to draw a
tree with an agglomerative or divisive algorithm. I think Maximum
parsimony (MP) and Maximum Likelihood (ML) phylogenetic reconstructions
are not clustering algorithms in this sense.
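A minimal sketch of "hierarchical clustering" in the sense described above, using UPGMA (average linkage) as one example of an agglomerative algorithm. The taxon names and distances are hypothetical, and branch lengths are omitted for brevity. The point it illustrates is that the loop greedily merges the closest pair of clusters and never revisits a merge, so it finishes in polynomial time without ever comparing whole trees.

```python
from itertools import combinations


def upgma(taxa, dist):
    """Agglomerative average-linkage (UPGMA) clustering.

    taxa: list of taxon names; dist(x, y): distance between two taxa.
    Returns the topology as nested tuples (branch lengths omitted).
    """
    # Each cluster is (subtree, list of member taxa).
    clusters = [(t, [t]) for t in taxa]

    def average_distance(c1, c2):
        # UPGMA distance = mean of all pairwise distances between members.
        return sum(dist(x, y) for x in c1[1] for y in c2[1]) / (len(c1[1]) * len(c2[1]))

    while len(clusters) > 1:
        # Greedy step: pick the closest pair of current clusters.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: average_distance(clusters[ij[0]], clusters[ij[1]]))
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        # combinations yields i < j, so deleting j leaves index i valid.
        del clusters[j]
        clusters[i] = merged  # this merge is never undone
    return clusters[0][0]


# Hypothetical pairwise distance matrix (upper triangle only).
D = {("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
     ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10}


def dist(x, y):
    return D[(x, y)] if (x, y) in D else D[(y, x)]


print(upgma(["A", "B", "C", "D"], dist))  # ((('A', 'B'), 'C'), 'D')
```

A divisive method would work top-down instead, but it shares the same one-pass, merge-or-split-and-move-on character.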

Using a broader definition from Wikipedia: clustering = "the assignment
of a set of observations into subsets (clusters)". I'm still not sure one
could say MP or ML are actually doing this. One could argue, perhaps, that
you are assigning taxa to clades and that MP and ML are some form of
unsupervised learning... I'm not sure about that either.
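For contrast, a minimal sketch of how a parsimony analysis treats whole topologies, assuming unordered characters and Fitch's small-parsimony algorithm on rooted binary trees written as nested tuples (the four taxa and three characters are hypothetical). Every complete candidate tree receives a score and the trees are then compared; nothing is merged pairwise. Exhaustive enumeration as shown only works for a handful of taxa, since the number of rooted topologies, (2n-3)!!, grows faster than exponentially, which is why real searches are heuristic.

```python
def add_leaf(tree, leaf):
    """Yield every rooted binary tree made by attaching `leaf` to one edge of
    `tree` (including a new edge above the current root)."""
    yield (tree, leaf)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in add_leaf(left, leaf):
            yield (new_left, right)
        for new_right in add_leaf(right, leaf):
            yield (left, new_right)


def all_rooted_trees(taxa):
    """Enumerate all (2n-3)!! rooted binary topologies by stepwise leaf addition."""
    trees = [taxa[0]]
    for leaf in taxa[1:]:
        trees = [t for tree in trees for t in add_leaf(tree, leaf)]
    return trees


def fitch(tree, states):
    """Fitch small parsimony for one character: returns (state set, number of changes)."""
    if not isinstance(tree, tuple):
        return {states[tree]}, 0
    (s1, c1), (s2, c2) = fitch(tree[0], states), fitch(tree[1], states)
    common = s1 & s2
    if common:
        return common, c1 + c2
    return s1 | s2, c1 + c2 + 1  # disjoint sets cost one change


def tree_length(tree, matrix):
    """Parsimony score of a whole topology: Fitch changes summed over characters."""
    n_chars = len(next(iter(matrix.values())))
    return sum(fitch(tree, {t: row[k] for t, row in matrix.items()})[1]
               for k in range(n_chars))


# Hypothetical data: three binary characters for four taxa.
matrix = {"A": "001", "B": "000", "C": "110", "D": "110"}
scored = [(tree_length(t, matrix), t) for t in all_rooted_trees(list(matrix))]
best = min(score for score, _ in scored)
optimal = [t for score, t in scored if score == best]
print(len(scored), best, len(optimal))  # 15 3 5
```

With these hypothetical data, the five rooted versions of the unrooted tree AB|CD tie at a length of 3, and the other ten topologies have length 5.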

Enlightenment from other taxacomers, pointers to relevant literature, 
etc. much appreciated.

sergio

On 8/23/11 4:24 AM, Herbert Jacobson wrote:
> I don't think clustering is "...grouping by a data matrix." Quite the
> opposite, it is grouping by the "coefficient matrix", which is the result
> of some sort of data matrix manipulation.
>
> Herb
>
> > Date: Sat, 20 Aug 2011 12:36:30 -0500
> > From: Richard.Zander at mobot.org
> > To: morris.bob at gmail.com; sevragorgia at gmail.com
> > CC: taxacom at mailman.nhm.ku.edu
> > Subject: Re: [Taxacom] cladistics (was: clique analysis in textbooks)
> >
> > I think taxacomers who lack decisive training in phenetic analysis,
> > which is most of us, figure clustering is grouping by a data matrix
> > that compares one taxon and one variable and then some similarity
> > algorithm. Thus, Sergio is correct that an instant similarity or
> > distance tree is different from a parsimony tree, in terms of what we
> > have been told: i.e. that phenetics and parsimony are different.
> >
> > On the other hand, I took a tutorial course (3 days) in clustering
> > techniques (didn't learn much, of course) at a meeting of the
> > Classification Society from the then president of the Society and
> > Pierre Legendre. I asked, ahem, if parsimony was a clustering
> > technique. The two glanced at each other furtively, then opined that
> > indeed parsimony is a clustering technique. Thus, authority says it is.
> >
> > Yes, parsimony does calculate a bunch of distance trees and selects
> > recursively (I think) the shortest tree because it is NP-complete
> > (NP-hard), i.e., an exact solution can't be computed in polynomial time.
> > So... does the fact that we have to do heuristic sampling to get any
> > sort of tree make parsimony not clustering? I think this is what this
> > thread is about.
> >
> > Surely the product is a distance tree based on shortest
> > transformation set?
> >
> >
> >
> > * * * * * * * * * * * *
> > Richard H. Zander
> > Missouri Botanical Garden, PO Box 299, St. Louis, MO 63166-0299 USA
> > Web sites: http://www.mobot.org/plantscience/resbot/ and http://www.mobot.org/plantscience/bfna/bfnamenu.htm
> > Modern Evolutionary Systematics Web site: http://www.mobot.org/plantscience/resbot/21EvSy.htm
> >
> > -----Original Message-----
> > From: taxacom-bounces at mailman.nhm.ku.edu [mailto:taxacom-bounces at mailman.nhm.ku.edu] On Behalf Of Bob Morris
> > Sent: Friday, August 19, 2011 10:53 PM
> > To: Sergio Vargas
> > Cc: taxacom at mailman.nhm.ku.edu
> > Subject: Re: [Taxacom] cladistics (was: clique analysis in textbooks)
> >
> > On Fri, Aug 19, 2011 at 2:32 PM, Sergio Vargas <sevragorgia at gmail.com> wrote:
> > "...because clustering can be done (computationally) efficiently
> > whereas searching for an optimal tree using phylogenetic methods
> > cannot."
> >
> > It's fair enough that some or even all biologists might have a usage
> > of "clustering" that meets all of your explanation, and perhaps even
> > that this should be agreed to by all of the readership of taxacom. I
> > wouldn't know. But in statistical pattern recognition and datamining,
> > not everything called clustering can be done computationally
> > efficiently. Many techniques those disciplines call clustering are
> > intractable in the sense that they are NP-hard. Informally, this means
> > that (with presently understood computational complexity theory),
> > they fundamentally scale at least exponentially with size of the data
> > and no algorithm can circumvent that, just as for optimal tree
> > induction problems. So I can only understand your text as meaning
> > "...because clustering as meant by all practicing phylogeneticists can
> > be done (computationally) efficiently...", and that is why you are
> > prepared to subsequently say that the rest of your explanation "[...]
> > is so basic I cannot believe I am explaining it".
> >
> > I do wonder a little whether in fact all practicing phylogeneticist
> > readers of taxacom understand by "clustering" only tractable
> > algorithms.
> >
> > Bob Morris
> >
> > Robert A. Morris
> > Emeritus Professor  of Computer Science
> > UMASS-Boston
> > 100 Morrissey Blvd
> > Boston, MA 02125-3390
> > IT Staff
> > Filtered Push Project
> > Harvard University Herbaria
> >
> >
> >
> > email: morris.bob at gmail.com
> > web: http://efg.cs.umb.edu/
> > web: http://etaxonomy.org/mw/FilteredPush
> > http://www.cs.umb.edu/~ram
> >
> >
> >
> > On Fri, Aug 19, 2011 at 2:32 PM, Sergio Vargas <sevragorgia at gmail.com> wrote:
> > > Hi,
> > >
> > > > Clustering is clustering is clustering. Group some things together and
> > > > you are clustering - however it is done.
> > >
> > > No, you are not. Grouping is not clustering; there are many ways to group
> > > things together that do not involve clustering. Maximum parsimony, maximum
> > > likelihood and Bayesian analysis are not clustering. It is simply
> > > incorrect to call these methods clustering. When you run any of
> > > the above analyses you are not clustering, despite the result being
> > > something similar to a cluster. If you could reduce phylogenetic
> > > inference to clustering, everything would be so easy (computationally
> > > speaking), because clustering can be done (computationally) efficiently
> > > whereas searching for an optimal tree using phylogenetic methods cannot.
> > > Taxa are only "clustered" (randomly or sequentially) together to build
> > > the first tree; afterwards entire topologies are evaluated, and taxa are
> > > not clustered. This is so basic I cannot believe I am explaining it.
> > >
> > > sergio
> > >
> > > --
> > > Sergio Vargas R., M.Sc.
> > > Dept. of Earth & Environmental Sciences
> > > Palaeontology & Geobiology
> > > Ludwig-Maximilians-Universität München
> > > Richard-Wagner-Str. 10
> > > 80333 München
> > > Germany
> > > tel. +49 89 2180 17929
> > > s.vargas at lrz.uni-muenchen.de
> > > sevra at marinemolecularevolution.org
> > >
> > > check my webpage:
> > > http://www.marinemolecularevolution.org
> > >
> > > check my research ID:
> > > http://www.researcherid.com/rid/A-5678-2011
> > >
> > >
> > > _______________________________________________
> > >
> > > Taxacom Mailing List
> > > Taxacom at mailman.nhm.ku.edu
> > > http://mailman.nhm.ku.edu/mailman/listinfo/taxacom
> > >
> > > The Taxacom archive going back to 1992 may be searched with either 
> of these methods:
> > >
> > > (1) by visiting http://taxacom.markmail.org
> > >
> > > (2) a Google search specified as: 
>  site:mailman.nhm.ku.edu/pipermail/taxacom  your search terms here
> > >
> >
> >
> >
> > --
> > Robert A. Morris
> >
> > Emeritus Professor  of Computer Science
> > UMASS-Boston
> > 100 Morrissey Blvd
> > Boston, MA 02125-3390
> > IT Staff
> > Filtered Push Project
> > Department of Organismal and Evolutionary Biology
> > Harvard University
> >
> >
> > email: morris.bob at gmail.com
> > web: http://efg.cs.umb.edu/
> > web: http://etaxonomy.org/mw/FilteredPush
> > http://www.cs.umb.edu/~ram
> > phone (+1) 857 222 7992 (mobile)
> >

