[Taxacom] Dark taxa: GenBank in a post-taxonomic world

Wed Apr 13 08:33:41 CDT 2011

Ah, but are we sampling fragments of a book or fragments of an index to books? Are the sequences used for phylogenetic analysis more of a guide to the literature or of who borrowed the book than a sampling of the results of evolution? 

If we are sampling the results of evolution, then we must expect convergence from selective pressure for some sequences. The only way to distinguish between neutral expressed traits (for a range of habitats) and traits under selective pressure is to analyze the sequences with respect to actual selection. If so, then this is not randomly generated data that tracks gene history but a combination of biased and unbiased data. Any analysis of this data alone is tongue-in-cheek. 

There is a psychological mechanism in literature called "als ob", or suspension of disbelief, substituting something known to be contrary to logic or observation with something delightful or satisfying. 

* * * * * * * * * * * * 
Richard H. Zander 
Missouri Botanical Garden, PO Box 299, St. Louis, MO 63166-0299 USA 
Web sites: http://www.mobot.org/plantscience/resbot/ and http://www.mobot.org/plantscience/bfna/bfnamenu.htm
Modern Evolutionary Systematics Web site: http://www.mobot.org/plantscience/resbot/21EvSy.htm

-----Original Message-----
From: taxacom-bounces at mailman.nhm.ku.edu [mailto:taxacom-bounces at mailman.nhm.ku.edu] On Behalf Of Curtis Clark
Sent: Tuesday, April 12, 2011 8:16 PM
To: taxacom at mailman.nhm.ku.edu
Subject: Re: [Taxacom] Dark taxa: GenBank in a post-taxonomic world

On 2011-04-12 06:53, Roderic Page wrote:
> This post may be of interest to TAXACOM readers. "Dark taxa: GenBank in a post-taxonomic world"
>
> http://iphylo.blogspot.com/2011/04/dark-taxa-genbank-in-post-taxonomic.html
>

Apologies if a commenter mentioned this and I missed it: I think what you need is a null model.

Imagine a finite set of books, and a web site for cataloging them based on selected passages. Early in the history of the site, contributors will be working with intact books, that they can identify to title. If /Dracula/ has already been investigated and the sequence between the primers of "in example" and "spade of the sexton" has already been characterized, I'm unlikely to contribute the same sequence from a different physical book. But there are still lots of books, and I can sequence another.

After time, the number of easily available intact books that have not been sequenced starts to diminish. But there are book fragments, which can also be sequenced. Let's say I find a fragment that I can only characterize as "British English turn of the 20th C 67534567" and develop sequences. A search might suggest that some of them are very similar to the corresponding sequences in /Dracula/, but if I assume that my fragment is part of that book, either I fail to submit the sequence, and take the chance that a novel book (pun serendipitous) would go uncharacterized, or else submit a new sequence for an existing book, which might imply variation that doesn't exist.

Much better to submit it under its fragment identifier, and let others with greater knowledge in the future sort it out. Over time, more of the submitted sequences will be from book fragments that can't be easily identified.

It seems to me that any finite set with exemplars in various states of identifiability will produce the same sort of curves as the ones you've characterized.
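
A rough way to try the null model described above is a small simulation. The sketch below is only an illustration under stated assumptions, not anything taken from the blog post or from GenBank: the function name simulate, the pool sizes, the yearly submission count, and the "preference" weight (how much more likely a contributor is to pick an identifiable book than a fragment) are all hypothetical. It draws submissions from a finite pool and records, for each year, the fraction submitted under a fragment identifier only.

```python
import random


def simulate(n_intact=600, n_fragments=400, per_year=100,
             preference=5.0, seed=0):
    """Return the yearly fraction of submissions that are unidentified.

    All parameters are hypothetical. 'preference' is an assumed weight:
    each remaining intact (identifiable) book is that many times more
    likely to be picked than each remaining fragment.
    """
    rng = random.Random(seed)
    intact, fragments = n_intact, n_fragments
    dark_fraction = []
    while intact + fragments > 0:
        named = dark = 0
        # Submit up to per_year items, or whatever is left in the pool.
        for _ in range(min(per_year, intact + fragments)):
            weight_named = preference * intact
            p_named = weight_named / (weight_named + fragments)
            if rng.random() < p_named:
                intact -= 1      # submitted under a known title
                named += 1
            else:
                fragments -= 1   # submitted under a fragment identifier
                dark += 1
        dark_fraction.append(dark / (named + dark))
    return dark_fraction


if __name__ == "__main__":
    for year, frac in enumerate(simulate(), start=1):
        print(f"year {year:2d}: {frac:.2f} of submissions unidentified")
```

With these made-up settings the unidentified share starts low and climbs as the identifiable books run out, which is the qualitative pattern the argument predicts. Whether the shape resembles the published curves would depend on the assumed weights and pool sizes.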

--
Curtis Clark