[Taxacom] Dark taxa: GenBank in a post-taxonomic world
Karl Magnacca
kmagnacca at wesleyan.edu
Thu Apr 14 14:27:27 CDT 2011
> So I guess the issue is what are the processes involved. If we
> assume:
>
> 1. a large set of taxa
> 2. a finite set of taxonomists who have identified a subset of those
> taxa
> 3. the rate of species description is lower than the rate at which
> "species" are added to GenBank
> 4. the order in which taxa are sequenced correlates with how well
> known they are (or how easy they are to identify)
>
> then it seems reasonable to expect curves like the ones in the blog
> post. I take it your argument is we need not invoke a decline in the
> output of taxonomy, nor a cavalier attitude towards taxonomy by
> sequencing jockeys, it's just a consequence of these assumptions.
The thing that seems to have been missed in this entire discussion
is a point you made in the original blog post: that the *vast*
majority of these dark taxa are COI sequences deposited from BOLD.
Moreover, I would be willing to bet that the largest share, at
least for the invertebrates, comes from one particular place that
has an especially cavalier attitude towards taxonomy and a very
high output. So I don't think it's so much an indicator of a
decline of taxonomy as a demonstration of the impact that
relatively few people with high productivity can have.
Also, it's worth pointing out that if you take any group of
broadly-sampled invertebrate GenBank sequences for a particular
gene, you will get misidentifications. This, IMO, is a much worse
problem even if it's smaller in volume, because it's actively
misleading; putting in things without IDs is just noise that fills
your search results.
Karl
=====================
Karl Magnacca
Postdoctoral Researcher
University of Hawaii-Hilo