[Taxacom] Sorry, but you are out-of-line
Doug Yanega
dyanega at ucr.edu
Mon Nov 15 12:30:08 CST 2010
Steve Gaimari wrote:
>You continually bring up GenBank as the model. There are differences,
>not the least of which is the relatively simple data structure. Also, I
>don't believe that GenBank will continue in its current conformation
>into perpetuity. They will upgrade their systems and migrate data and
>continue for as long as molecular biology is a critical field of study -
>I would say it will last a very very long time. But it will not be in a
>stagnant, *original* format into perpetuity. That is not something
>critical to molecular biology - access to the simple data is what is
>critical. However, it IS critical to the nomenclatural aspects of
>taxonomy - not just the simple data. Yes, there may be considerable time
>when a purely digital archive for taxonomy exists, and there will be
>continual upgrades to new technology for a while - maybe. But will
>taxonomy have the money and resources that the field of molecular
>biology has? Taxonomy sure hasn't demonstrated THAT, even with the
>world-recognized crisis in biodiversity. So I don't think setting up a
>system that will RELY on these resources into perpetuity is particularly
>forward-thinking. There is where the GenBank analogy falls apart, in my
>opinion.
You're raising two points here, and they're not
really linked. Your first point, if I read it
right, is that taxonomy *needs* a stagnant
original form on file somewhere, but it's not
clear what form you are referring to: is it the
*paper* form, or a digital representation OF that
paper form? If the former, I don't think it's
fair to say we NEED the paper hard copy once we
have created a securely-archived digital version.
It's *better* to have a hard copy, it's
*desirable* to have a hard copy, but I wouldn't
use the word "need" once there is a secure
digital version; at that point, the hard copy is
effectively superfluous, in the same way that
there is no longer a NEED for the metal meter
stick that was THE standard of reference for a
meter (at first there was a metal bar - the "hard
copy" - then in 1960 it became "1,650,763.73
wavelengths of the orange-red emission line in
the electromagnetic spectrum of the krypton-86
atom in a vacuum", and then in 1983 it became
"the length of the path travelled by light in
vacuum during a time interval of 1/299,792,458 of a
second" - and I haven't heard of any physicists
objecting on the grounds that we may someday lose
the technology that allows us to measure
wavelengths or laser beams).
If you're referring to the digital version of the
hard copy not being maintained in perpetuity,
that's a trivial problem to solve; if GenBank asked
authors for PDFs of the papers in which their
sequences were cited, then do you honestly
believe that GenBank's archives would somehow be
inadequate to the task? Digital is digital as far
as storage, and as far as format, PDF is NOT
proprietary, and if the technology ever
"migrates", then the migration can be fully
automated. Remember, a centralized archive is NOT
stagnant; it isn't "storage" in the conventional
sense of something being set aside and left
untouched and then retrieved at some later,
indeterminate point, which is how *private*
archives work (and why private archives decay or
become obsolete - and why I've been harping on
private archives as irrelevant to the
discussion); that sort of "storage" would only
really apply to backups and mirrors - the main
archive, however, is *dynamic*, with all of its
elements up and running perpetually, constantly
updating, error-checking, and so forth - there
are no hiding places where some bit of data
(e.g., a PDF) can slip through the cracks and NOT
be converted to a new format when a format
upgrade is initiated. Again, you
can't think of a centralized archive as "storage"
- the entire archive changes every second of
every day, and calling that "storage of data" is
like saying that a guy juggling three balls is
"storing" them. The bottom line is that any PDF
of a paper is just as secure, permanent, and easy
to archive and migrate as "simple data".
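To make the "constantly error-checking" idea concrete,
here is a minimal sketch of a checksum audit of the kind
a live archive could run over its holdings. It is an
illustration only, not a description of how GenBank or
any real archive works; the function names (sha256_of,
build_manifest, audit) and the directory layout are
hypothetical, and it uses nothing beyond the Python
standard library.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(archive_dir: Path) -> dict[str, str]:
    """Record a checksum for every file currently in the archive."""
    return {
        str(p.relative_to(archive_dir)): sha256_of(p)
        for p in sorted(archive_dir.rglob("*"))
        if p.is_file()
    }


def audit(archive_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return the names of files that are missing or no longer match."""
    problems = []
    for name, expected in manifest.items():
        path = archive_dir / name
        if not path.is_file() or sha256_of(path) != expected:
            problems.append(name)
    return problems
```

Run on a schedule, an audit like this flags any PDF that
has been altered or lost, so it can be restored from a
mirror; the same loop over every file is also the natural
place to hook in a format conversion.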
Your second point is the one I *don't* have a
simple answer for, and as such, is of greater
general concern; "will taxonomy have the money
and resources that the field of molecular biology
has?"
To some extent, I think we may be selling our
commodity short a bit there; when you put ALL of
taxonomy together, and consider how essential
taxonomy is to the rest of the scientific
community, it's not of trivial importance. A
single observation suffices to make the
general point: none of the data in GenBank are
legitimately valuable to anyone if they are not
linked to an organism, and that link is taxonomy.
True, the bulk of GenBank is of common organisms
whose taxonomy is absolutely stable (like "Homo
sapiens"), but there's a lot of stuff in there
for which taxonomy is crucial. Another part of
this is that we have never gotten together AS a
community and said "We are unanimous in our
desire to have a permanent centralized archive -
will you fund it?" - how can we expect or imagine
being given money if we haven't shown we can work
together or agree on anything? Consider that
there *has* been money and resources given to
taxonomy (in the broad sense) - repeatedly - to a
number of different initiatives, each with slightly
different goals and approaches. Is it possible
that overlap and/or competition between these
initiatives has created an environment such that
nothing that even *smells* like a cataloguing
effort will attract new funding (because "so-n-so
is already doing that")? The bottom line here is
that your second point deals far, far more with
politics than anything else, and - as such - is
less about logic or practicality, and accordingly
almost completely unpredictable. There are no
easy, obvious answers, aside from this one: we
won't have a centralized archive if we abandon
the idea without even trying. THIS is the topic
we most badly need to be discussing, instead of
the technical stuff. I agree that the analogy to
GenBank fails in *this* matter - the *politics*
behind it - but previous iterations of the
discussion, including my original statement of
the analogy, were in reference to the
*technical* side, which is what people were
worrying about, and the analogy still holds there.
As for a solution to the political dilemma, one
idea I have raised before, to limited and at best
half-hearted response, is that - if creating our
own GenBank-like archive seems genuinely beyond
our means (either practically or politically) -
we might consider riding on GenBank's coattails;
approach them and see if they would be willing to
incorporate taxonomic data in their archives.
Then, in the best-case scenario, the only funding
*we* would need is for the process of getting the
data uploaded, and perhaps designing a
taxonomist-friendly interface; the actual
infrastructure (otherwise a significant expense)
would be GenBank's, and already in place. As
Donat has already demonstrated, our data are NOT
very different from their data, as seen through
the proverbial eyes of a computer. They already
have several orders of magnitude more sequences
archived than there are nomenclatural acts in all
of recorded history; we would not make much of a
dent in their dataspace.
Sincerely,
--
Doug Yanega Dept. of Entomology Entomology Research Museum
Univ. of California, Riverside, CA 92521-0314 skype: dyanega
phone: (951) 827-4315 (standard disclaimer: opinions are mine, not UCR's)
http://cache.ucr.edu/~heraty/yanega.html
"There are some enterprises in which a careful disorderliness
is the true method" - Herman Melville, Moby Dick, Chap. 82