GenBank & Taxonomical Nomenclature/identification

Peter Rauch anamaria at GRINNELL.BERKELEY.EDU
Fri Jul 28 11:32:31 CDT 2000


On Fri, 28 Jul 2000, Carol Hotton wrote:
> Obvious errors in species identification (e.g. bacterial or mouse
> sequence submitted as human) are generally caught upon submission.
> More subtle errors, as Detlef states, are difficult for us to catch.
> Hence the desirability of voucher specimens.

Yes. See my immediately preceding posting, replying to Scott
Federhen's comment about vouchers.

> If (for example) a species is misidentified in a sequence record after
> it is deposited in GenBank, the record has to be updated manually by
> the GenBank indexing staff.  We generally tell the indexer changing
> the record to insert a note in the sequence record: /note="submitted
> as <oldname>".  But unfortunately this has not been done consistently,
> and many sequence records, especially older submissions, often lack
> this information.
>
> Hope this clarifies things a bit...

Carol,

It does clarify things, but it only raises the very
disconcerting question of why this system is designed to be so
"light-weight" (to be kind) in handling this extremely
important concept. If data in GenBank are _used_, and used* with
the wrong identification (an identification that GenBank later
corrects), then the earlier data and all uses of them are not only
invalidated but potentially extremely costly --maybe even
dangerous.

* "used", with all that this implies, such as
making policy decisions, medical decisions, research programmatic
decisions, environmental protection decisions, etc., is what the
game is all about. If we are asked to use GenBank data, and are then
not able to go back and learn easily and reliably that we based
our decisions on faulty data, what is the point of GenBank's
service?

Carrying fully annotated histories of changes in a record's data
fields is very old technology. The failure to implement this
concept fully for identifications of the very data one
is using cannot be for lack of technological know-how. It must
have been for lack of understanding and appreciation of the
fundamental uses and consequences of using data!

Describing these determination changes (updates) as resulting
from "subtle" or "obvious" cases just does not begin to
characterize the consequences that those changes can have. Nor
does it suggest whether obvious or subtle cases are the more
common, or how common they are, or which of the two "kinds"
of data are more "used".

A well-designed database/information system will not record these
changes --if it records them at all!-- as "remarks", "comments"
or other quasi-anecdotal notes. It will design the history-of-change
subsystem as a fully functional system of record-keeping,
documented, searchable, signed and sealed by authority at the
level of detail required for users of the sequence data to
_always_ know where they (their analyses and uses) stand.
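
To make the idea concrete, here is a minimal sketch of such a
history-of-change record. Every name in it (SequenceRecord,
Determination, the fields, the accession and the organism names) is
hypothetical and chosen for illustration only; none of it reflects
how GenBank actually stores its records.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Determination:
    """One identification event for a sequence record (hypothetical schema)."""
    organism: str        # name applied to the sequence
    determined_by: str   # who stands behind this name
    determined_on: date  # when this determination took effect
    reason: str          # why the name was assigned or changed


@dataclass
class SequenceRecord:
    accession: str
    history: List[Determination] = field(default_factory=list)

    def redetermine(self, det: Determination) -> None:
        """Append a new determination; earlier ones are never overwritten."""
        if self.history and det.determined_on < self.history[-1].determined_on:
            raise ValueError("determinations must be added in date order")
        self.history.append(det)

    def current_name(self) -> Optional[str]:
        """Return the most recent name, or None if never determined."""
        return self.history[-1].organism if self.history else None

    def name_as_of(self, when: date) -> Optional[str]:
        """Return the name a user would have seen on a given date."""
        name = None
        for det in self.history:
            if det.determined_on <= when:
                name = det.organism
        return name

    def changed_since(self, when: date) -> bool:
        """True if the identification was revised after the given date."""
        return any(det.determined_on > when for det in self.history)


# Made-up accession and names, for illustration only.
rec = SequenceRecord("X00000")
rec.redetermine(Determination("Genus alpha", "submitter",
                              date(1995, 3, 1), "original submission"))
rec.redetermine(Determination("Genus beta", "indexing staff",
                              date(2000, 7, 1), "misidentification corrected"))

# Someone who used the record in 1998 can ask where they stand today.
print(rec.name_as_of(date(1998, 1, 1)))     # name they relied on
print(rec.changed_since(date(1998, 1, 1)))  # was it revised afterwards?
print(rec.current_name())                   # name in force now
```

The point of the sketch is only that each change is a structured,
attributed, dated entry that can be queried, rather than free text in
a note field.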

Am I missing something here?

To paraphrase Chevron ads, "Do people care? They'd better care!"

Peter



