Maintaining databases

Tue Nov 28 12:32:33 CST 1995

Gary Rosenberg <ROSENBERG at SAY.ACNATSCI.ORG> wrote:
>
>Maintaining and updating a well-designed database should not be
>particularly labor-intensive.  The complaints about this problem
>indicate that either some collections databases are not well
>designed, or that people do not know how to use them to full
>advantage.  The case under discussion, updating thousands of
>identifications to the subfamily or genus level, should be easy
>in a well designed system.

I would agree...it *should*. Standard practice, though, is that *every*
specimen gets a unique identifier, which poses a problem when the material
has only been sorted to some level above species. Yes, if it were a matter
of species X being moved into a different genus, then one simple command
should be all that is needed. But when you start out with 10,000
inventoried but unsorted specimens of tribe Z which are later IDed as
genera A, B, and C, there is no one global command that can be issued to
make the appropriate changes:

>If 1000 specimens are identified as genus A,
>that means 1000 records need an identical change.  Given a list
>of the catalogue numbers (or other unique ID numbers), it is
>possible in most database programs to make the needed changes to
>all affected records with a single set of commands.  If the
>system at your institution requires that such changes be made one
>by one, you need a different system.

This is only simple and easy if those 1000 specimens happen, by some
miracle, to have their numbers *in sequence*. Otherwise, someone will have
to sit down and *make* that list of 1000 numbers, then type the commands.
(Doing this by hand can, incidentally, be just as fast as trying to scan
barcodes, so there isn't necessarily much time-saving in having barcodes,
either.) It's no longer a trivial enterprise at that point. I can't think
of any system that can avoid this problem: if there is no shared property
of those 1000 specimens *already* in the database which distinguishes them
from all other specimens, then there is no simple command which can make a
global change on those records by themselves.
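
To make the distinction concrete, here is a minimal sketch using Python's
built-in sqlite3 module. Everything in it is an assumption for
illustration only: the table name "specimens", its columns (catalog_no,
tribe, genus, species), the sample rows, and the helper names are all
hypothetical and do not describe any particular collections system. The
first helper covers the easy case, where a value already stored in the
database (the old genus and species) picks out the records. The second
covers the harder case, where the only thing the records share is
membership in a list of catalogue numbers that someone had to compile.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory table of hypothetical specimen records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE specimens ("
        "catalog_no INTEGER PRIMARY KEY, tribe TEXT, genus TEXT, species TEXT)"
    )
    rows = [
        (1, "Z", "Oldgenus", "x"),  # already identified to species
        (2, "Z", None, None),       # sorted to tribe only
        (3, "Z", "Oldgenus", "x"),
        (4, "Z", None, None),
        (5, "Z", None, None),
        (6, "Z", None, None),
    ]
    conn.executemany("INSERT INTO specimens VALUES (?, ?, ?, ?)", rows)
    return conn


def rename_genus(conn, old_genus, species, new_genus):
    """Easy case: a stored property (genus + species) selects the records."""
    cur = conn.execute(
        "UPDATE specimens SET genus = ? WHERE genus = ? AND species = ?",
        (new_genus, old_genus, species),
    )
    return cur.rowcount  # number of records changed by the one command


def assign_genus(conn, catalog_numbers, genus):
    """Hard case: nothing stored distinguishes the records, so the caller
    must supply the catalogue numbers, which need not be in sequence."""
    conn.executemany(
        "UPDATE specimens SET genus = ? WHERE catalog_no = ?",
        [(genus, number) for number in catalog_numbers],
    )


conn = build_demo_db()
rename_genus(conn, "Oldgenus", "x", "Newgenus")  # one command, no list needed
assign_genus(conn, [2, 5], "A")                  # list compiled by hand
assign_genus(conn, [4, 6], "B")
```

In this sketch the update itself is short in both cases. What differs is
where the selection comes from: in the first case the database already
holds it, and in the second a person has to produce the list before any
command can be run.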

>I think NSF should support four areas:
>1) getting collections started with good database systems,
>2) upgrading inefficient database systems,
>3) data entry for collections with well designed databases,
>4) travel of systematists to other institutions to help with
>   identifications.
>
>I do not think NSF should support maintenance of existing
>systems, which is an institutional responsibility.

In a way, though, #2 on your list would accomplish the same thing, because
it is almost a given that with technological turnover as rapid as it is,
MOST databases will be relatively "inefficient" by the time they're three
or four years old (would you care to wager that three years from now there
will not be software which is superior to anything presently available??).
If upgrades are that frequent, then there's hardly a practical difference
between "upgrading" and "maintenance". #3 also seems to be one of the
things most folks would consider part of maintenance, and #1 is tricky,
since there is no formal standard in place, so "good" is purely subjective.

Sincerely,

Doug Yanega       Illinois Natural History Survey, 607 E. Peabody Dr.
Champaign, IL 61820 USA      phone (217) 244-6817, fax (217) 333-4949
 affiliate, Univ. of Illinois Urbana-Champaign, Dept. of Entomology
  "There are some enterprises in which a careful disorderliness
        is the true method" - Herman Melville, Moby Dick, Chap. 82