data sharing

Peter Rauch anamaria at GRINNELL.BERKELEY.EDU
Thu Dec 3 16:44:37 CST 1998


On Thu, 3 Dec 1998, Hugh Wilson wrote:
> It's remarkable to me that this issue becomes critical only when the
> discussion turns to collections data in digital form.

Hugh, I don't think it is a matter of "only" becoming critical when data
become available in digital form. The issue is whether or not the data
lead to uses that are critical (useful) and whether or not those data
are reliable. That issue is independent of _how_ the data were obtained,
and should be important (and indeed, it _has_ been important) for both
"traditional access and use" and newer modes of access and uses.

What's different about the use of data in digital form, if a difference
must be identified, includes both the ease of access (much more data can
be accessed and "used" at a much greater rate) and the greater variety
of uses to which the data can be put (both because it's possible to get
at data in ways that were too expensive before, and because there are
new applications, i.e., new needs, which the data can serve, so
collections become more useful). Whether these differences make it more
or less important than before to assure that the data are as correct as
possible, and to carry an audit trail (something that is also easier and
more affordable in electronic format), is arguable. But I don't think
it is reasonable to argue that quality of data is unimportant to "public
users", which is what I understood your argument to be earlier.

>  Curators seem
> to have no problem with the condition of specimen-based data that is
> open for public inspection but tucked away in cases.

Sure they do. Good curators are very concerned about the quality and
integrity of the data they curate. Traditionally, as I suggested, they
perhaps did not have to be concerned about certain kinds of uses simply
because those uses were nonexistent when the data were difficult or
impossible to access.

> Things get
> dicey, however, when this moves from the cases into the 'commons'.

I'd say --when this moves to much more easily accessed data (as distinct
from "electronic" or "open" access). With greater ease of access, and
with a broader spectrum of uses possible, I think it is reasonable for
the curator to become concerned in greater degree with the quality of
the data she is offering to the public.

> The telephone company publishes a phone book every year and this
> carries errors and omissions at the time of printing which increase
> as the year progresses.

Collections data, and their uses, are not telephone books and their
uses. So, how telephone companies manage telephone books is probably not
an interesting comparison.

> ...  Annual revisions of this hardcopy [telephone book]
> document - with no audit trail - result in an improved and fully
> functional product.

If decisions about environmental management and environmental policy are
being formulated based on phone book data, then I'd be concerned. As I
suggested, collections data are put to uses that make it interesting, if
not essential, to know whether the data are reliable and _were_ correct.

> ... Most of the data associated with
> systematic collections is accurate and useful and my point relates to
> priorities.

Indeed. And I didn't say "Don't digitize." I said, "Don't reject audit
trails out of hand just because the data are being used by 'the
public'" (which seemed to be your proposition earlier).

>  What is the 1st order of business?  Is it best to get
> the information, warts and all, computerized and on-line as soon as
> possible?

Yes. And, concurrent with that, as corrections are made to the data,
record the transactions.
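
As a rough illustration of what "record the transactions" could look
like, here is a minimal Python sketch. It is only one possible shape
under stated assumptions: the record layout, the field names, the
catalog number, and the names SpecimenRecord, Correction, correct and
value_as_of are all hypothetical, not taken from any existing
collections system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Correction:
    """One audit-trail entry: what changed, who changed it, when, and why."""
    field_name: str
    old_value: str
    new_value: str
    changed_by: str
    reason: str
    changed_at: datetime


@dataclass
class SpecimenRecord:
    """Hypothetical specimen record; the field names are illustrative only."""
    catalog_number: str
    data: dict
    audit_trail: list = field(default_factory=list)

    def correct(self, field_name, new_value, changed_by, reason):
        """Apply a correction, logging the old value rather than silently overwriting it."""
        old_value = self.data.get(field_name, "")
        self.audit_trail.append(
            Correction(field_name, old_value, new_value, changed_by, reason,
                       datetime.now(timezone.utc))
        )
        self.data[field_name] = new_value

    def value_as_of(self, field_name, when):
        """Return the value the field held at time `when`."""
        value = self.data.get(field_name)
        # Walk the trail newest-first, undoing every correction made after `when`.
        for entry in reversed(self.audit_trail):
            if entry.field_name == field_name and entry.changed_at > when:
                value = entry.old_value
        return value


# Hypothetical example: a misspelled locality is corrected, not silently mutated.
record = SpecimenRecord("EX-0001", {"locality": "Berkley, CA"})
record.correct("locality", "Berkeley, CA", changed_by="curator",
               reason="spelling of locality")
```

With a trail like this, someone who used the data before the correction
can still find out what the record said at that time (via value_as_of),
which is the "_were_ correct" question raised above.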

>  Or, should we invest in endless 'workshops',  'symposia',
> and general discussion regarding the development of complex data
> management and expression  systems and 'standards' *before* the
> computerization effort begins.

That is not the alternative to your first one. The alternative is "Do we
get the data online, AND document their quality, or do we just put data
online and silently allow them to mutate as necessary?"

>  The consensus seems to be with the
> latter approach and, as a result, there is - at this point in time -
> not much (relative to the potential) data to share.

The reason more data are not available to share, from an entomologist's
perspective perhaps, is that there is a lot of retrospective data
capture to do, and it ain't cheap to do it. If we're going to look at
"potential", then we must look at the data in large collections as a
critical part of that potential --entomological collections qualify in
this regard. Managers of many other types of collections believe that
they do not have the funding to support intense ("do it all now")
digitization of their collections either, I presume (that's what I hear
them say anyway).

Regarding relative priorities, I think it is fair to ask the question
--how much digital data, what percentage of it, might be in error? The
answer is hopefully and probably a very small percentage. So, maybe it's
simply unimportant, you would probably argue, to invest in building
audit trails. I wouldn't make that argument, because we really don't
know what the percentage is, nor do we know how such errors --few or
many-- will affect the uses, esp. new uses.

I suppose I'd like to ask why you believe that creating audit trails
is a priori unnecessary in information systems to be accessed by "public
users" of (electronically accessible) collections data. I reject the
argument that phone companies don't do it (seems irrelevant), and the
argument that the traditional systematic uses didn't need audit trails
(traditional taxonomists were and probably still are almost anal about
their data handling, and they were generally accessing the primary, not
just secondary, data). Besides, it's the new uses about which we might
want to understand the impacts of data errors before rejecting the
utility of audit trails.
  Peter



