data sharing
Julian Humphries
jmhbs at UNO.EDU
Fri Dec 4 13:55:41 CST 1998
At 11:07 AM 12/4/98 -0800, Peter Rauch wrote:
>If used data are to be well-understood (by the user), then I'd argue
>that the audit trail for those data should be made available to the user
>of the data.
I think Peter and others arguing for a strong audit trail and public
access to such a feature may not fully appreciate the complexities
involved at both the data management and public access points in the
system.
Hugh Wilson advocates getting data into online systems in as
straightforward and rapid a manner as possible. Taken to an extreme,
this could certainly result in some pretty poor correlation between
the original catalog data and its current form in an online database.
But to implement an audit trail as defined by U Smith
"audit trail" is "record history"; an audit trail allows future
curators to painlessly, effortlessly reconstruct any and all changes
made to a record, if necessary, and recover any prior instance of the
record"
is a monumental programming effort. It is unclear how Peter would
handle the delay in the acquisition and display of data that would result
if we waited for a system that behaves as above. I am pretty sure no
such system is available today for natural history museums. What Hugh
seems to be saying is that if we wait for these features before we start
the process of computerization and online access, we will have
missed the boat. So while it might be a great goal, we need to
get going and do something now, before collections are relegated
to an even lower status than they have today.
To illustrate the difficulty of audit trail design and management,
let me give a few examples of problems we faced in attempting to
design a data management system with full audit trails.
You have to decide what counts as a "change" and where to log it.
For example, if a data entry person mistypes a record and goes
back to correct it right away, is that a change? If not, how and where
do you draw the line? (I can tell you that focus groups responded that
they wanted control over what was audited but didn't want it to take any
extra effort; how's that for unrealistic?)
Are changes recorded at the record level or the field level?
Do you distinguish mass updates
(e.g., changing Brasil to Brazil) from individual record changes?
Rollback systems in commercial databases produce
huge (really huge) log files if you don't commit the changes, so
the goal of "easy" undos for the life of a database is not
realistic.
Do searches look at the audit trail records?
How do you distinguish specimens that have literally been reidentified
from simple nomenclatural updates (a very important point for the
"quality of data" question)?
How do you display the record history on a view joined across ten
tables? Such a view might be a typical edit form for a catalog entry,
and it is already overloaded with information. Where do you
add the previous versions of dozens of fields?
This is just a sampling of the issues involved, and I dare say
that few, if any, have been solved in any system in use today,
and none in online access systems (interface design issues
get really tricky in a web interface).
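To make the field-level logging question concrete, here is a minimal
sketch in Python with SQLite of one way a per-field audit log might
record changes, tag mass updates and nomenclatural changes separately
from individual reidentifications, and reconstruct a prior instance of
a record. The table layout, function names, and example data are all
hypothetical illustrations, not drawn from any existing collections
system, and the sketch ignores the hard parts discussed above (what
counts as a change, multi-table views, web display).

# Minimal sketch of a field-level audit trail (hypothetical schema,
# not any real collections-management system).  Each edit to a catalog
# record is logged as one row per changed field, tagged with a change
# kind so that mass updates (e.g. Brasil -> Brazil) and nomenclatural
# updates can later be told apart from individual reidentifications.
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE catalog (
    catalog_no TEXT PRIMARY KEY,
    taxon      TEXT,
    country    TEXT
);
CREATE TABLE audit_log (
    log_id      INTEGER PRIMARY KEY AUTOINCREMENT,
    catalog_no  TEXT NOT NULL,
    field       TEXT NOT NULL,
    old_value   TEXT,
    new_value   TEXT,
    change_kind TEXT NOT NULL,  -- 'edit', 'mass_update', 'nomenclature', 'reidentification'
    changed_by  TEXT NOT NULL,
    changed_at  TEXT NOT NULL
);
""")

def update_field(catalog_no, field, new_value, change_kind, user):
    """Apply one field change and log the old value alongside the new one.
    (Sketch only: trusts the field name, so never pass user input here.)"""
    (old_value,) = conn.execute(
        f"SELECT {field} FROM catalog WHERE catalog_no = ?", (catalog_no,)
    ).fetchone()
    conn.execute(
        f"UPDATE catalog SET {field} = ? WHERE catalog_no = ?",
        (new_value, catalog_no),
    )
    conn.execute(
        "INSERT INTO audit_log (catalog_no, field, old_value, new_value,"
        " change_kind, changed_by, changed_at) VALUES (?, ?, ?, ?, ?, ?, ?)",
        (catalog_no, field, old_value, new_value, change_kind, user,
         datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()

def record_as_of(catalog_no, cutoff_iso):
    """Reconstruct a record as it stood at a given time by undoing later changes."""
    row = conn.execute(
        "SELECT taxon, country FROM catalog WHERE catalog_no = ?", (catalog_no,)
    ).fetchone()
    state = {"taxon": row[0], "country": row[1]}
    later = conn.execute(
        "SELECT field, old_value FROM audit_log"
        " WHERE catalog_no = ? AND changed_at > ? ORDER BY log_id DESC",
        (catalog_no, cutoff_iso),
    )
    for field, old_value in later:
        state[field] = old_value  # walk backwards to the prior instance
    return state

# Usage: one individual reidentification and one mass country-name cleanup.
conn.execute("INSERT INTO catalog VALUES ('UNO-1234', 'Notropis sp.', 'Brasil')")
update_field("UNO-1234", "taxon", "Notropis atherinoides", "reidentification", "jmh")
update_field("UNO-1234", "country", "Brazil", "mass_update", "batch-script")
print(record_as_of("UNO-1234", "1900-01-01T00:00:00+00:00"))
# -> {'taxon': 'Notropis sp.', 'country': 'Brasil'}

Even this toy version doubles the number of writes per edit and says
nothing about how to display the history, which is exactly where the
cost question below comes in.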
The critical issue seems to be a disagreement over the costs and benefits
of more complex, more featured data acquisition and management
software. It is very easy (we might all even agree) to say that we want
these 100 features in our curatorial system. But if we put a time
and dollar cost on each (both development and staff time) and then
assigned a data quality/data retrieval/data value benefit to each feature,
we would only come to a single solution if we all agreed on the numbers.
Does the audit trail feature increase development and curatorial
staff costs by 5%, 15%, or 25%? I am not sure, but the answer to that
would certainly help focus the debate.
Julian Humphries