FW: barcodes

Sat Jul 27 09:25:18 CDT 2002

>we should be trying to
>minimize duplicated entry of data by providing consistent, distributable
>electronic data for newly collected specimens and automating data
>acquisition for legacy specimens in general.

amen, brother...

>I have reviewed a number of proposals lately that have proposed to develop
>databases for museum specimen management without appropriate controls on
>where the data came from, or without an inherent understanding of what the
>data mean.

Yes, understanding of the data and the applications of any standards used
in its representation is incredibly important, especially when it comes to
data exchange and distributed access.

I am not convinced that there is any great merit in tracking the lineage of
duplicate specimen data independently of the duplicate specimen to which
data must be attached.  Does it really matter where the data elements came
from as long as they are correct and verifiable in relation to that specimen?

>As a purely
>biological data kind of project, it doesn't make sense to copy the data
>either. You should just have an accession number or other key and have a
>pointer in your database to the other specimen in another database. Then you
>aren't using your resources to support data that another place has already
>supported.

In theory yes, but in practice each herbarium is unique and although they
may have a genetically identical duplicate specimen with exactly the same
components and identical specimen labels, they do different things with it,
have different needs of it, and document it in different ways.  (We can
debate whether or not they should behave in this manner, but the fact
remains that they do.)

Also, each needs instant access to the data and the ability to edit it
according to their local needs.  They are each custodians of different
duplicate specimens and as such have an obligation to maintain the metadata
associated with these specimens.

>If herbaria are entering all of their data by hand (without a
>graphical way for the end user to assess the validity of the data) then
>they are stuck in the past and wasting a huge amount of time and they
>shouldn't be funded anyway.

heavy... the grant reviewer from hell...  :)

>but it is frustrating to me to see copying of data often thrown up
>as the thing that will get all herbaria databased cheaply when it really
>isn't exactly where we should be going. I think that we should be
>concentrating on semantic web technologies that allow us to integrate
>existing data better, and on developing much better graphical automated data
>entry.

Indeed...  and the technology is tantalizingly within reach of many
herbaria...  but I think the major issues (other than the big resource one)
are managerial, organizational, sociological and procedural rather than
technological.

Based on the number of specimens that have to be databased, Australia's
Virtual Herbarium 6 million specimen data capture project is underfunded to
the tune of 30% or more.   We are going ahead with it because a) something
is better than nothing, and b) sharing data from distributed duplicate
specimens should provide significant data capture efficiency and economy.

The distributed nature of the AVH project means that each herbarium is
entirely responsible for the specimens and data for which it has
custodianship.   Thus, although we have a duplicate specimen of AD12345678,
it is actually CANB98765432, and it would be misleading and in part
inaccurate to simply link to the existing AD record when what we are
managing is not the gathering but the specimen.  In theory large chunks
should be the same (verbatim quotes of locality, habitat, habit data, etc.)
and I can imagine a distributed system that picks up these bits from AD,
but to build such a fractionated application would be more trouble than it
is worth and would put us at the mercy of AD for a significant part of
daily core business.

The solution the Australian herbaria are adopting is tempered by large
doses of pragmatism.  Assuming (desperately hoping?) that the duplication
of specimens between herbaria is high, we will be (in fact are) sharing the
data that we have each captured for these specimens.  Rather than becoming
the record for the duplicate specimen, the shared data is used as the
*basis* for the record, being edited (extended, contracted, etc.) according
to the local requirements of the custodial herbarium so that it represents
exactly how the *specimen* is being managed.  Every record is visited,
checked and edited by data entry staff but there is considerable saving
because much (most) of the quoted label data does not have to be rekeyed.
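
As a rough illustration only (this is not the actual AVH software, and the
record layout, field names and function name are all made up for the
example), the "shared data as the *basis* for a local record" idea might be
sketched in Python like this.  The quoted label data is copied across, the
identity fields are replaced with our own, and the record stays unchecked
until data entry staff have visited it:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SpecimenRecord:
    # Hypothetical, much simplified record; real herbarium records carry far more.
    accession: str          # identifier of the specimen held by the custodian
    herbarium: str          # code of the custodial herbarium
    locality: str           # verbatim label data, shared between duplicates
    habitat: str            # verbatim label data, shared between duplicates
    local_notes: str = ""   # anything specific to how this specimen is managed
    checked: bool = False   # set once data entry staff have visited the record


def seed_from_duplicate(shared: SpecimenRecord, accession: str,
                        herbarium: str) -> SpecimenRecord:
    """Use a record shared by another herbarium as the basis for our own.

    The label data is copied, the identity fields become ours, and the
    local fields are reset so the new record starts out unchecked.
    """
    return replace(shared, accession=accession, herbarium=herbarium,
                   local_notes="", checked=False)


# Placeholder label text; the accession numbers are the ones used in this message.
ad = SpecimenRecord("AD12345678", "AD",
                    locality="(verbatim locality from label)",
                    habitat="(verbatim habitat from label)",
                    checked=True)

# The shared AD record becomes the starting point for the CANB record...
canb = seed_from_duplicate(ad, "CANB98765432", "CANB")

# ...which is then visited, checked and edited to local requirements.
canb = replace(canb, local_notes="(local curation note)", checked=True)
```

The AD record is left untouched; what ends up in our database is a separate
record for our specimen that merely started life as a copy.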

Because the data becomes ours in our database associated with our specimen,
maintained by us, it does not matter that we got the basis from NSW who got
it from MEL who received it as part of an original gift of specimen and
data from PERTH.  The important fact is we got it and we did not have to
type it all in.

The ultimate test of accuracy is the specimen itself in the collection so
we do not bother to record the diverse means the data may have got in there
- we probably could, but what useful question could ever be asked of
it?  We are primarily concerned that the data and its interpretation
relative to the specimen is right and appropriate, and if it isn't we
correct it.  Once it is in there, whether we exchanged it, scanned it or
typed it doesn't seem to be all that important...

jim

~ Jim Croft ~ jrc at anbg.gov.au ~ 02-62465500 ~ www.anbg.gov.au/jrc/ ~