[Taxacom] Data quality in aggregated datasets
David Campbell
pleuronaia at gmail.com
Mon Apr 22 12:33:08 CDT 2013
All too often, there is an additional step, namely "interested third party
hunts around trying to find how to contact the data provider". Given that
error checking is not being funded, this means that errors are largely
noticed when an interested user is trying to make use of the data set for
some other project. I'm not a database programmer, but I would think it
would be relatively straightforward to have a "report an error with this
record" button that automatically supplies the provider with the
information they need to identify which record is problematic. As is,
the user often loses interest in reporting errors if the provider seems
uninterested in correcting them. Also, in far too many cases, the data quality
is so spectacularly bad as to discourage prospective users.
I would note that my experience in sending corrections to WoRMS is quite
positive, although it would be more convenient to have a direct link from
the page rather than looking up the contact email. On the other hand, when
I found that a BOLD record had a completely wrong photo, I was told that
the information in GenBank was not the information they needed to identify
the record.
On Fri, Apr 19, 2013 at 6:04 PM, Robert Mesibov <mesibov at southcom.com.au> wrote:
> There have been occasional grumblings here on Taxacom about data quality
> in the aggregator world, e.g. in GBIF, but what would happen if you
> methodically audited a sample of aggregated species occurrence records?
> What sorts of errors would you find? Would they be rare? Frequent?
>
> I've done an audit of this kind for Australian millipede records in GBIF
> and the Atlas of Living Australia (ALA) and published the results in
> ZooKeys: http://www.pensoft.net/journals/zookeys/article/5111/a-specialist
>
> The audit results can't be generalised to all taxa and all parts of the
> world, but they're pretty disappointing. GBIF and ALA, however, disclaim
> all responsibility for data problems. If there's an error, it's the fault
> of the data provider. So how do errors in online databases get discovered
> and fixed?
>
> In this particular case, an interested third party (me) finds problems and
> alerts the data provider directly. The data provider fixes the errors and
> in the fullness of time sends corrected records to the aggregator.
> (Although I found evidence that erroneous records can persist through an
> update.)
>
> What about aggregated datasets in general? What mechanisms are there for
> detecting and fixing errors besides (interested third party) > (data
> provider) > aggregator?
>
> [Long silence.]
> --
> Dr Robert Mesibov
> Honorary Research Associate
> Queen Victoria Museum and Art Gallery, and
> School of Agricultural Science, University of Tasmania
> Home contact: PO Box 101, Penguin, Tasmania, Australia 7316
> Ph: (03) 64371195; 61 3 64371195
>
--
Dr. David Campbell
Assistant Professor, Geology
Department of Natural Sciences
Gardner-Webb University
Boiling Springs NC 28017