[Taxacom] Data quality in aggregated datasets
Alastair Culham
a.culham at reading.ac.uk
Sat Apr 20 02:10:42 CDT 2013
A group of us reviewed GBIF data quality many years back: http://www.plosone.org/article/info:doi/10.1371/journal.pone.0001124
Some of those issues have been addressed and some remain.
Aggregated data have problems that result from the aggregation itself, but also from the variable-quality source data that are common in large datasets.
____________________________________________
Dr Alastair Culham
Centre for Plant Diversity and Systematics
Harborne Building, School of Biological Sciences
University of Reading, Whiteknights, Reading, RG6 6AS
U.K.
Curator, Reading University Herbarium (RNG)
Associate Editor, Botanical Journal of the Linnean Society
Programme Director, MSc Plant Diversity
i4Life Coordinator
____________________________________________
________________________________________
From: taxacom-bounces at mailman.nhm.ku.edu [taxacom-bounces at mailman.nhm.ku.edu] on behalf of Robert Mesibov [mesibov at southcom.com.au]
Sent: 19 April 2013 23:04
To: TAXACOM
Subject: [Taxacom] Data quality in aggregated datasets
There have been occasional grumblings here on Taxacom about data quality in the aggregator world, e.g. in GBIF, but what would happen if you methodically audited a sample of aggregated species occurrence records? What sorts of errors would you find? Would they be rare? Frequent?
I've done an audit of this kind for Australian millipede records in GBIF and the Atlas of Living Australia (ALA) and published the results in ZooKeys: http://www.pensoft.net/journals/zookeys/article/5111/a-specialist
The audit results can't be generalised to all taxa and all parts of the world, but they're pretty disappointing. GBIF and ALA, however, disclaim all responsibility for data problems. If there's an error, it's the fault of the data provider. So how do errors in online databases get discovered and fixed?
In this particular case, an interested third party (me) finds problems and alerts the data provider directly. The data provider fixes the errors and in the fullness of time sends corrected records to the aggregator. (Although I found evidence that erroneous records can persist through an update.)
What about aggregated datasets in general? What mechanisms are there for detecting and fixing errors besides (interested third party) > (data provider) > aggregator?
[Long silence.]
--
Dr Robert Mesibov
Honorary Research Associate
Queen Victoria Museum and Art Gallery, and
School of Agricultural Science, University of Tasmania
Home contact: PO Box 101, Penguin, Tasmania, Australia 7316
Ph: (03) 64371195; 61 3 64371195
_______________________________________________
Taxacom Mailing List
Taxacom at mailman.nhm.ku.edu
http://mailman.nhm.ku.edu/mailman/listinfo/taxacom
The Taxacom Archive back to 1992 may be searched with either of these methods:
(1) by visiting http://taxacom.markmail.org
(2) a Google search specified as: site:mailman.nhm.ku.edu/pipermail/taxacom your search terms here
Celebrating 26 years of Taxacom in 2013.