[Taxacom] Data quality in aggregated datasets

Robert Mesibov mesibov at southcom.com.au
Sun Apr 21 18:33:08 CDT 2013


Dean Pentcheff wrote:

"http://wiki.filteredpush.org 
There is much more than silence, but making a working system takes both an initial effort and changes in the way provider systems work. It will take time to take effect."

My question was "What mechanisms are there for detecting and fixing errors besides (interested third party) > (data provider) > aggregator?". Isn't FilteredPush a project to streamline just that mechanism, with cooperating data providers?

The 'silence' I referred to is coming from the aggregators, who seem committed to ignoring this kind of error-fixing mechanism: aggregator > (data provider) > aggregator.

GBIF likes to emphasise that it is only a facilitator. It doesn't own the data it publishes; it merely provides a place for data holders to 'expose' what they have. It is resolutely ignoring the opportunity this arrangement provides for an outside party (GBIF or a GBIF-contracted service) to do some basic record-checking, then collaborate with the data holder to make corrections or add 'Queried' flags. There would be benefits in this for all interested parties: the data holders, GBIF as publisher, and end-users. This isn't happening.

GBIF has been going for how many years? And it has finally gotten around to talking about offering advice to participants about data quality: http://community.gbif.org/pg/groups/21292/biodiversity-data-quality-interest-group/

As I suggested in an earlier post and in my ZooKeys paper, the barriers to data-checking at aggregator level aren't technical. Call them 'policy' or 'attitudinal' barriers. They're not unlike Person A being reluctant to tell Person B that they've made a mistake, because A wants to remain friends with B and doesn't want to upset B, and anyway, what's a little mistake?

The analogy fails because the aggregators (A) are multi-million dollar organisations hoping to service a global community, and dealing with multi-million dollar organisations (B) whose 'mission statements' probably talk about a commitment to 'continuous improvement'.

Note: I say all this (and I published the ZooKeys paper) without much hope of seeing reform. As explained in the paper, I've created an alternative for my little basket of the world's species occurrence records, and unlike the aggregators, I write directly to data providers with messages like 'Could you please check [records]? It looks like there are errors in the lat/lon's, which probably should be [X]. Many thanks.'
-- 
Dr Robert Mesibov
Honorary Research Associate
Queen Victoria Museum and Art Gallery, and
School of Agricultural Science, University of Tasmania
Home contact: PO Box 101, Penguin, Tasmania, Australia 7316
Ph: (03) 64371195; 61 3 64371195



