[Taxacom] Data quality in aggregated datasets

Quentin Groom quentin.groom at br.fgov.be
Fri Apr 26 03:32:18 CDT 2013


As someone who would like to use GBIF data for modeling and data 
analysis, and as a collector of distributional data, I'm not that 
flustered about odd misidentifications, wrong grid references and other 
random errors. By far the largest problem is all the missing data, both 
from countries that don't participate in GBIF and from participating 
countries that shared their data once and don't update it.
Modelers expect the data to be ugly; that can even be factored into their 
models. However, they can't do that if they don't have the data in the 
first place.
While we should not be complacent about quality, I would much rather we 
focus our efforts on data availability.
Quentin

Roderic Page wrote:
> Leaving aside the issues of what both providers and aggregators can do to clean the data, we seem trapped in an endless cycle of :
>
> 1. OMG the data is broken!
>
> 2. SOMETHING MUST BE DONE!
>
> 3. Wave arms frantically, mention projects currently underway that will almost certainly solve the problem "real soon now".
>
> 4. ... [tumble weed]
>
> 5. Go to 1
>
> There are at least two things we need to do to tackle this problem, and until we do we're not being serious about data quality.
>
> 1. Identifiers
>
> In order to clean data, that data has to persist long enough for people or algorithms to act on it. If I add an annotation to a piece of data I want that information to persist, otherwise why would I bother? At the level of specimens we don't have identifiers, and few have shown any commitment to tackling this problem (a notable exception is Roger Hyam's work at the RBGE, see http://www.mapress.com/phytotaxa/content/2012/f/pt00073p030.pdf ). GBIF routinely deletes vast numbers (in some cases literally millions) of specimen URLs, so any attempt to attach annotations to those records is doomed. 
>
> 2. Annotation tools
>
> Of course there are tools being developed by our community, but I've not seen any that look at all usable. In the real world we are used to tracking packages being couriered around the world (there's an app for that), and many will have come across feedback tools online where you can notify a site of an issue and engage in a conversation to resolve it. There are also more general annotation tools being developed, e.g. http://hypothes.is/  Let's leverage these.
>
> Annotation rests on being able to identify the thing being annotated, and on the web, URLs serve that purpose. Until we have stable URLs for specimens, and these are used by everyone who has something to say about that specimen, we are doomed to repeat steps 1-5. 
>
> But of course, we know all this, and have done so for a while...
>
> Regards
>
> Rod
>
>
> ---------------------------------------------------------
> Roderic Page
> Professor of Taxonomy
> Institute of Biodiversity, Animal Health and Comparative Medicine
> College of Medical, Veterinary and Life Sciences
> Graham Kerr Building
> University of Glasgow
> Glasgow G12 8QQ, UK
>
> Email: r.page at bio.gla.ac.uk
> Tel: +44 141 330 4778
> Fax: +44 141 330 2792
> Skype: rdmpage
> Facebook: http://www.facebook.com/rdmpage
> Twitter: http://twitter.com/rdmpage
> Blog: http://iphylo.blogspot.com
> Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
> Wikipedia: http://en.wikipedia.org/wiki/Roderic_D._M._Page
> Citations: http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ
> ORCID id: http://orcid.org/0000-0002-7101-9767
>
> _______________________________________________
> Taxacom Mailing List
> Taxacom at mailman.nhm.ku.edu
> http://mailman.nhm.ku.edu/mailman/listinfo/taxacom
>
> The Taxacom Archive back to 1992 may be searched with either of these methods:
>
> (1) by visiting http://taxacom.markmail.org
>
> (2) a Google search specified as:  site:mailman.nhm.ku.edu/pipermail/taxacom  your search terms here
>
> Celebrating 26 years of Taxacom in 2013.
>
>
>   

-- 
Dr. Quentin Groom
(Botany and Information Technology)

National Botanic Garden of Belgium
Domein van Bouchout
B-1860 Meise
Belgium

Landline; +32 (0) 226 009 20 ext. 364
FAX:      +32 (0) 226 009 45

E-mail:     quentin.groom at br.fgov.be
Skype name: qgroom
Website:    www.botanicgarden.be




