[Taxacom] Data quality in aggregated datasets

Dean Pentcheff pentcheff at gmail.com
Thu Apr 25 14:23:03 CDT 2013


So, on this conversation, I think we've just reached #4.

Give me a ring when we're back to #1.

:)

-Dean
-- 
Dean Pentcheff
pentcheff at gmail.com
dpentche at nhm.org


On Thu, Apr 25, 2013 at 1:41 AM, Roderic Page <r.page at bio.gla.ac.uk> wrote:

> Leaving aside the issues of what both providers and aggregators can do to
> clean the data, we seem trapped in an endless cycle of:
>
> 1. OMG the data is broken!
>
> 2. SOMETHING MUST BE DONE!
>
> 3. Wave arms frantically, mention projects currently underway that will
> almost certainly solve the problem "real soon now".
>
> 4. ... [tumble weed]
>
> 5. Go to 1
>
> There are at least two things we need to do to tackle this problem, and
> until we do, we're not being serious about data quality.
>
> 1. Identifiers
>
> In order to clean data, that data has to persist long enough for people or
> algorithms to act on it. If I add an annotation to a piece of data I want
> that information to persist, otherwise why would I bother? At the level of
> specimens we don't have identifiers, and few have shown any commitment to
> tackling this problem (a notable exception is Roger Hyam's work at the RBGE,
> see http://www.mapress.com/phytotaxa/content/2012/f/pt00073p030.pdf ).
> GBIF routinely deletes vast numbers (in some cases literally millions) of specimen
> URLs, so any attempt to attach annotations to those records is doomed.
>
> 2. Annotation tools
>
> Of course there are tools being developed by our community, but I've not
> seen any that look at all usable. In the real world we are used to tracking
> packages being couriered around the world (there's an app for that), and
> many will have come across feedback tools online where you can notify a
> site of an issue and engage in a conversation to resolve it. There are also
> more general annotation tools being developed (e.g. http://hypothes.is/ ).
> Let's leverage these.
>
> Annotation rests on being able to identify the thing being annotated, and
> on the web URLs serve that purpose. Until we have stable URLs for
> specimens, and these are used by everyone who has something to say about
> that specimen, then we are doomed to repeat steps 1-5.
>
> But of course, we know all this, and have done so for a while...
>
> Regards
>
> Rod
>
>
> ---------------------------------------------------------
> Roderic Page
> Professor of Taxonomy
> Institute of Biodiversity, Animal Health and Comparative Medicine
> College of Medical, Veterinary and Life Sciences
> Graham Kerr Building
> University of Glasgow
> Glasgow G12 8QQ, UK
>
> Email: r.page at bio.gla.ac.uk
> Tel: +44 141 330 4778
> Fax: +44 141 330 2792
> Skype: rdmpage
> Facebook: http://www.facebook.com/rdmpage
> Twitter: http://twitter.com/rdmpage
> Blog: http://iphylo.blogspot.com
> Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
> Wikipedia: http://en.wikipedia.org/wiki/Roderic_D._M._Page
> Citations: http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ
> ORCID id: http://orcid.org/0000-0002-7101-9767


