[Taxacom] GBIF data evaluation (was: iSpecies with Wikipedia)

Arturo H. Ariño artarip at unav.es
Thu Mar 27 07:36:31 CDT 2008


Indeed, getting the experts involved will be a major breakthrough towards
data cleaning. But this will only work if said experts can find the time,
and an easy way, to notify the providers of the errors. This, in turn, may
depend, as you mention, on an easy and friendly environment... which may or
may not be used at all, or work properly at all.

For instance, I only became aware of an issue already discussed here,
concerning some drowned European beetles of ours
(http://mailman.nhm.ku.edu/pipermail/taxacom/2007-August/025902.html), after
coming across them through a completely unrelated search. (I was not a
Taxacom subscriber at the time.) I of course investigated the issue(*), and
had the records corrected.

But I had not received any warning through the mechanisms in the GBIF
portal, or any other. It was purely a chance discovery.

Any effort towards making these feedback facilities more useable,
integrated, straightforward, or simply used, might dramatically improve GBIF
data's quality. I can't think of any providers not wishing to set their
records straight, and review by portal users is a prime check.

The alternative, providing only rock-hard data rather than reasonable (i.e.
barring errors) data, would dramatically reduce the availability of
providers and of good data overall, as well as the possibility of improving
the data themselves.

Whether a certain amount of noise in the data is tolerable in order to get
more good data available, as discussed in the article about the "marine
legumes", is the subject of a very heated controversy that should be
settled as soon as possible.

Best regards,

A.-

Dr. Arturo H. Ariño <artarip at unav.es> PGPID:0xFE08ED42
Prof., Theor. Ecology
Dept. Zoology & Ecology, University of Navarra
E-31080 Pamplona, Spain, EU
+34-948425600x6296 fax +34-948425649 www.unav.es/unzyec

(*) The "marine beetles" boiled down to an incorrectly flagged record in a
700,000-record database that cascaded down to an entire bug box through an
improbable chain of separately harmless events, as happens in many
accidents. The records should have been flagged as non-georeferenced, as
they had been typed many years ago (in Spanish) as "Europa" (the continent)
rather than "Europe" (the continent, in English; the geo type was also
incorrectly specified). They weren't, and in the course of a coarse georef
check they got the coordinates for Europa Island (in English).

This is an example both of the risks of poorly supervised machine
georeferencing checks on very large databases and of the need to wake up
providers when errors are discovered.
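
As a rough illustration of the failure described in this footnote, here is a
minimal Python sketch. Everything in it is hypothetical: the tiny gazetteer,
the field names, and both lookup functions are assumptions made for the
example, not the actual database or the check that was run, and the
coordinates given for Europa Island are approximate. It only shows how a
name-only match can hand "Europa" an island's coordinates, while a match that
also requires language and geo type to agree leaves the record
non-georeferenced, to be flagged for a human.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Coords = Tuple[float, float]  # (latitude, longitude) in decimal degrees


@dataclass(frozen=True)
class GazetteerEntry:
    """Hypothetical gazetteer row; the field names are assumptions."""
    name: str
    language: str
    feature_type: str
    coords: Optional[Coords]  # None: too coarse to have a point location


# Toy gazetteer. Europa Island coordinates are approximate.
GAZETTEER = [
    GazetteerEntry("Europa", "en", "island", (-22.37, 40.37)),
    GazetteerEntry("Europe", "en", "continent", None),
    GazetteerEntry("Europa", "es", "continent", None),
]


def naive_georef(locality: str) -> Optional[Coords]:
    """Name-only match: returns the first entry that has coordinates."""
    for entry in GAZETTEER:
        if entry.name == locality and entry.coords is not None:
            return entry.coords
    return None


def checked_georef(locality: str, language: str,
                   feature_type: str) -> Optional[Coords]:
    """Match only when name, language and geo type all agree.

    Returns None when nothing matches or when the matching feature has no
    point location; the caller should treat None as "non-georeferenced,
    needs review" rather than guessing.
    """
    for entry in GAZETTEER:
        if (entry.name, entry.language, entry.feature_type) == (
            locality, language, feature_type
        ):
            return entry.coords
    return None


if __name__ == "__main__":
    # A Spanish-language record typed as "Europa" (the continent).
    print(naive_georef("Europa"))                      # island coordinates
    print(checked_georef("Europa", "es", "continent"))  # None: flag it
```

Note that the stricter lookup does not repair a record whose geo type was
itself entered wrongly; it merely fails to match and leaves the record
without coordinates, which is the safer outcome here.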


-----------




> Andy Mabbett wrote:
>>> The GBIF map for "pica pica" shows all but one example as being in 
>>> South America (the other in Africa - possibly a wrongly-signed 
>>> longitude?).<<
>
> Roderic Page answered:
>> No idea, iSpecies simply displays what GBIF has. Some locality 
>> records are clearly false (marine organisms on land, and vice versa) 
>> and at some point somebody will hopefully do something about these 
>> (see http://iphylo.blogspot.com/2007/11/gbif-data-evaluation.html ).<
>
> As I'm just trying to spot GBIF occurrence data errors in Insecta 
> Coleoptera Carabidae (>250.000 records already accessible!), I can say 
> yes there are those errors and a lot more when you look into the 
> details (names incorrectly listed under suprageneric taxa, erroneous 
> interpretation of names, unresolved synonyms, incorrect interpretation 
> of localities, etc.).
>
> While lat/lon errors can be cleaned by the machine at least to some 
> extent, the majority of errors will only become visible and can be 
> cleaned when/ if more taxon experts are getting involved. In my 
> opinion, we will see a major breakthrough of the great GBIF idea only 
> when/ if it becomes part of daily routines on the workdesks of 
> experts, and I imagine that projects like iSpecies could become a very 
> helpful forum for both experts and the wider audience. The challenge 
> is, in my (IT-non-expert) opinion, how to make projects like iSpecies 
> more interactive for those who could provide expert input?
>
> Just an idea: even as an IT-dummy I can download a GBIF overviewMap, 
> get the data details from GBIF and send back an evaluated/corrected 
> version, or even add my own data to it. So, maybe, an evaluated map 
> could be displayed somehow along with the original GBIF map. Things 
> like this could serve not only as an invitation to discuss content 
> among taxon experts, but at the same time alert non-experts about the 
> quality of the displayed content.
> Or, another idea: what if we had a versionized (e.g. annual) static 
> "GBIF atlas" for taxa groups that are regularly evaluated by experts. 
> In this way, it would be easier to see 'what's-in-it', - what happens 
> on GBIF, which data are evaluated and which are not, etc...
> Right now there are many pitfall traps esp. for non-experts as you can 
> simply display what GBIF, uBio, etc. have...
>
> But anyway, thanks to Roderic Page for doing iSpecies! Seems to be an 
> important step.
>
> Best wishes,
> Wolfgang
> --------------------------------------------------------------------
> Wolfgang Lorenz
> Faunistics & Environmental Planning
> Hoermannstr. 4
> D-82327 Tutzing, Germany
>




