[Taxacom] What can Global Biodiversity Information Facility (GBIF) do for you?

Sun Oct 20 05:44:13 CDT 2013

Hi Bob,

Having spent a lot of time trying to extract content from BHL for projects such as BioStor and BioNames, the kinds of issues you raise for specimens sound all too familiar.

BHL grabs physical things, scans them, associates whatever metadata library catalogues have, and puts them online. Simples. Ah, but then the fun starts. Locating articles (i.e., the things we actually cite) in BHL is sometimes straightforward, but often it is anything but. Journals can change names, may have multiple names (sometimes in multiple languages), concurrent or inconsistent volume and/or page numbering, etc. Notions that we take for granted today (that there are "articles" and that they have explicit titles) may not hold, and of course every taxonomist knows that determining the date of publication can be a challenge (as I'm sure Neal Evenhuis, among others, will testify). Much of the time I spend on BioNames consists of taking cryptic, often misleading (if not downright erroneous) citations to original descriptions and matching these to BHL (or other sources).

My point is that I don't think there's a world of difference between the two problems. For all the issues that you document in "A specialist’s audit of aggregated occurrence records" http://dx.doi.org/10.3897/zookeys.293.5111 , I could probably find equivalent horror stories for bibliographic data.

As you say, many of the basic elements of a GBIF occurrence are potentially contested, subject to uncertainty, error, etc. I guess it's for everyone to decide whether the trade-off involved in simplifying the data so it can be aggregated in bulk is worthwhile.

One thing I'd like to see is GBIF occurrence data integrated with the literature, for example by linking specimens to their citation in the literature (another reason to play with BHL). If we can go from a specimen to the associated literature we could then track some of the issues you mention, such as different identifications, discussion of whether the collection locality is correct, etc.

For a simple example, the specimen FMNH 147942 appears in at least three articles in BioStor (http://biostor.org/specimen/FMNH%20147942 ). Below are the three article links plus text extract around the specimen code:

http://biostor.org/reference/81423

Crunomys suncoides Rickart et al., 
1998. — Mindanao Island, Bukidnon Prov- 
ince, Mount Katanglad Range, 18.5 km S, 
4 km E Camp Phillips, elev. 2,250 m, 
8°9'30"N, 124°5rE, 1 male (FMNH 
147942).

http://biostor.org/reference/65896

Crunomys suncoides Rickart, Heaney, Tabar- 
anza, and Balete, 1998 
The Kitanglad shrew-mouse is currently 
known only from the Kitanglad Range (Rickart 
et al., 1998), though we suspect that it is more 
widespread in mossy forest on Mindanao. The 
species was described based on a single adult 
male (FMNH 147942; 37 g) we captured in April 
1993 in old-growth mossy forest at 2250 m (Site 
6, Fig. 8). It had scrotal testes measuring 14 X 
8 mm. 

http://biostor.org/reference/95679

Crunomys suncoides, new species 
(Figs. 2, 4-9) 
HoLOTYPE — Adult male, fmnh 147942; collect- 
ed 10 April 1993 (original number 5330 of L. R. 
Heaney); initially fixed in formalin, now pre- 
served in ethyl alcohol with the skull removed 
and cleaned. The stomach and both femora have 
been removed; otherwise the specimen is in ex- 
cellent condition. It is deposited at fmnh but will 
be transferred to pnm. 

Each tells us something about the specimen (and more than GBIF does). So, what if we linked this information together so that GBIF users could learn more about that record?
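
To make the linking idea concrete, here is a rough sketch of how specimen codes could be pulled out of OCR text like the extracts above and turned into a specimen-to-article index. It is only an illustration under stated assumptions, not a description of how BioStor does it: the acronym list, the regular expression and the function names are all made up for this example, and a real acronym list would be far longer.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny list of institution acronyms (an assumption).
ACRONYMS = ("FMNH", "PNM")

# An acronym, some whitespace (possibly a line break), then a catalogue number.
# Case-insensitive because OCR of small caps often yields lowercase ("fmnh").
SPECIMEN_RE = re.compile(r"\b(%s)\s+(\d+)\b" % "|".join(ACRONYMS), re.IGNORECASE)


def find_specimen_codes(text):
    """Return the normalised specimen codes found in a block of OCR text."""
    codes = {f"{m.group(1).upper()} {m.group(2)}" for m in SPECIMEN_RE.finditer(text)}
    return sorted(codes)


def index_by_specimen(extracts):
    """Map each specimen code to the list of article URLs that mention it."""
    index = defaultdict(list)
    for url, text in extracts.items():
        for code in find_specimen_codes(text):
            index[code].append(url)
    return dict(index)


# Short snippets taken from the three extracts quoted above.
extracts = {
    "http://biostor.org/reference/81423": "124°5rE, 1 male (FMNH \n147942).",
    "http://biostor.org/reference/65896": "male (FMNH 147942; 37 g) we captured in April",
    "http://biostor.org/reference/95679": "Adult male, fmnh 147942; collect-\ned 10 April 1993",
}

index = index_by_specimen(extracts)
for code, urls in index.items():
    print(code, urls)
```

Even this naive pattern copes with the line break after "FMNH" in the first extract and the lowercase "fmnh" in the third, though it would obviously miss codes with prefixes, ranges or other formats.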

Regards

Rod

On 19 Oct 2013, at 22:53, Bob Mesibov wrote:

> Hi, Rod.
> 
> 'So, if the argument is that GBIF should be looking beyond museum collections then I completely agree...'
> 
> No, that's not the argument. Biodiversity data aren't like biodiversity books or papers, for which you can (in principle) generate a complete catalogue or index. Given such a catalogue or index, you can go further and digitise and make available on the Web all the content. Cool, yes? Anyone anywhere with access to the Web can view a biodiversity publication at the click of a mouse. This works because biodiversity publications are very well-defined objects which either exist or don't. BHL is hugely valuable and 'intrinsically' successful because the goal of digitising all biodiversity publications is achievable, in principle.
> 
> GBIF is intrinsically unsuccessful because it treats occurrence records as very well-defined objects, which they aren't. Each record is instead an entry point into an investigation (minimally) of the identity of the organism(s) observed, of the location of the observation, of the timing of the observation, of the observer and of the fate of any specimen(s) or images which are hard evidence for the observation. I say 'minimally' because the museum records that wind up in GBIF often have more than these basics in their 'pre-GBIF' form, and are sometimes only condensed versions of even more information available elsewhere. You don't get that from GBIF.
> 
> Records aren't open-ended, but some users will go much further with them than other users. GBIF best suits users who accept the data as-is and can find trivial purposes for which those untested, sparse data are 'fit'.
> 
> The argument that GBIF in fact suits everyone - because it lets everyone know where to find out more - fails because GBIF is a lousy index. It contains lots of errors, it's taxonomically, geographically, ecologically and 'literature-wise' grossly incomplete, and for many biodiversity studies (see Meier and Dikow) you're better off starting with your own plan of attack and chasing sources independently.
> 
> It would be possible to rebuild GBIF from scratch as the thing its title suggests (an information facility), namely a 'meta' resource that points to and introduces data sources, but I don't think that's going to happen, because it's too hard. GBIF has taken an easier approach and has been accumulating records as though they were coins, and measuring its usefulness by counting its 'wealth' of records, so that if it has twice as many records it must be twice as useful, right? Other people in this thread have pointed out how raw counts are meaningless for assessing usefulness. Here I just wanted to say that what works for BHL doesn't work for GBIF, because the items being made Web-available are inherently different.
> -- 
> Dr Robert Mesibov
> Honorary Research Associate
> Queen Victoria Museum and Art Gallery, and
> School of Agricultural Science, University of Tasmania
> Home contact:
> PO Box 101, Penguin, Tasmania, Australia 7316
> (03) 64371195; 61 3 64371195
> 

---------------------------------------------------------
Roderic Page
Professor of Taxonomy
Institute of Biodiversity, Animal Health and Comparative Medicine
College of Medical, Veterinary and Life Sciences
Graham Kerr Building
University of Glasgow
Glasgow G12 8QQ, UK

Email: 		r.page at bio.gla.ac.uk
Tel: 			+44 141 330 4778
Fax: 		+44 141 330 2792
Skype: 		rdmpage
Facebook: 	http://www.facebook.com/rdmpage
LinkedIn: 	http://uk.linkedin.com/in/rdmpage
Twitter: 		http://twitter.com/rdmpage
Blog: 		http://iphylo.blogspot.com
Home page: 	http://taxonomy.zoology.gla.ac.uk/rod/rod.html
Wikipedia: 	http://en.wikipedia.org/wiki/Roderic_D._M._Page
Citations: 	http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ
ORCID: 		http://orcid.org/0000-0002-7101-9767