[Taxacom] saturday morning fun

David Remsen (GBIF) dremsen at gbif.org
Sun Nov 28 10:34:34 CST 2010


The explanation I can give for this instance, and on which I will try  
to provide more detail in a following mail, is this:

1.   The Catalogue of Life, together with IPNI and Index Fungorum, provides the
ONLY 'taxonomic authority' we have at our disposal to improve the vastly
inconsistent and messy GBIF primary data index (the 264M records that
originate in ~8000 natural history collections and observational data sets).

2.  These 'authorities' directly overlap the taxa in only a minority of
the data.   Answering questions like "what Coleoptera exist in the
index" requires deriving a taxonomy from the taxonomic information
in the original sources and building a classification based on some
simple rules.  We use these sources and rules to build a more
comprehensive classification that is inclusive of all the data.   It's
either that or we cannot report on around half the data in the index.

3.  Given that homonyms (we called them homographs) exist, we need to
account for them, which is highly problematic.  One rule, however, is
that we try to limit homonyms to one per Kingdom.

4. The Catalogue of Life asserts that the genus Mimus is both a
bird and a weevil  (one accepted and one provisionally so).   Based on
our rules, the artificial higher taxonomy we assembled appears to have
limited the Animalia to one valid Mimus and placed all Animal Mimus
into the weevils.

http://www.catalogueoflife.org/annual-checklist/2010/browse/tree/id/2327040
http://www.catalogueoflife.org/annual-checklist/2010/browse/tree/id/2302153

This is how it appears to me, though I am still trying to look into the
specific reason.   We recognise that the methodology and the sources we use
to derive this merged taxonomic backbone are problematic and have been
working for almost a year now to fix this in our development portal.
We will do better.
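
To make the effect of rule 3 concrete, here is a minimal sketch in Python of how a "one homonym per Kingdom" rule can push a bird record into the weevils.  It is not the GBIF portal code.  The function names, the (kingdom, genus, group) tuples and the "first placement seen wins" tie-break are all assumptions made for illustration; the actual reason the weevil placement won is still being looked into.

```python
def genus_of(scientific_name):
    """Return the first word of a binomial, taken here as the genus."""
    return scientific_name.split()[0]


def build_backbone(placements):
    """Keep one higher placement per (kingdom, genus).

    ASSUMPTION: the first placement seen wins. The tie-break actually
    used for the merged backbone is not stated in this mail.
    """
    backbone = {}
    for kingdom, genus, group in placements:
        # setdefault keeps the existing entry, so later homonyms are dropped.
        backbone.setdefault((kingdom, genus), group)
    return backbone


def place_record(backbone, kingdom, scientific_name):
    """Return the group a record lands in, or None if the genus is unknown."""
    return backbone.get((kingdom, genus_of(scientific_name)))


# Two homonymous genera in one kingdom, as asserted by the authority.
# The order of these hypothetical rows decides which one survives.
placements = [
    ("Animalia", "Mimus", "weevils"),
    ("Animalia", "Mimus", "birds"),
]
backbone = build_backbone(placements)

# Mimus polyglottos is a mockingbird, but only one Mimus is kept per
# kingdom, so the record inherits whichever placement survived.
print(place_record(backbone, "Animalia", "Mimus polyglottos"))
```

With the rows in this order the mockingbird record is reported under the weevils.  Swapping the two rows would move every weevil record under the birds instead, which is why any single-homonym rule needs a better key than kingdom plus genus.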

The simplest way to help us fix the organisation of these data would  
be to

1) Provide access to taxonomic authority files to GBIF.   We released
a call last week for funds to evaluate our taxonomic checklist format  
and use this to publish taxonomic data in an international  
standardised format.  Any data we can access in this format that is  
asserted to be authoritative would be used to improve precision and  
recall in our portal.

http://www.gbif.org/communications/news-and-events/showsingle/article/gbif-awards-for-evaluating-checklist-publication-format/

2) Propose a better methodology for organising the data and help  
implement it.   I can pass anyone interested a data file illustrating  
what the 'taxonomy' in our raw data looks like but I don't think many  
of you would be prepared for it.   However, we would be happy to
consider directing some of the 2011 budget we have for these issues
toward such ideas.   While you might think that budget is in the
millions, in reality you need to remove a zero and subtract from there.
----------------------------------------------------------------------------
David Remsen, Senior Programme Officer
Electronic Catalog of Names of Known Organisms
Global Biodiversity Information Facility Secretariat
Universitetsparken 15, DK-2100 Copenhagen, Denmark
Tel: +45-35321472   Fax: +45-35321480
Mobile +45 28751472
Skype: dremsen
----------------------------------------------------------------------------







