[Taxacom] saturday morning fun
David Remsen (GBIF)
dremsen at gbif.org
Sun Nov 28 10:34:34 CST 2010
The explanation I can give for this instance, and on which I will try
to provide more detail in a following mail, is this:
1. The Catalogue of Life, with IPNI and Index Fungorum, are the
ONLY 'taxonomic authorities' we have at our disposal to improve the vastly
inconsistent and messy GBIF primary data index (the 264M records that
originate in ~8000 natural history collections and observational datasets).
2. These 'authorities' only directly overlap taxa in a minority of
the data. To answer questions like "what Coleoptera exist in the
index" requires deriving a taxonomy based on taxonomic information
from the original sources and building a classification based on some
simple rules. We use these sources and rules to build a more
comprehensive classification that is inclusive of all the data. It's
either that or we cannot report on around half the data in the index.
3. Given that homonyms (we call them homographs) exist, we need to
account for them, which is highly problematic. One rule, however, is
that we try to limit homonyms to one per Kingdom.
4. The Catalogue of Life asserts that the genus Mimus is both a
bird and a weevil (one accepted and one provisionally so). Based on
our rules, the artificial higher taxonomy we assembled appears to have
limited the Animalia to one valid Mimus and placed all Animal Mimus
into the weevils.
http://www.catalogueoflife.org/annual-checklist/2010/browse/tree/id/2327040
http://www.catalogueoflife.org/annual-checklist/2010/browse/tree/id/2302153
This is how it appears to me, though I am still looking into the
specific reason. We recognise that the methodology and the sources we use
to derive this merged taxonomic backbone are problematic and have been
working for almost a year now to fix it in our development portal.
We will do better.
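
To make points 3 and 4 concrete, here is a minimal sketch of how a
one-homonym-per-Kingdom rule could produce the Mimus result described
above. It is illustrative only: the function names, the record layout and
the "first placement seen wins" tie-break are assumptions made for this
example, not the actual rules used to build the GBIF backbone.

```python
def build_backbone(authority_rows):
    """Keep a single placement per (kingdom, genus) pair.

    Assumed tie-break: the first placement encountered wins and any later
    homonym in the same kingdom is set aside. This is a hypothetical rule
    chosen for illustration.
    """
    backbone = {}
    dropped = []
    for kingdom, genus, placement in authority_rows:
        key = (kingdom, genus)
        if key in backbone:
            dropped.append((kingdom, genus, placement))
        else:
            backbone[key] = placement
    return backbone, dropped


def classify(records, backbone):
    """Place each occurrence record using only its kingdom and genus name."""
    return [
        (record_id, backbone.get((kingdom, genus), "unplaced"))
        for record_id, kingdom, genus in records
    ]


# Two homonymous genera named Mimus within Animalia (labels simplified).
authority_rows = [
    ("Animalia", "Mimus", "weevil"),
    ("Animalia", "Mimus", "bird"),
]

# Hypothetical occurrence records: (record id, kingdom, genus).
records = [
    ("rec-1", "Animalia", "Mimus"),
    ("rec-2", "Animalia", "Mimus"),
    ("rec-3", "Plantae", "Mimus"),
]

backbone, dropped = build_backbone(authority_rows)
placed = classify(records, backbone)
print(backbone)  # only one Animalia Mimus survives
print(dropped)   # the homonym that was set aside
print(placed)    # every Animalia Mimus record lands in the surviving placement
```

With the rows in this order the bird placement is set aside, so every
animal record named Mimus is reported under the weevils, whatever the
original source intended.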
The simplest way to help us fix the organisation of these data would
be to
1) provide access to taxonomic authority files to GBIF. We released
a call last week for funds to evaluate our taxonomic checklist format
and use this to publish taxonomic data in an international
standardised format. Any data we can access in this format that is
asserted to be authoritative would be used to improve precision and
recall in our portal.
http://www.gbif.org/communications/news-and-events/showsingle/article/gbif-awards-for-evaluating-checklist-publication-format/
2) Propose a better methodology for organising the data and help
implement it. I can pass anyone interested a data file illustrating
what the 'taxonomy' in our raw data looks like but I don't think many
of you would be prepared for it. However, we would be happy to
consider directing some of the 2011 budget we have for these
issues toward such ideas. While you might think it's millions, in reality
you need to remove a zero and subtract from there.
----------------------------------------------------------------------------
David Remsen, Senior Programme Officer
Electronic Catalog of Names of Known Organisms
Global Biodiversity Information Facility Secretariat
Universitetsparken 15, DK-2100 Copenhagen, Denmark
Tel: +45-35321472 Fax: +45-35321480
Mobile +45 28751472
Skype: dremsen
----------------------------------------------------------------------------