ITIS (an explanation of GBIF's data integration activities)

Wed Jun 23 22:13:05 CDT 2004

I wonder whether the exchange standard for taxonomic name
and concept data ( http://tdwg.napier.ac.uk/phpwiki/index.php) is the
best place to start. The scheme itself is huge (does anybody enjoy
reading these things?), and probably only of interest to those few
who have large databases of names (as opposed to those who would be
interested in contributing names to a database).

For me, the quickest way towards the vision that Donald Hobern outlines is
to get a simple standard for returning results from querying
taxonomic databases. As an exercise I constructed a "portal" where a
user types in a name and ITIS, uBio, IPNI, and GenBank are all
queried for that name (see
http://darwin.zoology.gla.ac.uk/~rpage/MyToL/www/portal.html --
"Phyllocladus" is a good name to try). This is really a toy, but I
was interested in seeing how easy it was to query existing databases
to check the status of a name -- perhaps the most basic use of a
taxonomic name database. Certainly, for my own phylogenetic projects
this is what I need.

Of course, each database has its own interface and means of returning
data, and so the portal has to cope with everything from XML to plain
text, and each database returns rather different kinds of information
(ITIS being the most comprehensive).
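
To make the point concrete, here is a minimal Python sketch of the kind
of glue such a portal needs: one adapter for an XML reply and one for a
tab-delimited plain-text reply, both reduced to the same small record.
The response layouts, the field names, and the NameRecord type are all
invented for illustration; they are not the actual output of ITIS, uBio,
IPNI, or GenBank.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class NameRecord:
    """Hypothetical common record; not part of any existing standard."""
    source: str
    name: str
    status: str


def from_xml(source, payload):
    """Parse an invented XML layout: <result><name/><status/></result>."""
    root = ET.fromstring(payload)
    return [
        NameRecord(
            source,
            r.findtext("name", "").strip(),
            r.findtext("status", "unknown").strip(),
        )
        for r in root.iter("result")
    ]


def from_text(source, payload):
    """Parse an invented plain-text layout: one 'name<TAB>status' per line."""
    records = []
    for line in payload.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _, status = line.partition("\t")
        records.append(NameRecord(source, name.strip(), status.strip() or "unknown"))
    return records


# Canned, made-up replies standing in for two differently behaved databases.
xml_reply = (
    "<results><result><name>Phyllocladus</name>"
    "<status>accepted</status></result></results>"
)
text_reply = "Phyllocladus\taccepted\nPhyllocladus alpinus\t\n"

records = from_xml("database A", xml_reply) + from_text("database B", text_reply)
for rec in records:
    print(rec.source, rec.name, rec.status, sep=" | ")
```

Every new database means another adapter like these, which is exactly
the cost a shared response format would remove.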

We need a simple, easily implemented standard way of asking questions
of taxonomic databases. For example, it is pretty easy to get a
Microsoft Excel spreadsheet to use SOAP to talk to an Internet
database (see
http://darwin.zoology.gla.ac.uk/~rpage/MyToL/www/excel.php ) --
wouldn't it be great if users had a list of names in a spreadsheet
and could call an Excel function to check the status of those names?
This is easy to do, and would seem to me to give people a much
stronger incentive to get involved than wading through database
exchange schemas.
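
As a rough sketch of what such a spreadsheet function would do behind
the scenes, the Python below takes a column of names and returns a
status for each. The check_names function, the lookup callback, and the
tiny in-memory "database" are hypothetical stand-ins; a real version
would call whatever query interface the remote database offers.

```python
from typing import Callable, Dict, Iterable, Optional


def check_names(
    names: Iterable[str],
    lookup: Callable[[str], Optional[str]],
) -> Dict[str, str]:
    """Return a status for each non-empty name in the input column."""
    report = {}
    for raw in names:
        name = " ".join(raw.split())  # collapse stray whitespace from pasted cells
        if not name:
            continue  # ignore empty cells
        status = lookup(name)
        report[name] = status if status is not None else "not found"
    return report


# Stand-in for a remote database; the names and statuses are made up.
_FAKE_DB = {"Phyllocladus": "accepted", "Examplea obsoleta": "synonym"}


def fake_lookup(name):
    """Hypothetical lookup: returns a status string, or None if unknown."""
    return _FAKE_DB.get(name)


column = ["Phyllocladus", "  Examplea   obsoleta ", "Nonexistus fictus", ""]
for name, status in check_names(column, fake_lookup).items():
    print(f"{name}\t{status}")
```

Because the lookup is passed in as a function, the same loop would work
unchanged against any database that can answer "what is the status of
this name?".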

Lastly, Mary Barkworth alludes to the problem of data availability.
ITIS and GenBank make all their data available by FTP, which is one
reason why these databases are widely used in efforts to build name
databases (ITIS is used by Species 2000, GBIF, SEEK, etc.). Other
databases, such as IPNI, which permit queries for only a name (or a
few) at a time, are frustrating to use, and hence don't feature much in
large-scale efforts. Databases such as IPNI would be vastly more useful
if their content were available for bulk downloading.

There is a useful discussion of database interoperability in the
"Report from the NIH/NIAID/Wellcome Trust Workshop on Model Organism
Databases" (http://www.genome.gov/10006356 ). FTP and "screen
scraping", which is pretty much where we are now, is listed as the
worst level of interoperability (and we're lucky if we have FTP
access). (I should mention that uBio and GenBank provide SOAP access.)

To make the vision happen we need compelling, practical examples of
useful tools.

Regards

Rod

--
--------------------------------------------------------
Professor Roderic D. M. Page
Editor Elect, Systematic Biology
DEEB, IBLS
Graham Kerr Building
University of Glasgow
Glasgow G12 8QP
United Kingdom

Phone:    +44 141 330 4778
Fax:      +44 141 330 2792
email:    r.page at bio.gla.ac.uk
web:      http://taxonomy.zoology.gla.ac.uk/rod/rod.html
reprints: http://taxonomy.zoology.gla.ac.uk/rod/pubs.html

Subscribe to Systematic Biology through the Society of Systematic
Biologists Website:  http://systematicbiology.org