More GBIF questions (was: ITIS)

Thu Jun 24 10:00:43 CDT 2004

> >One way to reduce (certainly not eliminate) such over-description is
> >to, as Meredith has already described, provide a service to access all
> >existing names. It is this goal about which I preach.
> 
> Access to names *and opinions*, that is. The latter is a very 
> different issue, but an absolute necessity. 

By "opinions", I assume you mean alternate concepts for names?  That's
where the work that the SEEK folks are doing kicks in. And, yes, I agree
-- as per my earlier names/concepts diatribe, I think both are very
important things to track, and should be thought of as distinct entities
(although my own approach to structuring the data anchors all names to
the "birth" concept implied by the original description -- but that's a
technical issue).

> Along these lines, Roger wrote:
> 
> >I believe what we need, in botany, is the equivalent of a central 'meta'
> >directory of names. This directory would store all published names
> >(could be based on IPNI as a starting point) and opinions about those
> >names. No data would be discarded. Interfaces (through SOAP or simple
> >http calls) would be provided for other databases to reflect their
> >opinions on the names, give an indication of what other data they carry
> >on these names and provide a link to the data. Anyone at any time would
> >be able to log in and provide a comment on a name or submit a new one.
> >There would be no moderation only abuse control. The whole dataset
> >would be served over the net and would be available as a download or
> >snail-mailed CD-ROM on a monthly snapshot basis. There will be news
> >feeds and watch lists so you can be notified of new entries in the
> >categories you are working on.

I'll take this opportunity to say that what Roger describes is
essentially EXACTLY how I would envision it working; but like Doug --
I'd like to see it applied to all names.

Back to Doug:

> We need this for all taxa, not just plants, and (as above) on
> *proposed* taxon names, as well as those already published. 

One of the things that Doug and I have butted heads on in the past is
the "all at once" approach, vs. the "baby steps" approach (e.g., while I
would like to see the Code changes in the way Doug envisions, I'd rather
work up to that in stages).  Doug: do you think that a system that
allows commentary/feedback/etc. on existing names could function without
simultaneously instigating the rules for new names?  Or must the two
necessarily be developed/implemented together?

> But consider the practical consequences of trying to treat 
> "opinions about a name" as a simple data element linked to a 
> name, and you'll see one of the things that worries me about 
> this particular approach:
> 
> Pat Curator, who has just taken a position at a small 
> institution whose collection hasn't been curated in 50 years, 
> wishes to organize the taxa (plants, animals, whatever), and 
> submits a set of queries designed to give a listing of taxa 
> such that all synonyms are listed under each species, each 
> species is in a genus, each genus is in a family, and each 
> family is in an order. Not too much to ask, right?
> But if the "opinions" data is not used to build a 
> classification, poor Pat is going to find that a substantial 
> number of species-level taxon names will (a) have more than 
> one alternative opinion whether they are synonyms or not, (b) 
> have more than one alternative opinion what genus they belong 
> to, (c) have more than one alternative opinion what family 
> the genus belongs to, and (d) have more than one alternative 
> opinion what families are in each order. Pat might need to 
> sit down and plow through all the literature at each level of 
> the hierarchy, for anywhere up to thousands of taxa, in order 
> to arrive at a single functional classification. Let's face 
> it: MOST users of names aren't themselves experts, and aren't 
> equipped to decide for themselves whether to follow, say, 
> Opus' 1933 standard versus Soenso's 1996 cladistic analysis 
> versus Whozis' 2002 molecular phylogeny.

Having lived in essentially Pat's shoes for more than a decade, I know
these issues very well.  Indeed, these are the issues that have shaped
my own approach to solving the related information management issues
(http://www.phyloinformatics.org/pdf/1.pdf).  In a nutshell:  specimens
have been identified to a taxon name by some (known or unknown)
determiner. When the determiner assigned the name to the specimen (or
perhaps more correctly, assigned the specimen to the name), the
determiner had in mind a taxonomic concept associated with that name,
within the scope/circumscription of which the specimen belonged.  Thus,
the link between a specimen and a name comes via a concept.

When you have a data structure that tracks taxonomic concepts that are
rooted in names, and you link the specimens to the concepts, many of
these problems disappear (given a decent body of concept mapping, which
doesn't really exist yet, but which the SEEK folks and others are
working hard on, and which will exist in our professional lifetimes).
The specimen maintains the name that appears on its label, SEC the known
or unknown person who identified it as such.  That establishes a link to
what I call a "Protonym" (basically the Basionym).  Given the Basionym,
you then would have access to the current status of that Basionym SEC
any number of sources (the Museum's curator, any number of publications,
ITIS, Species 2000, Catalog of Fishes, IPNI, KEW, whatever...).  Thus,
the specimen can be located no matter whose concept you choose to
follow.
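
As a rough illustration of the structure described above, here is a
minimal Python sketch. The class names (Protonym, Assertion, Specimen),
the function current_name, and the placeholder names and sources are all
hypothetical, invented for this example; this is not the actual schema
of any existing database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protonym:
    """A name anchored to its original description (hypothetical model)."""
    name: str
    author: str


@dataclass
class Assertion:
    """What one source says about a protonym, i.e. 'protonym SEC source'."""
    protonym: Protonym
    source: str       # placeholder, e.g. "Source A"
    valid_name: str   # the name this source currently treats as valid


@dataclass
class Specimen:
    catalog_number: str
    label: Protonym               # the name on the label; never rewritten here
    determiner: str = "unknown"   # known or unknown person who identified it


def current_name(specimen, assertions, source):
    """Return the specimen's name according to one chosen source."""
    for a in assertions:
        if a.protonym == specimen.label and a.source == source:
            return a.valid_name
    return None  # this source has recorded no opinion on the name


# Placeholder data only: "Aus bus" and "Cus bus" are not real taxa.
p = Protonym("Aus bus", "Smith 1900")
fish = Specimen("X-1", p)
opinions = [
    Assertion(p, "Source A", "Aus bus"),
    Assertion(p, "Source B", "Cus bus"),
]
print(current_name(fish, opinions, "Source B"))
```

The specimen record itself is never edited; choosing a different source
only changes which assertion is looked up.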

> I maintain that what may seem perfectly logical, objective, 
> and ideal to those who are designing such names databases - 
> to wit, the "hands-off" approach to opinions (and, 
> effectively, all classificatory matters) - stands to leave a 
> LOT of nomenclatural data almost completely worthless to a 
> large community of potential users.

I think quite the opposite -- but I suspect I'm simply not understanding
your point.

> Part of the "sociology" of the taxonomic community that needs 
> to be developed alongside names databases is a working 
> consensus classification, and - more to the point - this 
> needs to be an
> *interactive* and *ongoing* portion of the data resource, and 
> not based solely on published classificatory opinions (I 
> believe that there is much more authoritative taxonomic 
> opinion that resides in the heads of living taxonomists than 
> can ever be published, and if we stick with the tradition 
> that nothing is of any merit until it is printed on paper, we 
> are crippling ourselves). 

I agree with the point that documentation of concepts (opinions) should
not be restricted to publications only. But I'm not sure there needs to
be any proactive effort to move towards a consensus classification.  I
think the differences of opinion should be embraced (not discouraged),
and eventually a consensus will emerge on its own.  In cases where
"reasonable people may disagree", you simply maintain who thinks what
about each name.

> Without a resource that puts forth 
> a consensus opinion to all users (with appropriate caveats, 
> obviously), we are failing to address one of the most serious 
> objections that the critics of taxonomy are (and have been)
> making: that you can almost never get a straight answer from 
> a taxonomist when you want to know what the name is for 
> something and where it fits in the classification. I 
> understand why GBIF wants to steer clear of it, and "not take 
> sides" when compiling a list of all taxon names, but we can't 
> neglect the very real need to
> *simultaneously* develop a single authoritative 
> classification framework into which those names will fit. 

I can see both needs being accommodated by the same system.  For the
taxonomist, all of the historical alternate opinions can be presented
right up front.  For the educated non-taxonomist, a distilled set of the
leading current alternatives could be presented -- I imagine some sort
of automatic algorithm analogous to Google's PageRank approach that uses
various criteria to "rank" each alternate according to how many others
have followed it and how recently, and maybe a bunch of other factors as
well.  Then, for the layman, there could be the equivalent of Google's
"I'm Feeling Lucky" option. None of this would require any proactive
effort to converge on taxonomic consensuses (is that a word? maybe
"consensi"?), nor any major changes in the sociology of taxonomists
(other than getting them to be willing to document their opinions
electronically).
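
To make the ranking idea concrete, here is a minimal sketch in Python.
It is not PageRank itself, just a recency-weighted count of how many
sources followed each alternative; the function names, the 20-year
half-life, and the placeholder names are assumptions for illustration
only.

```python
def rank_alternatives(usages, now_year, half_life=20.0):
    """Rank alternative names by a recency-weighted count of followers.

    usages: list of (alternative_name, year) pairs, one pair per source
    that followed that alternative. A usage loses half its weight every
    `half_life` years (an arbitrary choice for this sketch).
    """
    scores = {}
    for name, year in usages:
        age = max(0, now_year - year)
        scores[name] = scores.get(name, 0.0) + 0.5 ** (age / half_life)
    # Highest score first; ties broken alphabetically so output is stable.
    return sorted(scores, key=lambda n: (-scores[n], n))


def feeling_lucky(usages, now_year):
    """Return the single top-ranked alternative, or None if there are none."""
    ranked = rank_alternatives(usages, now_year)
    return ranked[0] if ranked else None


# Placeholder data only: these are not real taxa or real usages.
usages = [("Aus bus", 1933), ("Cus bus", 1996), ("Cus bus", 2002)]
print(rank_alternatives(usages, now_year=2004))
print(feeling_lucky(usages, now_year=2004))
```

The full ranked list would serve the educated non-taxonomist, and the
first entry alone would serve the layman. Other criteria could be added
as further terms in the score.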

> Name compilation efforts undertaken in a classificatory 
> vacuum aren't going to serve the larger user community to the 
> degree needed, and I think a greater effort should be made to 
> develop a mechanism by which taxonomists can arrive at a 
> public consensus.

I don't think the modern names lists are being created in a
classificatory vacuum.  Usually there are at least two alternatives
considered (the original, and the current).  Eschmeyer's Catalog of
Fishes usually provides a bunch of historical alternative uses, *and* he
provides his own "consensus" perspective as well.

> This would not require much more than what Roger mentioned 
> above, and what Rich and I have also proposed in the past: 
> "news feeds and watch lists so you can be notified of new 
> entries in the categories you are working on". This can be 
> applied to classificatory matters just as well as to purely 
> nomenclatural ones. 

Absolutely!

> A single central website can accomplish 
> this - the software and logistics are, in fact, trivial.
> Beyond funding, I see no serious obstacles other than 
> tradition and egotism preventing us from building a single 
> Tree of Life - with each taxonomist choosing the branches 
> therein upon which they will focus their efforts. With 
> active, day-to-day participation by everyone in the taxonomic 
> community, this could easily become a reality.

Other than a push towards a "Single Tree of Life", we are (as usual) in
complete agreement.  I understand what you mean in terms of the need for
non-specialists to have just one "correct" answer, and I think that a
taxonomic algorithm for "I'm Feeling Lucky" could be objectively
developed.  The way I imagine it, the Museum specimens can all keep
their names as the label currently indicates (even if that label is 100
years old), and the name can be translated, in real time, any time, to
the current "I'm Feeling Lucky" name (or to the current ITIS name, or to
the current IPNI name, or to the current Eschmeyer name, or to whatever
authority's name that the curator wants to follow by default).  All of
this translation would happen in real time, at the Curator's computer,
whenever a query is run or a search is made.  It would be an on-call web
service, and as Doug already pointed out, the IT aspects of it are
relatively trivial. The Museum label would only change when someone
actually examines the specimen, and asserts a new determination for it.
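
A minimal Python sketch of that query-time translation, assuming a
simple lookup table per authority. The function name search, the table
layout, and the placeholder names and authorities are hypothetical; a
real service would sit behind a web interface rather than an in-memory
dictionary.

```python
def search(specimens, wanted_name, authority, opinions):
    """Find specimens whose label name translates to `wanted_name`.

    specimens: list of (catalog_number, label_name) pairs; labels are
               read but never modified.
    opinions:  {authority: {label_name: current_name}}. If the chosen
               authority has no entry for a label, the label name is
               used unchanged (an assumption of this sketch).
    """
    table = opinions.get(authority, {})
    return [cat for cat, label in specimens
            if table.get(label, label) == wanted_name]


# Placeholder data only: not real taxa, catalog numbers, or authorities.
specimens = [("X-1", "Aus bus"), ("X-2", "Cus bus"), ("X-3", "Dus eus")]
opinions = {"Authority A": {"Aus bus": "Cus bus"}}

print(search(specimens, "Cus bus", "Authority A", opinions))
print(search(specimens, "Cus bus", "Authority B", opinions))
```

Switching the default authority changes the search results without
touching any specimen record.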

We can all have dreams, can't we?

Aloha,
Rich

=======================================================
Richard L. Pyle, PhD
Ichthyology, Bishop Museum
1525 Bernice St., Honolulu, HI 96817
Ph: (808)848-4115, Fax: (808)847-8252
email: deepreef at bishopmuseum.org
http://www.bishopmuseum.org/bishop/HBS/pylerichard.html