[Taxacom] concept mapping (was Re: progress on globalnames.org)

Fri May 15 11:25:39 CDT 2009

Just a quick comment on the limits of concept mapping:

David Remsen wrote:

>I can imagine for 
>example a tool like EoL's new refactoring of the uBio LinkIT (called 
>MarkIT) that could make it really easy to find a name in a document or 
>on a label, call a taxonomic catalog to find all treatments of that 
>name (having dealiased the orthographic and nomenclatural bits) and 
>provide a select list of relevant concepts that embed the taxon
>identifier into the usage. <snip>

>btw,  I suspect there are many cases where a common name is more 
>stable than a scientific name.  I think scientific names are more 
>precise because there are bodies governing their creation and use and
>they are ultimately tied to something real (types).

This all apparently assumes that the scientific names in question are 
at generic rank or below; above that, the names are neither so precise 
nor particularly stable. However, at that level, common names are 
very unstable as well, if only because membership in a group to 
which a common name is applied changes over time, sometimes 
drastically (I can think of dozens of insect families that were 
either MUCH larger or MUCH smaller only 30 years ago - meaning both 
the scientific and common name concepts have changed). Since we're 
evidently talking about software that will recognize ANY rank of 
scientific or common name, this isn't just a side-issue, correct?

Bear in mind, too, that many common names *sometimes* refer to a 
single species, and *other* times the same name refers to a much 
larger group - sometimes a group of things which aren't necessarily 
related to one another (e.g. "daddy long-legs" or "cicada killer" or, 
in extreme cases, "the wasp" or "the bee"). Some common names are 
truly hopeless to concept-map, such as "bugs". Trying to get an 
algorithm to do concept-mapping is inevitably going to run afoul of 
MANY cases like these; cases where only a human being could possibly 
know how to disambiguate a printed name, if only because an algorithm 
has no way of establishing *context*. Is an algorithm that gives an 
unambiguous result for only 50% (probably less) of all printed 
organism names really - ultimately - going to be sufficient for what 
we hope to accomplish? Speaking practically, who will sit down, read 
all those papers, do the detective work, and disambiguate the 
remaining 50% - and who will pay them to do this? Or are we limiting 
the scope by excluding from consideration all documents that are not 
classified as primary scientific literature?
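
To make the ambiguity concrete, here is a minimal Python sketch. It does
not describe MarkIT, uBio, or any real taxonomic catalog: the CATALOG
contents, the resolve() and unambiguous_fraction() functions, and the
status labels are all hypothetical, chosen only to illustrate the
problem. It assumes a lookup that has no access to context, only to the
printed name string.

```python
from typing import Dict, List, Tuple

# Hypothetical toy catalog (illustrative entries only, not real catalog data):
# normalized name string -> candidate concepts that string might refer to.
CATALOG: Dict[str, List[str]] = {
    "apis mellifera": ["species Apis mellifera"],
    "daddy long-legs": [
        "order Opiliones (harvestmen)",
        "family Pholcidae (cellar spiders)",
        "family Tipulidae (crane flies)",
    ],
    "cicada killer": ["species Sphecius speciosus", "genus Sphecius"],
}


def resolve(name: str) -> Tuple[str, List[str]]:
    """Look up a printed name with no context; report how ambiguous it is."""
    candidates = CATALOG.get(name.strip().lower(), [])
    if len(candidates) == 1:
        return "unambiguous", candidates
    if candidates:
        # More than one candidate concept: the string alone cannot decide.
        return "ambiguous", candidates
    return "unknown", []


def unambiguous_fraction(names: List[str]) -> float:
    """Fraction of names that map to exactly one concept."""
    if not names:
        return 0.0
    hits = sum(1 for n in names if resolve(n)[0] == "unambiguous")
    return hits / len(names)
```

Everything that comes back as "ambiguous" or "unknown" is the part that
would still need a human reader with access to the surrounding text.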

Peace,
-- 

Doug Yanega        Dept. of Entomology         Entomology Research Museum
Univ. of California, Riverside, CA 92521-0314        skype: dyanega
phone: (951) 827-4315 (standard disclaimer: opinions are mine, not UCR's)
              http://cache.ucr.edu/~heraty/yanega.html
   "There are some enterprises in which a careful disorderliness
         is the true method" - Herman Melville, Moby Dick, Chap. 82