[Taxacom] the hurdle for all biodiv informatics initiatives

Thu Feb 18 03:09:27 CST 2010

From: "Richard Pyle" <deepreef at bishopmuseum.org>
Sent: Thursday, February 18, 2010 2:02 AM

>Wolfgang Lorenz wrote:
>> What we can make out, at this early stage, is the taxonomic
>> names problem as probably THE major hurdle for all those
>> projects! [GBIF, EoL, CoL, etc.]

> I couldn't agree more!!!

***
An admirably accurate summation of the debate!

Taxonomy, and taxon names are the SOLUTION to unlocking
the world's biodiversity. Any worthwhile indexing effort will
utilize this solution and build on it. Instead, 'bioinformatics' is
regarding it as the PROBLEM and is generating lots and lots
of output with the actual biodiversity getting hidden further and
further away. Actually, I do not see a real difference between
indexing "text strings" and indexing the fonts used to print the
text string (or for that matter, indexing the type of ink used to
print the text string). There are endless possible "text strings"
out there (an ever growing number) and these can be indexed
till hell freezes over, without necessarily achieving anything.
* * *

>> Why do we need such identifiers and who can take control of it???
>> Instead of machine-only-readable identifiers, which are
>> obviously "out of human control" in so many examples, we
>> could have perfectly stable, unique and readable Name Strings
>> for each available name, registered and resolvable in a
>> future ZooBank:
>>
>> ZS-Feronia_sodalis
>> ZS-Feronia_sodalis/Eumolops_sodalis
>> ZS-Feronia_sodalis/Evarthrus_sodalis
>> ZS-Feronia_sodalis/Pterostichus_sodalis
>> ZS-Feronia_sodalis/Cyclotrachelus_sodalis
>> ZS-Feronia_sodalis/Abax_sodalis

> This approach seems very sensible to anyone who has never needed to
> sort out homonyms.  If you add author and year, you can reduce the problem
> of homonymy, but you increase the problem of establishing a standard way
> of representing author & year data. [...]

> I don't think you're necessarily missing anything.  We already have a
> human-friendly way of displaying a taxon name; e.g. "Feronia sodalis
> LeConte 1848".  That's worked reasonably well for human brains for the
> past couple of centuries.  The only reason we're having a conversation
> about GUIDs is that -- as has been proven from decades of experience
> trying to use computer databases as tools to manage taxonomic
> information -- these sorts of human-friendly identifiers are not well
> suited to establishing resolvable links among digitized information.  The
> exceptions (homonyms, mis-spellings, etc.) seem rare to us, but even
> low-frequency exceptions to general rules create all sorts of confusion
> and complexity when we try to build robust computer databases as tools for
> taxonomists. [...]
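
To make the homonym point above concrete, here is a minimal sketch of what can go wrong when a name string is used as the database key. Everything in it is an assumption for illustration: the names "Aus bus", the authors and the years are placeholders, and the "ZS-" key format only loosely imitates the strings proposed earlier in the thread. It is not how ZooBank or any real registry works.

```python
import uuid

# Hypothetical records: two separate names that happen to share one spelling.
# "Aus bus" and the authors/years are placeholders, not real nomenclature.
records = [
    {"name": "Aus bus", "author": "Smith", "year": 1900},
    {"name": "Aus bus", "author": "Jones", "year": 1950},  # a homonym
]


def key_by_name_string(recs):
    """Index records under a readable key built from the name string alone."""
    index = {}
    for rec in recs:
        key = "ZS-" + rec["name"].replace(" ", "_")
        index[key] = rec  # a homonym silently overwrites the earlier record
    return index


def key_by_opaque_id(recs):
    """Index records under an opaque identifier that carries no meaning."""
    return {str(uuid.uuid4()): rec for rec in recs}


by_name = key_by_name_string(records)
by_id = key_by_opaque_id(records)

print(len(by_name))  # 1: one of the two records has been lost
print(len(by_id))    # 2: both records survive
```

Adding author and year to the key would keep these two records apart, but only if every contributor spells and formats author and year identically, which is the second problem raised in the quoted text.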

***
I agree with the reasoning: computer-readable output for computers.
What does not make sense is to convert "identical text strings" that refer
to different entities into identical computer output. This is just a way to
multiply confusion. Text strings are a red herring. Why not index what
actually matters?
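
One possible reading of this, as a rough sketch only: keep one record per entity and treat the text string as a label attached to it, so that a lookup by string returns every entity carrying that label instead of merging them. The entity ids, names, authors and years below are hypothetical placeholders, and the layout is an assumption, not a description of any existing index.

```python
from collections import defaultdict

# Hypothetical entities; ids and names are placeholders for illustration.
entities = {
    "e1": {"label": "Aus bus", "author": "Smith", "year": 1900},
    "e2": {"label": "Aus bus", "author": "Jones", "year": 1950},
    "e3": {"label": "Cus dus", "author": "Smith", "year": 1900},
}


def build_label_index(ents):
    """Map each text string to the set of entity ids that carry it."""
    index = defaultdict(set)
    for entity_id, entity in ents.items():
        index[entity["label"]].add(entity_id)
    return index


label_index = build_label_index(entities)

# Strings that point at more than one entity are reported, not collapsed.
ambiguous = sorted(
    label for label, ids in label_index.items() if len(ids) > 1
)
print(ambiguous)
```

Here the identical string stays attached to two distinct records, so the ambiguity is visible to whoever queries the index rather than hidden inside a single merged entry.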

Paul