[Taxacom] the hurdle for all biodiv informatics initiatives

Thu Feb 18 08:05:59 CST 2010

From: "Richard Pyle" <deepreef at bishopmuseum.org>
Sent: Thursday, February 18, 2010 10:46 AM

> Ummmm...who, exactly, is saying that taxonomy is the Problem?  I guess if
> you mean "bioinformatics" sensu stricto (i.e., in the sense that the
> molecular/DNA people have co-opted that term as their own), then yes -- some
> of those people think that taxonomy can be replaced by genetic markers
> (like DNA barcodes).  And in that sense, I am fully in agreement with you.
> But my understanding is that this conversation was about "bioinformatics"
> sensu lato (i.e., what we now refer to as "biodiversity informatics").
> This is the space where we find the "Axis of Evil" (GBIF, ALA, EoL,
> CoL, etc.) --  and the things they are doing are highly supportive of
> traditional taxonomy, and they all see taxon names as the SOLUTION
> to integrating the information.

***
OK, "biodiversity informatics" it is (BTW, when I took a course in
bioinformatics it meant something different entirely from what is sketched
above. There are a lot of terms that are confusing!)
* * *

>> Actually, I do not see a real difference between indexing
>> "text strings" and indexing the fonts used to print the text
>> string (or for that matter, indexing the type of ink used to
>> print the text string). There are endless possible "text strings"
>> out there (an ever growing number) and these can be indexed
>> till hell freezes over, without necessarily achieving anything.
>
> ...unless, of course, we can build an infrastructure that goes beyond the
> text strings and cross-links data through GUIDs (which humans never see).

***
But how much more effective it would be if this were skipped,
and instead an infrastructure was built with actual information
(names, types, circumscriptions), with the text strings left to be
handled by a trivial algorithm to be designed for the purpose!
* * *
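
As a rough illustration of the alternative proposed above, here is a minimal
sketch in Python. It assumes that "actual information" means structured
records for names, types and circumscriptions, and that the "trivial
algorithm" is simple whitespace and case normalization. Every class, field
and identifier (Name, Circumscription, Registry, "guid-1", the placeholder
name "Aus bus") is hypothetical and not taken from any existing system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Name:
    guid: str      # opaque identifier, never shown to humans
    spelling: str  # the published spelling of the name
    type_ref: str  # pointer to the name-bearing type


@dataclass(frozen=True)
class Circumscription:
    name_guid: str      # the name applied to this taxon concept
    source: str         # where the circumscription was published
    members: frozenset  # GUIDs of the names included in the concept


def normalize(text: str) -> str:
    """Assumed 'trivial algorithm': lower-case and collapse whitespace."""
    return " ".join(text.lower().split())


class Registry:
    def __init__(self) -> None:
        self.names = {}         # guid -> Name
        self._by_spelling = {}  # normalized spelling -> list of guids

    def add(self, name: Name) -> None:
        self.names[name.guid] = name
        self._by_spelling.setdefault(normalize(name.spelling), []).append(name.guid)

    def lookup(self, text: str) -> list:
        """Return the GUIDs of every name whose spelling matches the text."""
        return list(self._by_spelling.get(normalize(text), []))


# Placeholder data, invented purely for illustration.
registry = Registry()
registry.add(Name("guid-1", "Aus bus", "type-0001"))
concept = Circumscription("guid-1", "hypothetical revision", frozenset({"guid-1"}))

print(registry.lookup("  AUS   bus "))
```

In this sketch the text string is only a lookup key derived from the record;
the record itself, with its type and circumscription, carries the information.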

> As I explained in our off-list exchange, the text-string indexers do not
> see the text strings as the "ends", they see them as the only thing we
> currently have to build the connections.  Once those connections
> (via GUID links) are established, then the text strings become consistent,
> stop multiplying needlessly, and (ultimately) the text-string indexers
> will no longer have a service to perform.

***
Text strings will never become consistent, nor stop multiplying.
Not unless humans are excluded from the entire process.
* * *

>> I agree with the reasoning: computer-readable output for computers.
>> What does not make sense is to convert "identical text
>> strings" that refer to different entities into identical
>> computer output. This is just a way to multiply confusion.
>> Text strings are a red herring. Why not index what actually matters?
>
> Indeed -- that's *exactly* what we're trying to do (see my previous post).
> Like I said, whether we like it or not, the myriad text strings already
> exist, and are our only link in some cases to important information about
> biodiversity.  To link that information to the "clean" names, we need the
> text-string indexers to help us get there.
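
The two positions in this exchange can be made concrete with a small Python
sketch. It assumes a text-string index that links each string, exactly as
found, to one or more GUIDs. Variant strings for the same entity share a
GUID, while an identical string used for different entities keeps all of its
GUIDs instead of being collapsed into one. The class StringIndex, the GUID
values and the sample strings are all invented for illustration.

```python
from collections import defaultdict


class StringIndex:
    """Hypothetical index from text strings, as found, to GUIDs."""

    def __init__(self) -> None:
        self._links = defaultdict(set)  # text string -> set of GUIDs

    def link(self, text: str, guid: str) -> None:
        self._links[text].add(guid)

    def resolve(self, text: str) -> set:
        # Return a set, not a single GUID: an identical string may refer
        # to different entities, and that ambiguity must stay visible.
        return set(self._links.get(text, set()))


index = StringIndex()

# Variant strings for one entity all point at the same GUID.
index.link("Aus bus Smith", "guid-1")
index.link("Aus bus Smith, 1900", "guid-1")
index.link("A. bus", "guid-1")

# The same abbreviated string is also used for a different entity.
index.link("A. bus", "guid-2")

print(sorted(index.resolve("Aus bus Smith")))
print(sorted(index.resolve("A. bus")))
```

Returning a set is the design choice that matters here: a caller that gets
back more than one GUID knows the string alone cannot settle which entity is
meant.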

***
Actually, I do not see that the "myriad text strings... are our only link
... to important information about biodiversity" nor that they would be
sufficient to access all the information. They are just what the
"biodiversity informatics" people are dealing with.

Paul