[Taxacom] a looming data conflict crisis in bioinformatics?

Stephen Thorpe stephen_thorpe at yahoo.co.nz
Sat Nov 20 16:43:59 CST 2010


Orville, you will never EVER convince the world that a flying machine is 
even remotely possible ...

>if 90% of the raw data in question are incorrect or fraudulent

well, 90% is a bit of an exaggeration, and it is not so much "incorrect or 
fraudulent" as just unverifiable and based on trust (and not easily flagged as 
problematic in closed-edit databases)

anyway, the wiki point I made was rather secondary, the main point being that 
for checklists and things based on trust, it is just too easy to come up with 
implausible ad hoc ways of reinterpreting bad data as if it were good data

for example, in a certain current checklist which will be a data provider for 
serious "official" databases and feed into GBIF, etc., etc., the author lists 
Didymocantha flavopicta McKeown, 1948, without comment, as an endemic N.Z. 
species, and fails to mention the endemic D. picta Bates, 1874. Actually, 
Didymocantha flavopicta McKeown, 1949 was a replacement name for the Australian 
species D. picta McKeown, 1948, which had never previously been reported from 
N.Z. These are the facts of the case, but what are we to conclude? It is obvious 
to me that the author stuffed up and thought D. flavopicta was a replacement 
name for the N.Z. species, but can I prove it absolutely?? No. There are always 
some remotely possible ad hoc ways to save the author here ... maybe the type of 
D. picta Bates is in fact a different N.Z. cerambycid from the one it has until 
now been thought to be (and a junior synonym thereof), and the species hitherto called
D. picta Bates in N.Z. is in fact the species now called D. flavopicta, whose 
type is a mislabelled N.Z. specimen, not from Australia at all, and the 
Australian D. flavopicta sensu all previous authors is something else entirely! 
An extreme example perhaps, but illustrative of a general point. For primary 
taxonomy, it may never be possible to require verifiability in every case, but 
secondary checklists lacking verifiability are extremely problematic and 
unnecessary, and yet they keep comin' ...

Stephen




________________________________
From: Doug Yanega <dyanega at ucr.edu>
To: TAXACOM at MAILMAN.NHM.KU.EDU
Sent: Sat, 20 November, 2010 10:34:28 PM
Subject: Re: [Taxacom] a looming data conflict crisis in bioinformatics?

Paul Kirk wrote:

>Stephen,
>
>You will never, ever, convince anyone that the future of 
>biodiversity information management is by using the 'wiki system' - 
>nothing more that a digital equivalent of a piece of paper available 
>on the internet. If you need convincing, listen to the inventor of 
>the web at the TED 
>http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html and 
>let us all know why you think he is wrong this time.

I think I can anticipate Stephen's response here, and the point is simple:

Insisting that all we need is more raw data is meaningless if 90% of 
the raw data in question are incorrect or fraudulent. The end result 
is going to be awfully, awfully confusing.

"The trouble with the world is not that people know too little, but 
that they know so many things that ain't so." - Mark Twain

Just consider the battle of two memes:

"Obama is a Muslim" gets 1,090,000 Google hits, but
"Obama is not a Muslim" gets only 89,000.

When the truth is swallowed up by lies, letting some computer 
algorithm tell you what to believe on the internet is just asking for 
trouble. I'm not so sure Tim is thinking clearly here, unless he can 
devise an algorithm that can infallibly detect lies. And, much as you 
might hate to admit it, Wikis are very good at filtering out liars, 
ignoramuses, and crackpots - and the more people that contribute, the 
better that filtering becomes. If you don't believe that, and think 
that wikis are "nothing more that [sic] a digital equivalent of a 
piece of paper" then you really, truly do NOT understand how wikis 
work.

Sincerely,
-- 

Doug Yanega        Dept. of Entomology        Entomology Research Museum
Univ. of California, Riverside, CA 92521-0314        skype: dyanega
phone: (951) 827-4315 (standard disclaimer: opinions are mine, not UCR's)
              http://cache.ucr.edu/~heraty/yanega.html
  "There are some enterprises in which a careful disorderliness
        is the true method" - Herman Melville, Moby Dick, Chap. 82

_______________________________________________

Taxacom Mailing List
Taxacom at mailman.nhm.ku.edu
http://mailman.nhm.ku.edu/mailman/listinfo/taxacom

The Taxacom archive going back to 1992 may be searched with either of these 
methods:

(1) http://taxacom.markmail.org

Or (2) a Google search specified as:  site:mailman.nhm.ku.edu/pipermail/taxacom  
your search terms here


