[Taxacom] saturday morning fun

Stephen Thorpe stephen_thorpe at yahoo.co.nz
Sun Nov 28 18:06:18 CST 2010


I suggest that the broader problem is that aggregators like GBIF want to be able
to boast big numbers of data records, which requires that data processing be done
mostly automatically, without the intervention of a thinking human mind. The
problem with this is that the data supplied by providers can be very raw,
incomplete and/or unreliable (the aggregator is seemingly blind to this, going
only on trust in the provider's authority), and SUBJECTIVE. The subjectivities
of different providers just don't "add up" to a consistent overall result
without lots of human intervention to make sensible and sometimes arbitrary
decisions and judgement calls, and this step would stand in the way of the "big
numbers" of data desired. But the bit that really gets my blood boiling is the
implied validation of "accepted" data in GBIF, or whoever, when a significant
proportion of it is incomplete and/or just plain wrong, and there is a huge
bureaucracy standing in the way of fixing anything. Still ... I guess it keeps
people in paid employment and money circulating through the system ...

Stephen




________________________________
From: Geoffrey Read <gread at actrix.gen.nz>
To: taxacom at mailman.nhm.ku.edu
Sent: Mon, 29 November, 2010 12:13:14 PM
Subject: Re: [Taxacom] saturday morning fun

The original classification of each entry supplied to GBIF should be
trusted more by them. Why would one ever allow shoehorning records
presented as under Phylum Chordata (and coming from a specialist bird
database) into Phylum Arthropoda? It is not the job of GBIF to second
guess every entry in every museum database on earth, or to resolve
inadequate original data. The providers need to do that. In the case of
the mussel variants (I grant there are probably much less clear-cut
situations) a human could quickly tell that every entry was the same
species. I'm pretty sure a well-trained computer could too.

A nice approach at species level (the most important level) would be to
actually show users a table of the variants like the Mytilus edulis
example and let them build their own selection of entries to use in their
query. A select-all button would work fine in that particular case. The
our-algorithms-know-best-what-you-want-and-here-it-is-in-one-big-pureed-data-mix
approach is dismal.

Geoff


>>> On 29/11/2010 at 6:28 a.m., "David Remsen (GBIF)" <dremsen at gbif.org>
wrote:
> 6.  Here is a sample entry for a single species (Mytilus edulis, the
> common blue mussel):
>        http://code.google.com/p/gbif-ecat/wiki/Nom5ExampleMytilusedulis






      

