[Taxacom] saturday morning fun
Stephen Thorpe
stephen_thorpe at yahoo.co.nz
Sun Nov 28 18:06:18 CST 2010
I suggest that the broader problem is that aggregators like GBIF want to be able
to boast big numbers of data, which requires that data processing is mostly done
automatically, without the intervention of a thinking human mind. The problem with
this is that the data from providers can be very raw, incomplete and/or unreliable
(the aggregator is seemingly blind to this, going only on trust of the provider's
authority), and SUBJECTIVE. The subjectivities of different providers just don't
"add up" to a consistent overall result without lots of human intervention to
make sensible and sometimes arbitrary decisions and judgement calls, and this
step would stand in the way of the desired "big numbers" of data. But the bit
that really gets my blood boiling is the implied validation of "accepted" data
in GBIF, or whoever, when a significant proportion of it is incomplete and/or
just plain wrong, and there is a huge bureaucracy standing in the way of fixing
anything. Still ... I guess it keeps people in paid employment and money
circulating through the system ...
Stephen
________________________________
From: Geoffrey Read <gread at actrix.gen.nz>
To: taxacom at mailman.nhm.ku.edu
Sent: Mon, 29 November, 2010 12:13:14 PM
Subject: Re: [Taxacom] saturday morning fun
The original classification of each entry supplied to GBIF should be
trusted more by them. Why would one ever allow records presented under
Phylum Chordata (and coming from a specialist bird database) to be
shoehorned into Phylum Arthropoda? It is not the job of GBIF to second-guess
every entry in every museum database on earth, or to resolve
inadequate original data. The providers need to do that. In the case of
the mussel variants (I grant there are probably much less clear-cut
situations), a human could quickly tell that every entry was the same
species. I'm pretty sure a well-trained computer could too.
A nice approach at species level (the most important level) would be to
actually show users a table of the variants like the Mytilus edulis
example and let them build their own selection of entries to use in their
query. A select-all button would work fine in that particular case. The
our-algorithms-know-best-what-you-want-and-here-it-is-in-one-big-pureed-data-mix
approach is dismal.
Geoff
>>> On 29/11/2010 at 6:28 a.m., "David Remsen (GBIF)" <dremsen at gbif.org>
wrote:
> 6. Here is a sample entry for a single species (Mytilus edulis, the
> common blue mussel):
> http://code.google.com/p/gbif-ecat/wiki/Nom5ExampleMytilusedulis
_______________________________________________
Taxacom Mailing List
Taxacom at mailman.nhm.ku.edu
http://mailman.nhm.ku.edu/mailman/listinfo/taxacom
The Taxacom archive going back to 1992 may be searched with either of these
methods:
(1) http://taxacom.markmail.org
Or (2) a Google search specified as: site:mailman.nhm.ku.edu/pipermail/taxacom
your search terms here