[Taxacom] Occurrence data...

Fri Feb 18 12:39:07 CST 2011

Rich,
the problem, from my point of view, and maybe that's what Bob was saying, is
the focus on making data fit for the machine, while the users' need to look
behind the data output is more or less neglected. My impression is that the
current TDWG recommendations for data standards are weak in representing the
authentic original information in the databases. More or less, we must trust
what databasers (often not the people who created the information) are
digitizing. For example, in most insect occurrence data accessible through
GBIF and marked as "specimen"-based, I cannot make out what's on the
original specimen labels. Both the geographic information and the original
taxon identification are already atomized and transformed into
machine-digestible text strings in the first steps of digitization.

The current, really incredible error load in the data accessible through
GBIF, EOL, etc. should be alarming enough to make us reconsider some
procedures, IMHO.

As for occurrence data existing in literature, doesn't BHL already offer a
better alternative?
Just link to a journal page, e.g.
http://biodiversitylibrary.org/page/28994598
There you pick up the chresonym "Tachys bisulcatus (Nicolai)" and the
verbatim locality name "Marquartstein". Then, in separate data fields,
georeference the place as 47.755/12.464, and get 'Porotachys bisulcatus' as
a better name for the species if you prefer a current classification. With
these data elements and the link to BHL we have a vettable occurrence
record. What else do we need?
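
A minimal sketch of such a record, in Python, might look like the block
below. The class and field names are hypothetical (they are not TDWG or
GBIF terms); the values are the ones from the BHL example above. The only
point is that the verbatim part and the interpreted part live in separate
places, and that the verbatim part carries the link back to the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Verbatim:
    """What the source actually says (hypothetical field names)."""
    name: str        # chresonym exactly as printed
    locality: str    # locality exactly as printed
    source_url: str  # link back to the page where it can be checked


@dataclass(frozen=True)
class Interpretation:
    """What a databaser added afterwards; every field is optional."""
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    current_name: Optional[str] = None


@dataclass(frozen=True)
class OccurrenceRecord:
    verbatim: Verbatim
    interpretation: Interpretation

    def is_vettable(self) -> bool:
        # Assumption: "vettable" means a user can go back to the source
        # and compare it with the verbatim name and locality.
        v = self.verbatim
        return bool(v.name and v.locality and v.source_url)


record = OccurrenceRecord(
    verbatim=Verbatim(
        name="Tachys bisulcatus (Nicolai)",
        locality="Marquartstein",
        source_url="http://biodiversitylibrary.org/page/28994598",
    ),
    interpretation=Interpretation(
        latitude=47.755,
        longitude=12.464,
        current_name="Porotachys bisulcatus",
    ),
)
```

A record with an empty interpretation is still vettable under this
assumption; a record with no source link is not.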

For specimen-based records, we can do this: with a simple set of controlled
vocabulary (e.g., "\" for the beginning of a new text line) we can enter the
verbatim text taken from locality and ID labels in a first step (rapid data
entry) and then add, in separate fields, interpretations like lat/lon and
standardized name strings.
Always let the user know what is original and what is interpretation in the
process of digitization.
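
As a rough sketch of the rapid-entry step, assuming the "\" convention
above is the only markup used: the verbatim string is stored untouched, and
the label lines are recovered from it only when needed. The function name
and the sample label text are made up for illustration.

```python
from typing import List

LINE_BREAK = "\\"  # the single character "\" marks the start of a new label line


def label_lines(verbatim: str) -> List[str]:
    """Split a verbatim label transcription into its original text lines.

    The stored verbatim string itself is never modified; this only
    produces a view of it for display or checking.
    """
    return [part.strip() for part in verbatim.split(LINE_BREAK)]


# Made-up label text, entered in one pass with "\" between label lines.
entered = r"Marquartstein \ leg. N.N. \ Tachys bisulcatus"
lines = label_lines(entered)
```

Interpretations (lat/lon, standardized name) would go into separate fields
and never overwrite the string held in `entered`.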

Cheers,
Wolfgang

----------------------------------------------

Wolfgang Lorenz, Tutzing, Germany

2011/2/18 Richard Pyle <deepreef at bishopmuseum.org>

> But don't you have to know that an original source exists?  If you know
> exactly what you're looking for, and you know exactly where to go to get it,
> then there's no problem.  The role of aggregators is to provide a single
> portal that INDEXES all the original sources, and provides tools to FIND the
> stuff you're interested in, and then provide links BACK to the original
> source for more detailed information.  They also can do some value-added
> stuff like show aggregated points on a map.  Sure, a lot of that is bogus
> data; but it's more likely to be obviously so when placed in the context of
> good data.  If a collection has two specimens of "Aus bus"; one from
> California, and one from Florida, then a non-specialist will assume that the
> species has a broad distribution.  If those two data points are put on a map
> alongside 1000 other datapoints, 990 of which are clustered in California,
> and the remaining ones are scattered hither and yon, you might be inclined
> to re-examine the scattered ones to see if they may be mis-labeled,
> mis-digitized, or mis-identified.
>
> So, while you look at GBIF and see a godawful mess, I look at GBIF and see
> it doing what an aggregator does best -- providing a single portal to
> simultaneously access large amounts of data from distributed sources, with
> useful services to see these data in aggregate form (which includes drawing
> attention to erroneous data in the source databases).
>
> Having access to source data, *and* accessing aggregators that provide
> relevant services, are not mutually exclusive things.  And, furthermore, as
> you said very well in your later post: "on the Web, those don't have to be
> the only 2 choices".
>
> Aloha,
> Rich
>
> > -----Original Message-----
> > From: taxacom-bounces at mailman.nhm.ku.edu [mailto:taxacom-
> > bounces at mailman.nhm.ku.edu] On Behalf Of Bob Mesibov
> > Sent: Thursday, February 17, 2011 8:28 PM
> > To: TAXACOM
> > Subject: Re: [Taxacom] Occurrence data...
> >
> > Hi, Ken.
> >
> > I think you're missing my point. I can think of a lot of uses for data,
> > but I want to get those data directly from their sources, not from the
> > godawful mess that GBIF has created. That's me as user. Now who is going
> > to want to go the GBIF route, and why?
> >
> > Regards,
> > Bob
> > --
> > Dr Robert Mesibov
> > Honorary Research Associate
> > Queen Victoria Museum and Art Gallery, and School of Zoology,
> > University of Tasmania
> > Home contact: PO Box 101, Penguin, Tasmania, Australia 7316
> > Ph: (03) 64371195; 61 3 64371195
> > Webpage: http://www.qvmag.tas.gov.au/?articleID=570
> >
> > _______________________________________________
> >
> > Taxacom Mailing List
> > Taxacom at mailman.nhm.ku.edu
> > http://mailman.nhm.ku.edu/mailman/listinfo/taxacom
> >
> > The Taxacom archive going back to 1992 may be searched with either of
> > these methods:
> >
> > (1) http://taxacom.markmail.org
> >
> > Or (2) a Google search specified as:
> > site:mailman.nhm.ku.edu/pipermail/taxacom  your search terms here
>