[Taxacom] Occurrence data...

Fri Feb 18 17:00:16 CST 2011

And for a *really radical* suggestion from me: why not let taxonomists work out 
the occurrence data and publish it with the relevant taxonomic revision? That 
way, we get data attached to robustly revised taxa, and the taxonomist can 
discover any likely mislabellings, etc., by the way that they stand out as 
outliers relative to the aforementioned robust taxonomic revision.

Perhaps, though, what we want is all the occurrence data for all taxa (revised 
and unrevised, mislabelled or not) in one giant "Christmas present" ... yeah, 
right!

Stephen

________________________________
From: Bob Mesibov <mesibov at southcom.com.au>
To: L Penev <lyubo.penev at gmail.com>
Cc: TAXACOM <taxacom at mailman.nhm.ku.edu>
Sent: Sat, 19 February, 2011 11:42:51 AM
Subject: Re: [Taxacom] Occurrence data...

Dear Lyubo,

It sounds like your response to my comment

"A barrier to be overcome if DCAs are to appear more often in publications is 
that most data creators are either unfamiliar with the TDWG scheme for 
classifying and formatting data items, or are unwilling to spend time working 
out how their own preferred data fields relate to that scheme."

is

"Naturally, we are aware that at the present stage DwC-A would in many cases 
need some support from experienced data managers to be properly implemented. It 
will take some time. On the other side, the future comes often faster than 
anyone would expect.  Data managers become quickly wanted job positions even in 
not that large taxonomic institutions. Individual taxonomist will be facilitated 
by tools to export their datasets in DwC-A or in another interoperable formats."

But this avoids the questions: Is it necessary? Is it even desirable? ZooKeys 
already semantically marks up the text and assigns the all-important LSIDs. You 
are now encouraging authors to go to the next stage, and structure their raw 
occurrence and nomenclatural data. How long will it be before you ask authors to 
digitally map their images, so that some aggregator ('Encyclopedia of 
Morphology' project) can pull up all the hind-leg tarsus image-elements in the 
digitised insect literature?

I am concerned that what is happening is flawed at two levels. First and 
foremost, there is a legacy feeling from the days of libraries, when you could 
create a single authoritative index and it would sit on a shelf in the Reference 
section, and it was the first place you went as an introduction to a topic. You 
can still find such things on the Web: lists of links, generally way out of 
date. There is far too much information on the Web to make this viable, there 
are too many data quality issues and updating is haphazard. The alternative is 
to let software find things for you - the Rod Page approach - so that there are 
as many indexes and compendia as there are occasions on which someone goes 
data-hunting. And to link (or allow software to link for you) and link again, 
until you have a densely interconnected network of data sources to facilitate 
that data-hunting.

The second level is that even today, 20 years into the new age, promoters of 
Gigantic All-Encompassing Biodiversity Databases (and indexes, Rich) still have 
no clear idea who wants the information and for what purposes. If I ask that 
question I sometimes get the sincere but vacuous answer that we don't know and 
it isn't important; the important thing is to have the data ready when someone, 
somewhere, wants it for some purpose. I can't think of any other major human 
enterprise that tolerates such vagueness in its aims.

The many bottom-up biodiversity databases on the Web typically have an audience 
in mind, namely the specialists who contribute to their creation, and who are 
the primary users of the data. They've been structured for those users, built 
with careful attention to detail, and can be 'handed down' from volunteer 
specialist to volunteer specialist, with some confidence that the same general 
aims and devotion will also be handed down. I don't think you could say that for 
any of the aggregation projects.

I see these bottom-up resources as high-use nodes in the future networks of 
linked biodiversity data. Their contents don't need to be aggregated, indexed, 
repackaged or otherwise fooled with. They can be accessed directly in an 
anarchic, unstructured Web. Like Pete DeVries, I don't see any good reason why 
the same can't be true for raw data. If raw data is made available this way, as 
in ZooKeys supplements, I'd prefer it *wasn't* marked up, so that I - as *user*, 
not aggregator - can pass an eye over it à la Chris Thompson.

Rich Pyle wrote (as I was writing the above):

"Criticize aggregators all you want, but one thing that they certainly *can* 
help with is in eliminating a lot of redundant effort."

Effort by whom? For what purpose? Do you really expect or want to have the 
background on every RCL Perkins collection in Hawaii and every other collector 
in every other place on Earth in another gigantic index-on-the-shelf? With no 
errors? How about just putting on the Web the individual results of careful 
scholarship and allowing *users* to find them through linking? Isn't the aim to 
connect user with datum, not to keep programmers and data managers employed?
-- 
Dr Robert Mesibov
Honorary Research Associate
Queen Victoria Museum and Art Gallery, and
School of Zoology, University of Tasmania
Home contact: PO Box 101, Penguin, Tasmania, Australia 7316
Ph: (03) 64371195; 61 3 64371195
Webpage: http://www.qvmag.tas.gov.au/?articleID=570

_______________________________________________

Taxacom Mailing List
Taxacom at mailman.nhm.ku.edu
http://mailman.nhm.ku.edu/mailman/listinfo/taxacom

The Taxacom archive going back to 1992 may be searched with either of these 
methods:

(1) http://taxacom.markmail.org

Or (2) a Google search specified as: site:mailman.nhm.ku.edu/pipermail/taxacom 
your search terms here