[Taxacom] data quality vs. data security: a survey

Richard Pyle deepreef at bishopmuseum.org
Sat Feb 13 15:50:31 CST 2010


Hi Stephen,

I guess we just have different perspectives on where the "serious"
duplication-of-effort problems are in our community.

Speaking as a hard-working, dedicated taxonomist myself (when I can find the
time...), I want to make sure that any contribution I make to taxonomy is
maximally available to all current and future interested parties.  You and I
just seem to have different ideas about how best to achieve that goal.

Aloha,
Rich

> -----Original Message-----
> From: Stephen Thorpe [mailto:s.thorpe at auckland.ac.nz] 
> Sent: Saturday, February 13, 2010 11:45 AM
> To: Richard Pyle; 'TAXACOM'
> Subject: RE: [Taxacom] data quality vs. data security: a survey
> 
> Hi Rich,
> 
> No, I don't buy it! 
> 
> >Every time information about a species, a taxonomic publication
> >citation, etc., etc. is typed by humans on a keyboard (whether it be
> >typed into a manuscript, a database, a wikispecies page, or wherever),
> >that's duplication of effort. Individually, it seems trivial -- but in
> >aggregate it is most certainly *not* trivial
> 
> First off, if someone types a citation into a wikispecies 
> page, it may in some sense be a duplication of effort if 
> someone else has already typed it into something else, or an 
> "acronym" or ten have already "harvested" it, but since it 
> was typed into wikispecies free of charge, it isn't a SERIOUS 
> duplication of effort (on the part of the wikispecies 
> contributor). What is a SERIOUS duplication of effort is when 
> science funding goes individually to several different 
> aggregators to each put the citation in their own particular 
> database, and even worse when all they are in fact doing is 
> "harvesting" the information from an existing taxon specific 
> database. The aggregators are merely parasites ...
> 
> >While there is certainly some overlap among them, the duplication is by
> >no means "massive".  To say so reveals a poor understanding about what
> >these different initiatives actually do
> 
> I may not know what they do (behind the scenes), but I know 
> what they give the end user, in terms of content, and it just 
> isn't very much at all, at least for GBIF, EOL, COL, and the 
> like. All they do is "harvest" names and create stubs. I 
> don't want a nice looking map of the world on a species page 
> if there are no points plotted on it, or if there are so few 
> points plotted compared to the actual distribution. How 
> "massive" is "massive", in terms of overlap?
> 
> >You seem to be confusing "Aggregation" with "Integration".  Google is
> >an aggregator (an indexer, really -- like GBIF)
> 
> OK, so why do we need GBIF, when we already have Google? I am 
> NOT, obviously, saying that Google is sufficient for all our 
> needs - far from it! I am saying that an expensive entity 
> like GBIF is not much better than Google.
> 
> This seems to be what is going on: dedicated taxonomists 
> (like Bob, for example) work darn hard for relatively little 
> reward, creating new taxonomic knowledge. Then, if you are 
> lucky, that knowledge gets integrated into either a taxon 
> specific database, and/or (if I have anything to do with it) 
> Wikispecies. So far, so good. It is what happens next that is 
> the problem! Increasing numbers of "parasites" then make far 
> more money and have a far easier life than Bob by 
> "harvesting" the names from the taxon specific databases, and 
> creating skeleton pages on some site that promises so much, 
> but never seems to end up delivering much in terms of actual 
> content! If you could get actual useful content out of these 
> sites, then fine, but all too often you just find a map 
> devoid of points, and a page devoid of content!
> 
> Cheers,
> 
> Stephen
> 
> ________________________________________
> From: taxacom-bounces at mailman.nhm.ku.edu 
> [taxacom-bounces at mailman.nhm.ku.edu] On Behalf Of Richard 
> Pyle [deepreef at bishopmuseum.org]
> Sent: Sunday, 14 February 2010 7:47 a.m.
> To: 'TAXACOM'
> Subject: Re: [Taxacom] data quality vs. data security: a survey
> 
> Hi Stephen,
> 
> > OMG! Did you really just say that! How is a massive duplication of 
> > effort increasingly allowing a massive reduction of 
> > redundant/duplicate effort????????
> 
> It appears you didn't understand my post.  As you say, 
> "communication is a very difficult thing, particularly on 
> topics as complex as this", so I'll try again.  You seem to 
> characterize all the various large-scale data aggregators 
> (GBIF, EOL, COL, ALA, etc.) as "massive duplication of effort".
> While there is certainly some overlap among them, the 
> duplication is by no means "massive".  To say so reveals a 
> poor understanding about what these different initiatives actually do.
> 
> Every time information about a species, a taxonomic 
> publication citation, etc., etc. is typed by humans on a 
> keyboard (whether it be typed into a manuscript, a database, 
> a wikispecies page, or wherever), that's duplication of 
> effort. Individually, it seems trivial -- but in aggregate it 
> is most certainly *not* trivial.
> 
> > INTEGRATION is one thing, but MULTIPLE INTEGRATION INITIATIVES leading
> > to numerous clone or near clone integrated databases is completely
> > self-defeating!
> 
> You seem to be confusing "Aggregation" with "Integration".  
> Google is an aggregator (an indexer, really -- like GBIF).  
> The DNS system is an architecture for integration.  The 
> equivalent of DNS for biodiversity information is what I mean 
> by integration.
> 
> Aloha,
> Rich