[Taxacom] saturday morning fun

Mon Nov 29 15:34:12 CST 2010

David, 

> I agree that it should be easier to make simple changes to the organisational 
>structure behind the index but it's just not that simple at scale.   We are 
>exploring ways to enable such annotations in fact

well, it *might* be difficult to fix problems, but it should be easy to flag 
them. For example, on Rod Page's Biostor, each page has a comment posting 
facility, so if someone spots a problem on the page, they can point it out in a 
way that remains visible on that page ...

>You lost me here.   I thought YOU were the person complaining about data 
>quality?

I am complaining about data quality (or lack thereof) on GBIF, where there 
is little possibility of fixing it. I interpreted Paul as basically saying 
"yeah, but Wikispecies has mistakes too"! I was pointing out that this criticism 
of Wikispecies is a bit harsh, because the whole thing only works by people with 
knowledge contributing data and fixing errors. So, yes, Wikispecies has mistakes 
too, but that is how Wikispecies works, and it is not how GBIF works. Paul 
is quite capable, I'm sure, of fixing any mistakes he spots on Wikispecies, and 
most welcome to do so ...

>I don't know if people were paid to develop wikipedia and wikispecies source 
>code

yes, you do, and obviously they were. Wikimedia is every bit as much a corporate 
structure as GBIF, I'm sure. But they got/get paid to create/maintain an 
infrastructure that is a blank canvas for anyone to make of it what they will, so 
they are not responsible for data quality. On the other hand, although there no 
doubt are disclaimers in the fine print, the clear impression from GBIF is one 
of "come here for your serious validated data needs", and yet the irony is that 
Wikispecies often has far better data quality, and when it doesn't, the user can 
easily see that from the lack of cited primary references. So, there seems to be 
a real problem in life, not just involving GBIF, where on the one hand you have 
"serious validated data rubber stamped by official experts", which tends to be 
error ridden, while the good data languishes undervalued and underutilised in 
places deemed to be "unreliable" , "unofficial", and "not really serious", like 
Wikispecies! This *really* pi$$es me off, in case you hadn't noticed!

>No one is paid to contribute data to GBIF.   You might be referring to the more 
>than 1.5 million US dollars that GBIF provided to taxonomists between 2003-2006 
>to develop taxonomic catalogues.   Larger amounts that went into specimen 
>digitisation.  For the taxonomic data,  the majority hasn't ever been made 
>available to GBIF because we lacked an infrastructural capacity to receive it. 
>Developing this capacity is a component of my (paid) work

there are some fine lines here. If GBIF gives taxonomist X $$$ to develop a 
catalogue that GBIF wants to then use, then taxonomist X got paid to contribute 
data to GBIF, surely? 

>Larger amounts that went into specimen digitisation

this is another can of worms - digitising raw specimen data ... how worthwhile 
really is it? Not very, I suggest, but it is "easy work", and good for 
institutions "recouping overheads" from the pot ...

Stephen

________________________________
From: David Remsen (GBIF) <dremsen at gbif.org>
To: Stephen Thorpe <stephen_thorpe at yahoo.co.nz>
Cc: David Remsen (GBIF) <dremsen at gbif.org>; dipteryx at freeler.nl; 
taxacom at mailman.nhm.ku.edu
Sent: Tue, 30 November, 2010 10:00:00 AM
Subject: Re: [Taxacom] saturday morning fun

On Nov 29, 2010, at 8:57 PM, Stephen Thorpe wrote:

>Paul,
>
>>This in contrast to the Wikipedia entry, which requires very little work on the 
>>part of the reader for him to be completely misinformed. Wikispecies is 
>>preferable, although it offers only little information, with a 25% rate of error 
>>(as compared to the source it was copied from), but at least it indicates its 
>>source, and it has selected a relevant source
>
>and the biggest difference of all between the wikis and GBIF is that you, who 
>knows better in this particular, presumably plant, example, COULD have improved 
>the information when you visited it, but I bet you didn't ...

Stephen,

I agree that it should be easier to make simple changes to the organisational 
structure behind the index but it's just not that simple at scale.   We are 
exploring ways to enable such annotations in fact.   In regard to Paul,  his 
Index Fungorum nomenclator is one of the few and early authority files we have 
had access to. 

>it seems highly hypocritical to me to complain about the data quality of 
>something that only works by people being prepared to make a contribution to it, 
>if you aren't prepared to make a contribution to it! 

You lost me here.   I thought YOU were the person complaining about data 
quality?  I told you earlier this year that if you were interested in discussing 
how to extract structured taxonomic data from wikispecies I'd be interested.   

>Note also, that unlike 
>GBIF, nobody got paid to contribute the data on wiki, so it is a less serious 
>matter if it isn't quite as good as advertised ...

No one is paid to contribute data to GBIF.   You might be referring to the more 
than 1.5 million US dollars that GBIF provided to taxonomists between 2003-2006 
to develop taxonomic catalogues.   Larger amounts that went into specimen 
digitisation.  For the taxonomic data,  the majority hasn't ever been made 
available to GBIF because we lacked an infrastructural capacity to receive it. 
Developing this capacity is a component of my (paid) work.

The only subset of those data that have been subsequently made available to GBIF 
are those that went into the Catalogue of Life and Index Fungorum.  

People are paid in museums and other organisations to digitise and enter data 
into collections databases.

I don't know if people were paid to develop wikipedia and wikispecies source 
code.   

David

>Stephen
>
>
>________________________________
>From: "dipteryx at freeler.nl" <dipteryx at freeler.nl>
>To: taxacom at mailman.nhm.ku.edu
>Sent: Mon, 29 November, 2010 9:46:03 PM
>Subject: Re: [Taxacom] saturday morning fun
>
>From: taxacom-bounces at mailman.nhm.ku.edu on behalf of Jim Croft
>Sent: Mon 29-11-2010 1:04
>
>
>To be fair, the only reason GBIF is 'feeding us shit' is 
>because 'shit' is what we gave them.
>
***
Not at all sure about that. What has been playing through my 
mind is the idea that a data aggregator is an agency which can 
be characterized by "Data in, garbage out". It is a complete 
mystery to me why GBIF uses something known to be so completely 
worthless as the taxonomy of the Catalogue of Life; nothing good 
can come of that ...

Like some other list-members, I tried a small test, for which I 
selected a genus where it is known to be essential to be explicit 
about the species concept used in order to be able to interpret 
and handle data, in anything like a meaningful manner. 

Using the GBIF data portal, the most noticeable thing is how much 
work it is to use, before getting to any data. There is indeed a 
significant degree of completely irrelevant material linked from
this entry (the wondrous ways of computers!), but this is easily 
identifiable, so not much of an actual problem. There is no apparent
awareness of the species-concept issue, with more than one species 
concept used happily side by side. So, a lot of work (and 'expert' 
knowledge required), but basically usable. This in contrast to the
Wikipedia entry, which requires very little work on the part of the
reader for him to be completely misinformed. Wikispecies is preferable,
although it offers only little information, with a 25% rate of error
(as compared to the source it was copied from), but at least it
indicates its source, and it has selected a relevant source.

On the whole it proves that the casual user is best advised to just
use Google (which not only did turn up the relevant information but
quickly showed me a very nice site unknown to me): this is less work
and yields more useful results (a higher ratio of information/amount
-of-work) than trying one of the self-advertised high-profile sites
(obviously, the 'expert' does not need advice).

Paul van Rijckevorsel
_______________________________________________

Taxacom Mailing List
Taxacom at mailman.nhm.ku.edu
http://mailman.nhm.ku.edu/mailman/listinfo/taxacom

The Taxacom archive going back to 1992 may be searched with either of these 
methods:

(1) http://taxacom.markmail.org

Or (2) a Google search specified as:  site:mailman.nhm.ku.edu/pipermail/taxacom 
 your search terms here
