[Taxacom] saturday morning fun

Mon Nov 29 15:00:00 CST 2010

On Nov 29, 2010, at 8:57 PM, Stephen Thorpe wrote:

> Paul,
>
>> This in contrast to the Wikipedia entry, which requires very little work
>> on the part of the reader for him to be completely misinformed.
>> Wikispecies is preferable, although it offers only little information,
>> with a 25% rate of error (as compared to the source it was copied from),
>> but at least it indicates its source, and it has selected a relevant source
>
> and the biggest difference of all between the wikis and GBIF is that you,
> who knows better in this particular, presumably plant, example, COULD have
> improved the information when you visited it, but I bet you didn't ...

Stephen,

I agree that it should be easier to make simple changes to the
organisational structure behind the index, but it's just not that
simple at scale. In fact, we are exploring ways to enable such
annotations. In regard to Paul, his Index Fungorum nomenclator is one
of the few early authority files we have had access to.

> it seems highly hypocritical to me to complain about the data quality of
> something that only works by people being prepared to make a contribution
> to it, if you aren't prepared to make a contribution to it!

You lost me here. I thought YOU were the person complaining about
data quality? I told you earlier this year that if you were
interested in discussing how to extract structured taxonomic data from
Wikispecies, I'd be interested.

> Note also, that unlike GBIF, nobody got paid to contribute the data on
> wiki, so it is a less serious matter if it isn't quite as good as
> advertised ...

No one is paid to contribute data to GBIF. You might be referring to
the more than 1.5 million US dollars that GBIF provided to taxonomists
between 2003 and 2006 to develop taxonomic catalogues. Larger amounts
went into specimen digitisation. For the taxonomic data, the
majority has never been made available to GBIF because we lacked the
infrastructural capacity to receive it. Developing this capacity is a
component of my (paid) work.

The only subset of those data that has subsequently been made
available to GBIF is the part that went into the Catalogue of Life and
Index Fungorum.

People are paid in museums and other organisations to digitise and  
enter data into collections databases.

I don't know if people were paid to develop the Wikipedia and Wikispecies
source code.

David

>
> Stephen
>
>
> ________________________________
> From: "dipteryx at freeler.nl" <dipteryx at freeler.nl>
> To: taxacom at mailman.nhm.ku.edu
> Sent: Mon, 29 November, 2010 9:46:03 PM
> Subject: Re: [Taxacom] saturday morning fun
>
> From: taxacom-bounces at mailman.nhm.ku.edu on behalf of Jim Croft
> Sent: Mon 29-11-2010 1:04
>
>> To be fair, the only reason GBIF is 'feeding us shit' is
>> because 'shit' is what we gave them.
>
> ***
> Not at all sure about that. What has been playing through my
> mind is the idea that a data aggregator is an agency which can
> be characterized by "Data in, garbage out". It is a complete
> mystery to me why GBIF uses something known to be so completely
> worthless as the taxonomy of the Catalogue of Life; nothing good
> can come of that ...
>
> Like some other list-members, I tried a small test, for which I
> selected a genus where it is known to be essential to be explicit
> about the species concept used in order to be able to interpret
> and handle data, in anything like a meaningful manner.
>
> Using the GBIF data portal, the most noticeable thing is how much
> work it is to use, before getting to any data. There is indeed a
> significant degree of completely irrelevant material linked from
> this entry (the wondrous ways of computers!), but this is easily
> identifiable, so not much of an actual problem. There is no apparent
> awareness of the species-concept issue, with more than one species
> concept used happily side by side. So, a lot of work (and 'expert'
> knowledge required), but basically usable. This in contrast to the
> Wikipedia entry, which requires very little work on the part of the
> reader for him to be completely misinformed. Wikispecies is preferable,
> although it offers only little information, with a 25% rate of error
> (as compared to the source it was copied from), but at least it
> indicates its source, and it has selected a relevant source.
>
> On the whole it proves that the casual user is best advised to just
> use Google (which not only did turn up the relevant information but
> quickly showed me a very nice site unknown to me): this is less work
> and yields more useful results (a higher ratio of information/amount
> -of-work) than trying one of the self-advertised high-profile sites
> (obviously, the 'expert' does not need advice).
>
> Paul van Rijckevorsel
> _______________________________________________
>
> Taxacom Mailing List
> Taxacom at mailman.nhm.ku.edu
> http://mailman.nhm.ku.edu/mailman/listinfo/taxacom
>
> The Taxacom archive going back to 1992 may be searched with either of
> these methods:
>
> (1) http://taxacom.markmail.org
>
> Or (2) a Google search specified as:
> site:mailman.nhm.ku.edu/pipermail/taxacom your search terms here
>
>
>
>