[Taxacom] PS: saturday morning fun

Tue Nov 30 02:13:46 CST 2010

Actually I think linking to wikispecies and other select databases  
(such as species pages currently offered by GBIF national Nodes) is a  
great idea.  In fact, it would be done if we had the technical
capacity to actively refactor the data portal; it is on a feature
request list.  I hope we will be able to add this as you suggest.
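
As a rough illustration of how simple such a link could be, here is a
minimal sketch that assumes only the URL pattern quoted below
(http://species.wikimedia.org/wiki/NAME_OF_TAXON).  The function name
wikispecies_url and the replacement of spaces with underscores are
assumptions made for illustration; this is not portal code.

```python
from urllib.parse import quote

# Base URL taken from the pattern quoted in this thread.
WIKISPECIES_BASE = "http://species.wikimedia.org/wiki/"


def wikispecies_url(taxon_name: str) -> str:
    """Build a candidate Wikispecies URL for a taxon name (hypothetical helper)."""
    # Assumption: multi-word names use underscores in place of spaces.
    title = "_".join(taxon_name.split())
    # Percent-encode anything that is not URL-safe.
    return WIKISPECIES_BASE + quote(title)


if __name__ == "__main__":
    print(wikispecies_url("Mimus"))
```

A taxon page could emit this URL next to a note that the target is
open edit and its accuracy cannot be vouched for.  The sketch does not
check whether the page actually exists.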

As to the purity of my motives,  I really don't know what that is all  
about.

On Nov 29, 2010, at 10:58 PM, Stephen Thorpe wrote:

> PS: you see, the bit where I become suspicious and start doubting  
> your apparently pure motives is this: it would be very technically  
> easy and fully consistent with your stated aims and motivations to  
> simply put on each of your taxon pages a link to the corresponding  
> Wikispecies page (for example, for Mimus: http://species.wikimedia.org/wiki/Mimus ;
> note that all Wikispecies pages have this simple URL structure:
> http://species.wikimedia.org/wiki/NAME_OF_TAXON ), perhaps saying
> something like "here you may find useful data on
> this taxon, though being open edit, we cannot vouch for its  
> accuracy". But this would require (1) genuinely pure motives on your  
> part; and (2) a grasp of the difference between theory and reality  
> (i.e., in theory Wikispecies is unreliable and a pointless waste of  
> your time, but in *reality* ...)
>
> From: David Remsen (GBIF) <dremsen at gbif.org>
> To: Stephen Thorpe <stephen_thorpe at yahoo.co.nz>
> Cc: David Remsen (GBIF) <dremsen at gbif.org>; taxacom at mailman.nhm.ku.edu
> Sent: Tue, 30 November, 2010 10:30:05 AM
> Subject: Re: [Taxacom] saturday morning fun
>
> Stephen,
>
> Thanks for the summary.  I'd be interested to hear what various  
> Catalogue of Life providers think of all this.  I know some  
> taxonomic sectors, like the Lepidoptera, derived from LepIndex
> NHM-London, have not been thoroughly reviewed, falling into your 'raw'
> category.
>
> You hit the nail on the head when you say it provides you with a  
> starting point.   We use it as a starting point too.   We could  
> forego this and simply leave the raw data as it is but it seemed an  
> improvement to go with it.  We are trying to expand the capacity to  
> access other, perhaps more comprehensive or refined sources,  should  
> they be offered or available.  At the moment, that starting place
> is one of the few places we can go.  Of course, flaking together
> disparate sets of even high quality data introduces additional  
> complications but I'd be happy to take them on.
>
> I'm sure we (at least I) have not fully grasped all the  
> ramifications of this.  I've tried to relay some of the complexities
> and a rationale behind what we are faced with and do.   I failed to  
> mention the constraints we are under to improve the issues raised  
> this weekend.  Until very recently we have had 2.5 programmers  
> working on the entirety of our infrastructure with nearly no  
> resources for the portal to fix these problems.   This will change  
> in 2011.
>
> Best,
> David
>
> On Nov 29, 2010, at 9:47 PM, Stephen Thorpe wrote:
>
>> You mention some key issues here. Let me focus on just one of them  
>> for the moment, namely COL and its suitability as a data provider  
>> for GBIF. I suspect that GBIF have basically just thought something  
>> like "well, COL is an aggregation of trusted specialist databases  
>> in a form that GBIF can use" - but the reality is *way* more  
>> complex. For me, when starting to compile a Wikispecies page, I  
>> will often use COL as a *starting point only*, actually little more  
>> than a convenient way of getting big lists of taxa formatted and  
>> put on Wikispecies pages for further scrutiny. Sometimes, the COL  
>> data is so obviously worse than useless, that I don't use it at  
>> all, not *even* as a starting point. The data providers from COL  
>> vary widely in nature. Some of them are near complete for their  
>> group, others are highly fragmentary. Some are *very* raw, others  
>> are quite well polished. Sometimes, there are problems in the way  
>> that COL interprets the data from sources, so all sorts of synonyms  
>> get interpreted as valid, etc. Another issue, which I don't fully  
>> understand yet, and I could perhaps be mistaken (???), is that even  
>> in COL 2010, much of the data seems to have been harvested in  
>> 2008 ... I would have thought that COL 2010 would have harvested  
>> its data in 2010. If not, then COL is running a couple of years  
>> behind its own data providers, who will typically not be completely  
>> up-to-date either. So, in summary, I would say that COL is nothing  
>> more than a convenient *starting point* for building solid  
>> biodiversity data, and it requires a fair amount of careful and  
>> informed interpretation, not to mention a great deal of manual work  
>> to improve on it. I'm not sure that GBIF has fully grasped this?  
>> For example, in COL, the family Scarabaeidae is actually what would  
>> almost universally be called the subfamily Scarabaeinae of the  
>> family Scarabaeidae, and this is not at all obvious. So, COL is  
>> actually quite good if you want data on Scarabaeinae, but  
>> completely lacking in any data whatsoever on the *huge* scarabaeid  
>> subfamilies Melolonthinae and Rutelinae.
>> Cheers,
>> Stephen