[Taxacom] PS: saturday morning fun

Tue Nov 30 14:35:58 CST 2010

David,

I am utterly astonished that you currently lack the technical capacity to have a 
link on each of your taxon pages to 
http://species.wikimedia.org/wiki/NAME_OF_TAXON

It is even odder that BHL also seems to lack such a capacity.

By contrast, my *friends* at ZooKeys seem to have managed to do it just fine on 
their taxon profiles facility ...

Stephen

________________________________
From: David Remsen (GBIF) <dremsen at gbif.org>
To: Stephen Thorpe <stephen_thorpe at yahoo.co.nz>
Cc: David Remsen (GBIF) <dremsen at gbif.org>; taxacom at mailman.nhm.ku.edu
Sent: Tue, 30 November, 2010 9:13:46 PM
Subject: Re: PS: [Taxacom] saturday morning fun

Actually I think linking to Wikispecies and other select databases (such as 
species pages currently offered by GBIF national Nodes) is a great idea. In 
fact, it is on a feature request list and would already be done if we had the 
technical capacity to actively refactor the data portal. I hope we will be able 
to add this as you suggest.

As to the purity of my motives, I really don't know what that is all about.

On Nov 29, 2010, at 10:58 PM, Stephen Thorpe wrote:

PS: you see the bit where I become suspicious and start doubting your apparently 
pure motives is this: it would be very technically easy and fully consistent 
with your stated aims and motivations to simply put on each of your taxon pages 
a link to the corresponding Wikispecies page (for example, for 
Mimus, http://species.wikimedia.org/wiki/Mimus, and note that all Wikispecies 
pages have this simple URL structure, 
http://species.wikimedia.org/wiki/NAME_OF_TAXON), perhaps saying 
something like "here you may find useful data on this taxon, though being open 
edit, we cannot vouch for its accuracy". But this would require (1) genuinely 
pure motives on your part; and (2) a grasp of the difference between theory and 
reality (i.e., in theory Wikispecies is unreliable and a pointless waste of your 
time, but in *reality* ...)
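
The URL pattern described above can be sketched in a few lines of Python. This 
is only an illustration, not anything GBIF or Wikispecies actually runs: the 
function names are hypothetical, and replacing spaces with underscores for 
multi-word names follows the usual MediaWiki title convention, which is an 
assumption beyond the single-word Mimus example given in the message.

```python
from urllib.parse import quote

# Base URL exactly as quoted in the message above.
WIKISPECIES_BASE = "http://species.wikimedia.org/wiki/"

# Disclaimer wording taken from the suggestion in the message.
DISCLAIMER = ("here you may find useful data on this taxon, though being "
              "open edit, we cannot vouch for its accuracy")


def wikispecies_url(taxon_name: str) -> str:
    """Build a Wikispecies page URL from a taxon name (hypothetical helper)."""
    # Collapse surrounding/internal whitespace and join words with
    # underscores (assumed MediaWiki title convention).
    title = "_".join(taxon_name.split())
    # Percent-encode anything that is not safe in a URL path segment.
    return WIKISPECIES_BASE + quote(title, safe="")


def wikispecies_note(taxon_name: str) -> str:
    """Return the link plus the suggested disclaimer as one line of text."""
    return f"{wikispecies_url(taxon_name)} ({DISCLAIMER})"


if __name__ == "__main__":
    print(wikispecies_url("Mimus"))
    print(wikispecies_note("Mimus polyglottos"))
```

Such a link is built purely from the name string, so it does not check that a 
Wikispecies page for that taxon actually exists.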
>
________________________________
From: David Remsen (GBIF) <dremsen at gbif.org>
>To: Stephen Thorpe <stephen_thorpe at yahoo.co.nz>
>Cc: David Remsen (GBIF) <dremsen at gbif.org>; taxacom at mailman.nhm.ku.edu
>Sent: Tue, 30 November, 2010 10:30:05 AM
>Subject: Re: [Taxacom] saturday morning fun
>
>Stephen, 
>
>
>Thanks for the summary. I'd be interested to hear what various Catalogue of 
>Life providers think of all this. I know some taxonomic sectors, like the 
>Lepidoptera, derived from LepIndex (NHM-London), have not been thoroughly 
>reviewed, falling into your 'raw' category.
>
>
>You hit the nail on the head when you say it provides you with a starting point. 
>We use it as a starting point too. We could forgo this and simply leave the 
>raw data as it is, but it seemed an improvement to go with it. We are trying to 
>expand the capacity to access other, perhaps more comprehensive or refined 
>sources, should they be offered or available. At the moment, that starting 
>place is one of the few places we can go. Of course, flaking together 
>disparate sets of even high quality data introduces additional complications but 
>I'd be happy to take them on.
>
>
>I'm sure we (at least I) have not fully grasped all the ramifications of this. 
>I've tried to relay some of the complexities and a rationale behind what we are 
>faced with and do. I failed to mention the constraints we are under in addressing 
>the issues raised this weekend. Until very recently we have had 2.5 programmers 
>working on the entirety of our infrastructure, with nearly no resources for the 
>portal to fix these problems. This will change in 2011.
>
>
>Best,
>David
>
>
>On Nov 29, 2010, at 9:47 PM, Stephen Thorpe wrote:
>
>You mention some key issues here. Let me focus on just one of them for the 
>moment, namely COL and its suitability as a data provider for GBIF. I suspect 
>that GBIF have basically just thought something like "well, COL is an 
>aggregation of trusted specialist databases in a form that GBIF can use" - but 
>the reality is *way* more complex. For me, when starting to compile a 
>Wikispecies page, I will often use COL as a *starting point only*, actually 
>little more than a convenient way of getting big lists of taxa formatted and put 
>on Wikispecies pages for further scrutiny. Sometimes, the COL data is so 
>obviously worse than useless, that I don't use it at all, not *even* as a 
>starting point. The data providers from COL vary widely in nature. Some of them 
>are near complete for their group, others are highly fragmentary. Some are 
>*very* raw, others are quite well polished. Sometimes, there are problems in the 
>way that COL interprets the data from sources, so all sorts of synonyms get 
>interpreted as valid, etc. Another issue, which I don't fully understand yet, 
>and I could perhaps be mistaken (???), is that even in COL 2010, much of the 
>data seems to have been harvested in 2008 ... I would have thought that COL 2010 
>would have harvested its data in 2010. If not, then COL is running a couple of 
>years behind its own data providers, who will typically not be completely 
>up-to-date either. So, in summary, I would say that COL is nothing more than a 
>convenient *starting point* for building solid biodiversity data, and it 
>requires a fair amount of careful and informed interpretation, not to mention a 
>great deal of manual work to improve on it. I'm not sure that GBIF has fully 
>grasped this? For example, in COL, the family Scarabaeidae is actually what 
>would almost universally be called the subfamily Scarabaeinae of the family 
>Scarabaeidae, and this is not at all obvious. So, COL is actually quite good if 
>you want data on Scarabaeinae, but completely lacking in any data whatsoever on 
>the *huge* scarabaeid subfamilies Melolonthinae and Rutelinae.
>>Cheers,
>>Stephen
>
>