[Taxacom] PS: saturday morning fun
David Remsen (GBIF)
dremsen at gbif.org
Tue Nov 30 02:13:46 CST 2010
Actually I think linking to wikispecies and other select databases
(such as species pages currently offered by GBIF national Nodes) is a
great idea. In fact, it would be done if we had the technical
capacity to actively refactor the data portal, as it's on a feature
request list. I hope we will be able to add this as you suggest.
As to the purity of my motives, I really don't know what that is all
about.
On Nov 29, 2010, at 10:58 PM, Stephen Thorpe wrote:
> PS: you see the bit where I become suspicious and start doubting
> your apparently pure motives is this: it would be very technically
> easy and fully consistent with your stated aims and motivations to
> simply put on each of your taxon pages a link to the corresponding
> Wikispecies page (for example, for Mimus,
> http://species.wikimedia.org/wiki/Mimus , and note that all
> Wikispecies pages have this simple URL structure:
> http://species.wikimedia.org/wiki/NAME_OF_TAXON ), perhaps saying
> something like "here you may find useful data on
> this taxon, though being open edit, we cannot vouch for its
> accuracy". But this would require (1) genuinely pure motives on your
> part; and (2) a grasp of the difference between theory and reality
> (i.e., in theory Wikispecies is unreliable and a pointless waste of
> your time, but in *reality* ...)
>
> From: David Remsen (GBIF) <dremsen at gbif.org>
> To: Stephen Thorpe <stephen_thorpe at yahoo.co.nz>
> Cc: David Remsen (GBIF) <dremsen at gbif.org>; taxacom at mailman.nhm.ku.edu
> Sent: Tue, 30 November, 2010 10:30:05 AM
> Subject: Re: [Taxacom] saturday morning fun
>
> Stephen,
>
> Thanks for the summary. I'd be interested to hear what various
> Catalogue of Life providers think of all this. I know some
> taxonomic sectors, like the Lepidoptera, derived from LepIndex NHM-
> London, have not been thoroughly reviewed, falling into your 'raw'
> category.
>
> You hit the nail on the head when you say it provides you with a
> starting point. We use it as a starting point too. We could
> forgo this and simply leave the raw data as it is but it seemed an
> improvement to go with it. We are trying to expand the capacity to
> access other, perhaps more comprehensive or refined sources, should
> they be offered or available. At the moment, that starting place
> is one of the few places we can go. Of course, flaking together
> disparate sets of even high quality data introduces additional
> complications but I'd be happy to take them on.
>
> I'm sure we (at least I) have not fully grasped all the
> ramifications of this. I've tried to relay some of the complexities
> and a rationale behind what we are faced with and do. I failed to
> mention the constraints we are under to improve the issues raised
> this weekend. Until very recently we have had 2.5 programmers
> working on the entirety of our infrastructure with nearly no
> resources for the portal to fix these problems. This will change
> in 2011.
>
> Best,
> David
>
> On Nov 29, 2010, at 9:47 PM, Stephen Thorpe wrote:
>
>> You mention some key issues here. Let me focus on just one of them
>> for the moment, namely COL and its suitability as a data provider
>> for GBIF. I suspect that GBIF have basically just thought something
>> like "well, COL is an aggregation of trusted specialist databases
>> in a form that GBIF can use" - but the reality is *way* more
>> complex. For me, when starting to compile a Wikispecies page, I
>> will often use COL as a *starting point only*, actually little more
>> than a convenient way of getting big lists of taxa formatted and
>> put on Wikispecies pages for further scrutiny. Sometimes, the COL
>> data is so obviously worse than useless, that I don't use it at
>> all, not *even* as a starting point. The data providers from COL
>> vary widely in nature. Some of them are near complete for their
>> group, others are highly fragmentary. Some are *very* raw, others
>> are quite well polished. Sometimes, there are problems in the way
>> that COL interprets the data from sources, so all sorts of synonyms
>> get interpreted as valid, etc. Another issue, which I don't fully
>> understand yet, and I could perhaps be mistaken (???), is that even
>> in COL 2010, much of the data seems to have been harvested in
>> 2008 ... I would have thought that COL 2010 would have harvested
>> its data in 2010. If not, then COL is running a couple of years
>> behind its own data providers, who will typically not be completely
>> up-to-date either. So, in summary, I would say that COL is nothing
>> more than a convenient *starting point* for building solid
>> biodiversity data, and it requires a fair amount of careful and
>> informed interpretation, not to mention a great deal of manual work
>> to improve on it. I'm not sure that GBIF has fully grasped this?
>> For example, in COL, the family Scarabaeidae is actually what would
>> almost universally be called the subfamily Scarabaeinae of the
>> family Scarabaeidae, and this is not at all obvious. So, COL is
>> actually quite good if you want data on Scarabaeinae, but
>> completely lacking in any data whatsoever on the *huge* scarabaeid
>> subfamilies Melolonthinae and Rutelinae.
>> Cheers,
>> Stephen
>
>
>