[Taxacom] Explicit vs implied linking

Wed Dec 1 01:42:50 CST 2010

Stephen,

There is a very practical answer to your question and I've ground-truthed  
it to some degree in my former work with uBio.   The BioOne  
journals used the same methodology to link to ITIS,  by appending a  
name to the end of a URL and casting it out as a link in case  
something stuck on the ITIS end.   What they found,  and what I used  
to ask people is - If you click on the dead link once - OK things  
happen.   You click the second time and you get nothing.   You gonna  
click a third time?

Generating an explicit index of links is trivial for structured data  
management systems.    It is easier to do it the way you suggest but  
my experience is that stale links don't get used.
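
To make the two approaches concrete, here is a minimal Python sketch. It is an illustration only, not how GBIF, BioOne or globalnames.org actually implement linking. The URL template is the one quoted further down this thread. The function names and the `known_pages` index are hypothetical, and the underscore-for-space convention is an assumption about Wikispecies page titles.

```python
from typing import Dict, Optional
from urllib.parse import quote

# URL template quoted in the thread, with NAME_OF_TAXON as the placeholder.
WIKISPECIES_TEMPLATE = "http://species.wikimedia.org/wiki/{name}"


def implied_link(taxon_name: str) -> str:
    """Implied linking: append the name to the template and hope a page exists."""
    # Assumption: page titles use underscores in place of spaces.
    title = taxon_name.strip().replace(" ", "_")
    return WIKISPECIES_TEMPLATE.format(name=quote(title))


def explicit_link(taxon_name: str, index: Dict[str, str]) -> Optional[str]:
    """Explicit linking: return a URL only if the name is in an index of existing pages."""
    return index.get(taxon_name.strip())


# Hypothetical index, e.g. exported by the target site and refreshed periodically.
known_pages = {"Mimus": "http://species.wikimedia.org/wiki/Mimus"}

print(implied_link("Mimus polyglottos"))                # always yields a URL, live or dead
print(explicit_link("Mimus", known_pages))              # URL for a page known to exist
print(explicit_link("Mimus polyglottos", known_pages))  # None, so no link is shown
```

The implied version needs no upkeep and starts working the moment a page is created, but it can hand the reader a dead link. The explicit version never shows a dead link, but it is only as current as the last refresh of the index.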

DR

On Nov 30, 2010, at 11:22 PM, Stephen Thorpe wrote:

> thanks David
>
> just one comment:
>
> >The method you suggest,  however, is not the best way to do this<
>
> actually I rather think it *is* the best way
>
> GBIF has rather a tendency to overcomplicate matters, but if one has  
> good honest data, then simplicity is best
>
> why would you worry if the page actually existed or not? If it  
> doesn't exist, Wikispecies will tell you that it doesn't, and you  
> can simply press the back browser button to get back to GBIF. I  
> don't see this as a problem. On the other hand, if you only add  
> links to pages that exist, then how do you keep up with new pages as  
> they come into existence at any time? If the link is already there,  
> then it will work *as soon as* the page comes into existence,  
> without any work needed by GBIF, which is surely preferable?
>
> IMHO, GBIF should concentrate on what it can provide that nobody  
> else can (e.g., maps of distributions from unpublished museum  
> specimens, etc.), and provide links to other information that  
> Wikispecies, for one example, might be better at providing ...
>
> Stephen
>
> From: David Remsen (GBIF) <dremsen at gbif.org>
> To: Stephen Thorpe <stephen_thorpe at yahoo.co.nz>
> Cc: David Remsen (GBIF) <dremsen at gbif.org>; taxacom at mailman.nhm.ku.edu
> Sent: Wed, 1 December, 2010 11:03:53 AM
> Subject: Re: PS: [Taxacom] saturday morning fun
>
> Perhaps because it's more complex than you think.   I know exactly  
> how to do it.   I did say we don't have programmer capacity to add  
> that functionality now.   In fact,  we have a linking mechanism in  
> place via the Global Names Index for just this sort of thing.   The  
> method you suggest,  however, is not the best way to do this.    
> Rather than concatenate any taxon name to the template you  
> provided,  it would be better to generate an explicit list of links  
> to only those pages which actually exist and provide them as an  
> index to globalnames.org
>
> This provides a systematic way to link multiple collections of  
> links,  it uses a simple international data standard as the format  
> for the index, and it works.    That feature is on a feature request  
> list as is, now,  your recommendation.
>
> http://code.google.com/p/gbif-dataportal/issues/detail?id=97
>
> DR
>
>
> On Nov 30, 2010, at 9:35 PM, Stephen Thorpe wrote:
>
>> David,
>>
>> I am utterly astonished that you currently lack the technical  
>> capacity to have a link on each of your taxon pages to http://species.wikimedia.org/wiki/NAME_OF_TAXON
>>
>> it is even odder that BHL also seem to lack such a capacity
>>
>> by contrast, my *friends* ZooKeys seem to have managed to do it  
>> just fine on their taxon profiles facility ...
>> Stephen
>>
>>
>>
>> From: David Remsen (GBIF) <dremsen at gbif.org>
>> To: Stephen Thorpe <stephen_thorpe at yahoo.co.nz>
>> Cc: David Remsen (GBIF) <dremsen at gbif.org>;  
>> taxacom at mailman.nhm.ku.edu
>> Sent: Tue, 30 November, 2010 9:13:46 PM
>> Subject: Re: PS: [Taxacom] saturday morning fun
>>
>>
>> Actually I think linking to wikispecies and other select databases  
>> (such as species pages currently offered by GBIF national Nodes) is  
>> a great idea.   In fact,  it would be done if we had the technical  
>> capacity to actively refactor the data portal as it's on a feature  
>> request list.    I hope we will be able to add this as you suggest.
>>
>> As to the purity of my motives,  I really don't know what that is  
>> all about.
>>
>>
>>
>> On Nov 29, 2010, at 10:58 PM, Stephen Thorpe wrote:
>>
>>> PS: you see the bit where I become suspicious and start doubting  
>>> your apparently pure motives is this: it would be very technically  
>>> easy and fully consistent with your stated aims and motivations to  
>>> simply put on each of your taxon pages a link to the corresponding  
>>> Wikispecies page (for example, for Mimus, http://species.wikimedia.org/wiki/Mimus  
>>> and note that all Wikispecies pages have this simple URL  
>>> structure http://species.wikimedia.org/wiki/NAME_OF_TAXON), perhaps  
>>> saying something like "here you may find useful data on this  
>>> taxon, though being open edit, we cannot vouch for its accuracy".  
>>> But this would require (1) genuinely pure motives on your part;  
>>> and (2) a grasp of the difference between theory and reality  
>>> (i.e., in theory Wikispecies is unreliable and a pointless waste  
>>> of your time, but in *reality* ...)
>>>
>>> From: David Remsen (GBIF) <dremsen at gbif.org>
>>> To: Stephen Thorpe <stephen_thorpe at yahoo.co.nz>
>>> Cc: David Remsen (GBIF) <dremsen at gbif.org>; taxacom at mailman.nhm.ku.edu
>>> Sent: Tue, 30 November, 2010 10:30:05 AM
>>> Subject: Re: [Taxacom] saturday morning fun
>>>
>>> Stephen,
>>>
>>> Thanks for the summary.  I'd be interested to hear what various  
>>> Catalogue of Life providers think of all this.  I know some  
>>> taxonomic sectors,  like the Lepidoptera,  derived from LepIndex  
>>> NHM-London,  have not been thoroughly reviewed, falling into your  
>>> 'raw' category.
>>>
>>> You hit the nail on the head when you say it provides you with a  
>>> starting point.   We use it as a starting point too.   We could  
>>> forego this and simply leave the raw data as it is but it seemed  
>>> an improvement to go with it.  We are trying to expand the  
>>> capacity to access other, perhaps more comprehensive or refined  
>>> sources,  should they be offered or available.   At the moment,  
>>> that starting place is one of the few places we can go.  Of  
>>> course, flaking together disparate sets of even high quality data  
>>> introduces additional complications but I'd be happy to take them  
>>> on.
>>>
>>> I'm sure we (at least I) have not fully grasped all the  
>>> ramifications of this.  I've tried to relay some of the  
>>> complexities and a rationale behind what we are faced with and  
>>> do.   I failed to mention the constraints we are under to improve  
>>> the issues raised this weekend.  Until very recently we have had  
>>> 2.5 programmers working on the entirety of our infrastructure with  
>>> nearly no resources for the portal to fix these problems.   This  
>>> will change in 2011.
>>>
>>> Best,
>>> David
>>>
>>> On Nov 29, 2010, at 9:47 PM, Stephen Thorpe wrote:
>>>
>>>> You mention some key issues here. Let me focus on just one of  
>>>> them for the moment, namely COL and its suitability as a data  
>>>> provider for GBIF. I suspect that GBIF have basically just  
>>>> thought something like "well, COL is an aggregation of trusted  
>>>> specialist databases in a form that GBIF can use" - but the  
>>>> reality is *way* more complex. For me, when starting to compile a  
>>>> Wikispecies page, I will often use COL as a *starting point  
>>>> only*, actually little more than a convenient way of getting big  
>>>> lists of taxa formatted and put on Wikispecies pages for further  
>>>> scrutiny. Sometimes, the COL data is so obviously worse than  
>>>> useless, that I don't use it at all, not *even* as a starting  
>>>> point. The data providers from COL vary widely in nature. Some of  
>>>> them are near complete for their group, others are highly  
>>>> fragmentary. Some are *very* raw, others are quite well polished.  
>>>> Sometimes, there are problems in the way that COL interprets the  
>>>> data from sources, so all sorts of synonyms get interpreted as  
>>>> valid, etc. Another issue, which I don't fully understand yet,  
>>>> and I could perhaps be mistaken (???), is that even in COL 2010,  
>>>> much of the data seems to have been harvested in 2008 ... I would  
>>>> have thought that COL 2010 would have harvested its data in 2010.  
>>>> If not, then COL is running a couple of years behind its own data  
>>>> providers, who will typically not be completely up-to-date  
>>>> either. So, in summary, I would say that COL is nothing more than  
>>>> a convenient *starting point* for building solid biodiversity  
>>>> data, and it requires a fair amount of careful and informed  
>>>> interpretation, not to mention a great deal of manual work to  
>>>> improve on it. I'm not sure that GBIF has fully grasped this? For  
>>>> example, in COL, the family Scarabaeidae is actually what would  
>>>> almost universally be called the subfamily Scarabaeinae of the  
>>>> family Scarabaeidae, and this is not at all obvious. So, COL is  
>>>> actually quite good if you want data on Scarabaeinae, but  
>>>> completely lacking in any data whatsoever on the *huge*  
>>>> scarabaeid subfamilies Melolonthinae and Rutelinae.
>>>> Cheers,
>>>> Stephen