[Taxacom] the hurdle for all biodiv informatics initiatives

Dave Roberts workpackage6 at googlemail.com
Mon Feb 22 03:59:03 CST 2010


On 22 Feb 2010, at 08:58, Vladimir Blagoderov wrote:

> Dear Tony,
>
> On Mon, Feb 22, 2010 at 07:09, Stephen Thorpe  
> <s.thorpe at auckland.ac.nz>
> wrote:
>
>> - How do Scratchpads approach the “enter once, use many times” paradigm –
>> i.e., if information is entered or updated in one Scratchpad, can this be
>> propagated automatically to others? (E.g. cf. the WoRMS approach – a sponge
>> expert modifies the World Porifera Database, and other DBs which utilise the
>> same shared master taxon list, e.g. ERMS, RAMS, are instantly updated.)
>>
>
> Data exchange is the problem of all bioinformatics initiatives: it is either
> copy/paste, which multiplies errors across the Web, or a mash-up. Scratchpads
> development follows accepted standards as much as possible; for example,
> specimen and locality records are Darwin Core 1.2.1-compatible. So, the
> data can be re-used. Perhaps you are suggesting global synchronization of all
> taxonomic databases? Unfortunately, we have to be realists.

Part of the 'sweet spot' is just how you allow communities to develop  
their own view of a classification system.  Scratchpads were conceived  
to allow a community to modify and extend their taxonomy as they saw  
fit.  It is quite reasonable that more than one Scratchpad will cover  
a particular taxon (although I know of no example yet).  Bottom line:  
each Scratchpad has complete control over its taxonomy.  No one  
outside that community can propagate a change to it without the  
community's agreement.

>> - Is it possible to do a single query across multiple Scratchpads, for a
>> taxon or taxon name of interest?
>>
>
> Not at the moment; however, most Scratchpads are taxon-oriented, and it
> is possible to have multiple classifications within one Scratchpad. You can
> also display information from external sources, for example other
> Scratchpads, on taxon pages and give it proper credit. I suppose that the
> community maintaining a Scratchpad would be interested in collecting all
> available information in one place.

Of course you can Google.  Most Scratchpads live as sub-domains of  
myspecies.info, which you can use to restrict the search.  I can't  
imagine why you'd want to do that though.
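
For anyone who does want to try it, here is a minimal sketch of building such a restricted search. It assumes Google's standard `site:` operator and the public search URL; the function name `site_search_url` and the example term are purely illustrative and not part of Scratchpads.

```python
from urllib.parse import urlencode


def site_search_url(term, domain="myspecies.info"):
    """Build a Google search URL limited to one domain and its sub-domains.

    Assumption: the search engine honours the `site:` operator.
    """
    query = f"{term} site:{domain}"
    # urlencode escapes the space and the colon so the URL is safe to paste.
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    print(site_search_url("Porifera"))
```

This only finds pages that the search engine has already indexed, so it is no substitute for a real cross-Scratchpad query.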

These cross-Scratchpad integration tools are an obvious benefit and  
are an active area of development for us.  They simply haven't got to  
the top of the priority pile for our limited development resource.  Yet.


>> - Are there scalability issues, i.e. will a Scratchpad break if you try to
>> load e.g. 1 million taxon names, or 10 million, into it (as per current uBio
>> content, etc.)?
>>
>
> It was tested, and it does work

It does depend on the purpose of loading the names.  We have loaded  
'life', from CoL, as an hierarchy into the taxonomy module (so that it  
can be navigated).  That is the test to which Vlad refers.  It was  
about 2M names.  You can't pull uBio into it because uBio's names are  
not organised into an hierarchy (they're in multiple hierarchies and  
generally stored with only their immediate parent or parents).  Indeed  
building a taxonomic hierarchy from uBio is a challenge.
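
To illustrate why this is awkward, here is a rough sketch of turning parent-pointer records into a single navigable tree. It is not how uBio or the Scratchpads taxonomy module work internally; the `(name, parent)` record layout, the sample names and all function names are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical (name, parent) records. "Spongilla" appears under two
# parents, i.e. it sits in two different hierarchies at once.
RECORDS = [
    ("Animalia", None),
    ("Porifera", "Animalia"),
    ("Demospongiae", "Porifera"),
    ("Spongillidae", "Demospongiae"),
    ("Spongilla", "Demospongiae"),
    ("Spongilla", "Spongillidae"),
]


def index_parents(records):
    """Map each name to the set of parents recorded for it."""
    parents = defaultdict(set)
    for name, parent in records:
        parents[name]  # make sure root names get an (empty) entry too
        if parent is not None:
            parents[name].add(parent)
    return parents


def ambiguous_names(parents):
    """Names with more than one parent cannot be placed in a single tree."""
    return sorted(name for name, ps in parents.items() if len(ps) > 1)


def build_tree(parents):
    """Return a parent -> sorted children map, or fail on ambiguous names."""
    conflicts = ambiguous_names(parents)
    if conflicts:
        raise ValueError(f"names with multiple parents: {conflicts}")
    children = defaultdict(list)
    for name, ps in parents.items():
        for parent in ps:
            children[parent].append(name)
    return {parent: sorted(kids) for parent, kids in children.items()}


if __name__ == "__main__":
    print(ambiguous_names(index_parents(RECORDS)))
```

With the sample records, `build_tree` refuses to proceed until someone decides which parent "Spongilla" should have. That decision is a taxonomic judgement, not a programming one, which is the real difficulty.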

>> - Is it possible to obtain a database or tabular dump of up-to-the minute
>> taxonomic information, preferably from all Scratchpads, for local processing
>> or upload to other systems?
>
> Drupal is based on MySQL. In theory you can export a particular node, a
> selection, or the entire database. How to provide this functionality is a
> different question; we are working on it.

This depends on what you mean by "up-to-the minute taxonomic  
information".  If you mean taxonomic relationships as represented in  
the hierarchy, no, not yet.  This is on our to-do list.

Otherwise, basically yes, provided you have the permissions to do so.   
We, as Scratchpad developers, do not own or have rights to the data.   
Individual sites make data available as they see fit.

This is another of those 'sweet spot' compromises.  Scratchpad  
communities own and manage their data.  They are not contributing to a  
large data-gathering enterprise, such as EoL or CoL.  They are  
contributing to a vast knowledge base that is better compared to the  
literature.  It is just more mobile.  This is the compromise.  People  
still own their data and that ownership is one of the reasons that  
they engage with Scratchpads to make their data available.

Cheers,  Dave
-- 
Dr D.McL. Roberts,        Tel: +44 (0)20 7942 5086
European Distributed Institute of Taxonomy Project,
Coordinator WorkPackage 6 (Unifying Revisionary Taxonomy),
Dept. Zoology,
The Natural History Museum,
Cromwell Road,
London        SW7 5BD
Great Britain             Email: dmr at nomencurator.org
Web page:  http://scratchpads.eu
Web page:  http://www.editwebrevisions.info/
--
"You can't just ask customers what they want and then try and give it  
to them.  By the time you get it built, they'll want something  
new." [Steve Jobs, quoted in The Guardian, Technology Section, 25 June  
09].
--