[Taxacom] Life and Literature Code Challenge

Roderic Page r.page at bio.gla.ac.uk
Wed Aug 31 18:34:57 CDT 2011


There are several issues with the integration between EOL and BHL (and other sites).

As Chris notes, the BHL results on EOL pages are poorly displayed. I commented on this back in 2009 http://iphylo.blogspot.com/2009/09/visualising-biodiversity-heritage.html, and indeed it was one of the motivations for building BioStor. A big problem is the quality of the available metadata about the dates for BHL items. This makes it hard to group results by date, for example, which would help.
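To show why date quality matters, here is a minimal Python sketch of grouping items by year. It assumes each item arrives as an (id, free-text date) pair. The sample strings and the `group_by_year` name are hypothetical and not part of any BHL API. The function takes the first four-digit year it finds, and it puts items with no recognisable year under `None` rather than guessing one.

```python
import re
from collections import defaultdict

# Assumption: a year is any standalone four-digit number from 1500 to 2099.
YEAR_RE = re.compile(r"\b(1[5-9]\d\d|20\d\d)\b")

def group_by_year(items):
    """Group (item_id, date_string) pairs by the first year in date_string.

    Items with a missing or unparseable date are grouped under None
    rather than being assigned a guessed year.
    """
    groups = defaultdict(list)
    for item_id, date_string in items:
        match = YEAR_RE.search(date_string or "")
        year = int(match.group(1)) if match else None
        groups[year].append(item_id)
    return dict(groups)

if __name__ == "__main__":
    # Made-up date strings of the messy kind found in item metadata.
    sample = [("item-1", "1758"), ("item-2", "[1880-1881]"),
              ("item-3", "v.5 (1870)"), ("item-4", "")]
    # Sort known years first, unknown (None) last.
    for year, ids in sorted(group_by_year(sample).items(),
                            key=lambda kv: (kv[0] is None, kv[0] or 0)):
        print(year, ids)
```

In this sketch `item-2` is filed under 1880 even though the volume spans two years. A real tool would need an explicit policy for ranges like that.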

As for using nomenclators, this has its own set of problems. Again, the quality of the BHL metadata can make linking tricky. And nomenclators tend to have somewhat obscure ways of citing the literature (e.g., microcitations). I discuss some of these issues in a post that describes linking Nomenclator Zoologicus to BHL, see http://iphylo.blogspot.com/2011/03/microcitations-linking-nomenclators-to.html
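One first step in that kind of linking is to split a microcitation into parts that can be compared against BHL's title, volume and page metadata. The Python sketch below assumes a simplified, hypothetical "title, volume, page" layout. The pattern, the function name and the sample citations (made-up journal abbreviations) are for illustration only. The lookup against BHL itself is not shown.

```python
import re
from typing import NamedTuple, Optional

class MicroCitation(NamedTuple):
    title: str             # abbreviated title, as given in the nomenclator
    volume: Optional[str]  # kept as text, e.g. "3" or "(5) 6"
    page: int              # the single page being cited

# Hypothetical layout: "<title>, <volume>, <page>" or "<title>, <page>",
# with an optional trailing full stop. Real nomenclators use many more forms.
MICRO_RE = re.compile(
    r"^(?P<title>.+?),\s*(?:(?P<volume>[^,]+?),\s*)?(?P<page>\d+)\.?\s*$"
)

def parse_microcitation(text: str) -> Optional[MicroCitation]:
    """Split a microcitation into title, volume and page, or return None."""
    m = MICRO_RE.match(text.strip())
    if m is None:
        return None
    volume = m["volume"].strip() if m["volume"] else None
    return MicroCitation(m["title"].strip(), volume, int(m["page"]))

if __name__ == "__main__":
    print(parse_microcitation("J. Hypoth. Zool., 3, 195."))
    print(parse_microcitation("Bull. Imag. Soc., (5) 6, 123"))
    print(parse_microcitation("Bull. Imag. Soc., 123"))
```

The volume is kept as a string because series and volume notations vary too much to coerce into a number safely.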

There's also the issue that zoologists don't bother to record new combinations in the same way that botanists do, so tracking down when a name combination was first created can be tricky.

Also, not all nomenclators make their data "freely available". In fact, do any? By free I mean you can download the complete database and play with it.

Automation per se isn't the problem. It's a combination of OCR issues, weak bibliographic metadata in BHL, and the lack (or cryptic nature) of citations in nomenclators. None of these problems are insurmountable; they just present *cough* interesting informatics challenges. 
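OCR errors are one reason automated name finding misses or garbles names. Approximate string matching can tolerate small misreadings, such as "rn" read in place of "m". The sketch below uses Python's standard `difflib` as a stand-in technique and does not reflect how uBio's name finding actually works. The name list and the 0.85 cutoff are arbitrary assumptions.

```python
import difflib

# Hypothetical list of names to look for; a real tool would use a large name index.
KNOWN_NAMES = ("Musca domestica", "Musca autumnalis", "Stomoxys calcitrans")

def match_ocr_name(ocr_text, names=KNOWN_NAMES, cutoff=0.85):
    """Return the known name most similar to an OCR'd string, or None.

    cutoff is the minimum difflib similarity ratio (0 to 1); 0.85 is an
    illustrative value, not a tuned threshold.
    """
    matches = difflib.get_close_matches(ocr_text, names, n=1, cutoff=cutoff)
    return matches[0] if matches else None

if __name__ == "__main__":
    print(match_ocr_name("Musca dornestica"))   # "rn" misread for "m"
    print(match_ocr_name("Completely different"))
```

A lower cutoff catches more OCR damage, but it also produces more false matches between similar names, such as congeners.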

Regards

Rod


On 31 Aug 2011, at 22:29, Paul Kirk wrote:

> Chris,
> 
> The problem is BHL cannot/does not make use of data in the many nomenclators freely available ... which could provide the exact page on which the nomenclaturally significant event took place ... :-)
> 
> Paul
> ________________________________________
> From: taxacom-bounces at mailman.nhm.ku.edu [taxacom-bounces at mailman.nhm.ku.edu] on behalf of Chris Thompson [xelaalex at cox.net]
> Sent: 31 August 2011 21:28
> To: John Mignault; taxacom at mailman.nhm.ku.edu
> Subject: Re: [Taxacom] Life and Literature Code Challenge
> 
> John:
> 
> I appreciate the CHALLENGE, but I should remind ALL that "supposedly" the
> BHL literature is automatically linked to species pages of the Encyclopedia
> of Life (EoL). That is, as BHL literature is digitized, the contents are
> scanned by uBio for scientific names. And then a link is made to the
> appropriate species page.
> 
> So, the real challenge is getting the programmers of EoL to find a way to
> properly prioritize the order in which references to BHL literature are
> listed.
> 
> And that may be in part something that taxonomists must do manually. That
> is, make a taxonomic judgment about what are the most important references
> beyond the obvious first (original description) and see that the links
> appear in the proper order from most important to insignificant.
> 
> For you all who do not know our Encyclopedia of Life, go to www.eol.org and
> look, for example, at the species page for Musca domestica Linnaeus, the
> common house fly, and click on the BHL link [beware, the EoL will be changing
> soon]
> 
> http://www.eol.org/pages/730039
> 
> You will see more than a hundred links, but none to the original
> description (Linnaeus 1758) simply because Linnaeus NEVER made the
> combination Musca domestica in the TEXT. The genus name is in the running
> header and the epithet is left justified in the margin! So, the combination
> is not picked up in the automatic scanning by the uBio people, etc.
> 
> But also just look at the mass of links. Sam Six-pack, who might want to learn
> what was buzzing about his bud, would be totally confused!
> 
> Sincerely,
> 
> Chris Thompson
> from home
> 
> -----Original Message-----
> From: John Mignault
> Sent: Wednesday, August 31, 2011 3:40 PM
> To: taxacom at mailman.nhm.ku.edu
> Subject: [Taxacom] Life and Literature Code Challenge
> 
> The Biodiversity Heritage Library is sponsoring a Code Challenge as
> part of the Life and Literature conference being held in Chicago
> November 14-15.
> 
> The Biodiversity Heritage Library (BHL) is a consortium of 12 natural
> history and botanical libraries that cooperate to digitize and make
> accessible the legacy literature of biodiversity held in their
> collections and to make that literature available for open access and
> responsible use as a part of a global “biodiversity commons.” BHL also
> serves as the foundational literature component of the Encyclopedia of
> Life (EOL). BHL content may be freely viewed through the online reader
> or downloaded in part or as a complete work in PDF, OCR text, or
> JPEG 2000 file formats.
> 
> Your challenge is to provide
> 
>    a new, innovative way to use, disseminate or display BHL data
>    a description of what your project is trying to accomplish
>    the source code to reproduce the application
>    any libraries or supporting code needed to reproduce the application
>    any build instructions or scripts needed to build the application,
> or instructions on how to run it
>    any notes about your experience implementing this code: how you
> came up with your design, blind alleys you went up, or surprising
> problems you ran into or anything else you want to share.
> 
> 
> The dataset
> Through local and global digitization efforts, BHL has digitized over
> 32 million pages of taxonomic literature, representing over 45,000
> titles and 87,000 volumes (January 2011). The entire corpus dataset
> is freely available and accessible via many open methods.
> 
> 
> Timeline
> 
> Deadline for entries is October 17, 2011. The winner will be announced
> on November 1, 2011.
> 
> More details are available on our website at
> http://www.lifeandliterature.org/p/code-challenge.html
> 
> Thanks, and enter!
> 
> --j
> 
> --
> John Mignault
> Systems Librarian
> The LuEsther T Mertz Library
> The New York Botanical Garden
> 
> _______________________________________________
> 
> Taxacom Mailing List
> Taxacom at mailman.nhm.ku.edu
> http://mailman.nhm.ku.edu/mailman/listinfo/taxacom
> 
> The Taxacom archive going back to 1992 may be searched with either of these
> methods:
> 
> (1) by visiting http://taxacom.markmail.org
> 
> (2) a Google search specified as:  site:mailman.nhm.ku.edu/pipermail/taxacom
> your search terms here
> 
> 
> ************************************************************************
> The information contained in this e-mail and any files transmitted with it is confidential and is for the exclusive use of the intended recipient. If you are not the intended recipient please note that any distribution, copying or use of this communication or the information in it is prohibited.
> 
> Whilst CAB International trading as CABI takes steps to prevent the transmission of viruses via e-mail, we cannot guarantee that any e-mail or attachment is free from computer viruses and you are strongly advised to undertake your own anti-virus precautions.
> 
> If you have received this communication in error, please notify us by e-mail at cabi at cabi.org or by telephone on +44 (0)1491 829199 and then delete the e-mail and any copies of it.
> 
> CABI is an International Organization recognised by the UK Government under Statutory Instrument 1982 No. 1071.
> 
> **************************************************************************
> 

---------------------------------------------------------
Roderic Page
Professor of Taxonomy
Institute of Biodiversity, Animal Health and Comparative Medicine
College of Medical, Veterinary and Life Sciences
Graham Kerr Building
University of Glasgow
Glasgow G12 8QQ, UK

Email: r.page at bio.gla.ac.uk
Tel: +44 141 330 4778
Fax: +44 141 330 2792
AIM: rodpage1962 at aim.com
Facebook: http://www.facebook.com/profile.php?id=1112517192
Twitter: http://twitter.com/rdmpage
Blog: http://iphylo.blogspot.com
Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html



