[Taxacom] progress on globalnames.org - BHL side response

Wed May 13 14:25:30 CDT 2009

It is absolutely KILLING me not to comment more on this thread, but I am
just WAY too far behind in other work.  However, it will drive me to
distraction (from that other work), if I do not at least make these comments
(sorry, Neal*...)

1) I suspect the reasons for Paddy's "decadal" prediction are related to the
fact that all of these ideas were crystal clear to the techno-literate
taxonomic community at the dawn of the internet (even before) -- i.e.,
decades ago -- yet we are not now where we should be (indeed, should have been
over a decade ago!).  Yes, I realize the playing field keeps changing (e.g.,
the advent of Wikipedia -- although the concept of a Wiki has been around
for a decade or so at least); but these are incremental alterations to the
fundamental change, which was large-scale access to the internet.

2) Although Paddy's "decadal" prediction is (sadly) realistic (if the past
couple of decades are any real indication), I don't think it necessarily has
to be on that scale. At least not for the development of the core
infrastructure (getting all the content players on-board could still require
decades...).  And this is where I will say that the Global Names
Architecture (GNA) offers (by far) the best promise of real progress for
online taxonomic utopia on a universal scale of any initiative I've
seen yet (I got into this game in about 1990, and there are many here who
have been playing much longer than that).  There are (currently) two major
data caches within GNA:  the Global Names Index (GNI), and the Global Name
Usage Bank (GNUB).  Both are intended to be completely open-access, and
ideally will exist as dozens or hundreds of mirrors all over the planet that
are automatically kept in synch.  Both are intended to capitalize on
(vastly) distributed editorship and contributorship (not centralized
authority).  And both are designed to be Code-independent, and serve as the
core for a swarm of open-access, open-source web services that will further
flesh out the GNA-space.  The GNI came together in functional form faster
than just about any other initiative I've ever seen (i.e., on the speed scale
of the original DiGIR/DarwinCore).  GNUB is not far behind.  The next step
will be to develop some of the critical/fundamental web services that
operate on top of (and between) these two data caches.  It would take me all
day to articulate the GNA/GNI/GNUB vision in detail, but I imagine there
will be more information forthcoming (especially following the e-Biosphere
conference).  By all means, watch this space.

3) The fundamental property of GNA that, I think, gives it the greatest
chance for success is that it offers a "non-partisan" space for the
proprietary datasets to become interconnected.  This is not to say that all
proprietary datasets will want to play, nor does it mean that GNA will
replace the roles and services of those proprietary datasets (it most
certainly will not -- if anything, it will greatly enhance the value of
those roles and services to the broader communities). Thus, following up on
Dean's post below, I see no real problem with pursuing the "vertical" and
"horizontal" literature-scanning paradigms simultaneously, even if it means
substantial overlap, PROVIDED that the overlap is instantaneously obvious
and cross-linked.  And the way *that* will come to pass is if all of the
efforts to scan and index literature are working off a common repository for
(GUID-ed) literature citations. More on that another time....

...because for now, I've used up the 15 minutes I allowed myself to write
this. (See -- I *can* exercise at least some restraint...)

Aloha,
Rich

*P.S. For the record, my employer has NEVER attempted, in any way, to
curtail the time I spend contributing to these forums.  My restraint is
entirely self-imposed. I only referenced Neal above as an inside joke to a
comment he made to me yesterday....

P.P.S. Damn!  That P.S. thing just extended the time spent on this message
to 16 minutes! Arghhhh! :-)

Richard L. Pyle, PhD
Database Coordinator for Natural Sciences
  and Associate Zoologist in Ichthyology
Department of Natural Sciences, Bishop Museum
1525 Bernice St., Honolulu, HI 96817
Ph: (808)848-4115, Fax: (808)847-8252
email: deepreef at bishopmuseum.org
http://hbs.bishopmuseum.org/staff/pylerichard.html

> -----Original Message-----
> From: taxacom-bounces at mailman.nhm.ku.edu 
> [mailto:taxacom-bounces at mailman.nhm.ku.edu] On Behalf Of Dean 
> Pentcheff
> Sent: Wednesday, May 13, 2009 7:33 AM
> To: taxacom at mailman.nhm.ku.edu
> Subject: Re: [Taxacom] progress on globalnames.org - BHL side response
> 
> There's another angle from which to approach this. An 
> extremely effective way to capture the relevant taxonomic 
> literature for a taxon is to capture the reprint collection 
> of a specialist taxonomist. If digitizing entire journal runs 
> is seen as a "vertical" approach, this can be seen as the 
> taxonomically "horizontal" approach.
> 
> The advantage is that the taxonomic and physical selection of
> papers has already been done through the career efforts of
> taxon specialists. In the case of major institutional section
> holdings, this is the combined labor of generations of workers.
> The result is very-near-complete coverage for a taxon, including
> the ridiculously obscure publications that will probably never
> be captured in full-journal-run scans.
> 
> Of course there's going to be overlap with "vertical" 
> scanning efforts. That's a really minor problem -- the least 
> of our difficulties is having too many digital copies of 
> taxonomic publications. Realistically, if a taxon-specific 
> scanning project is aware that a few major journals for that 
> taxon are being comprehensively scanned, it's easy enough to 
> yank those reprints from the workflow to avoid duplicate labor.
> 
> This is an approach that we're implementing with the 
> literature for the Decapoda, and are planning to expand to 
> the Peracarida (which, together, should cover about 2/3 of 
> the Crustacea).
> 
> I'd love to see how we can best integrate the two approaches: 
> vertical scanning of entire journals & volumes by (e.g.) BHL, 
> and taxon-specific capture by scanning reprint libraries.
> 
> -Dean
> --
> Dean Pentcheff
> pentcheff at gmail.com
> 
> On Wed, May 13, 2009 at 3:48 AM, Donat Agosti <agosti at amnh.org> wrote:
> > Dear Chris
> >
> > A category like entomology is not fine-grained enough, as you point
> > out. It would be best to get research groups involved at their
> > research level. For example, spiders, fish, or ants could each be
> > one project; or projects could be based on regions (e.g., Madagascar)
> > or on conservation issues such as red-listing of mammals, or the
> > pollinators. The projects have to be research-driven.
> >
> > I would go beyond nomenclature vs. taxonomy and try to
> > activate/mobilize users of names for particular bodies of
> > literature (see above).
> >
> > I would also really make an effort to cover all literature, not just
> > the very old literature that is outside what you currently perceive
> > as copyright. This "new" literature is what interests most of the
> > people who might be of help, and it would be of use far beyond names.
> > Why not make this "new" literature accessible such that only those
> > pages that contain descriptions appear?
> >
> > I would also maintain another line of support for individuals who can
> > demonstrate that they work on catalogues of particular taxa, such as
> > fish, Solanaceae (e.g., PBIs), etc. They could then be accepted if
> > they can supply you with a bibliography.
> >
> > The fishing expeditions for not-yet-catalogued taxa ought to be low
> > priority and ought to come along as "collateral damage" when scanning
> > the serials that are part of the above selection project.
> >
> > Donat
> >
> >  _____
> >
> > From: Chris Freeland [mailto:Chris.Freeland at mobot.org]
> > Sent: Wednesday, May 13, 2009 1:20 PM
> > To: Donat Agosti; taxacom at mailman.nhm.ku.edu
> > Subject: RE: [Taxacom] progress on globalnames.org - BHL side response
> >
> > Donat, all,
> >
> > Just a point of clarification - BHL hasn't been randomly scanning
> > content, but rather working with partner libraries to identify
> > well-curated taxonomic subsets within our collections while also
> > staying in line with the broader goals and themes set by EOL. For
> > instance, SI's entomology collection is fully barcoded and
> > bibliographically complete, so they've focused their efforts there;
> > Harvard MCZ has taken the same approach with herpetology.
> > MBL responded to EOL's initial theme of "marine life" (what *exactly*
> > is that, taxonomically speaking?) and so scanned large, broad ranges
> > of their collection to try to cover that wide theme.
> > MOBOT/NYBG/Harvard Botany have been working down a prioritized list
> > originally created 5 years ago (a half-decade; sorry, Rod!) and
> > revised here: http://bit.ly/15ECET, along with other botanical
> > journals and monographs.
> >
> > That said, I am in complete and total agreement that we need a way to
> > make finer-grained decisions on what to send for scanning now that
> > we're past our proof of concept stage.  We have a functioning workflow
> > for digitization and an infrastructure for delivery.  How best to fill
> > our repository and with what is a subject of constant discussion
> > within our ranks.
> >
> > Our current line of thinking is to amass as many specialist
> > bibliographies as possible and aggregate citations by journal in order
> > to prioritize those journals for digitization.  We've been scrambling
> > to put a system in place to accommodate this, which we plan to demo
> > and discuss at eBiosphere and announce through this list & our blog.
> > If others have thoughts on a process that allows us to get the right
> > titles into our scanning queue then let us know.  We're here to
> > (scan &) serve, not disappoint.
> >
> > Chris
> > BHL, MOBOT
> >
> >
> > -----Original Message-----
> > From: taxacom-bounces at mailman.nhm.ku.edu on behalf of Donat Agosti
> > Sent: Wed 5/13/2009 2:51 AM
> > To: taxacom at mailman.nhm.ku.edu
> > Subject: Re: [Taxacom] progress on globalnames.org
> >
> >
> > I agree with Rod: thinking on such long time scales can't be accepted.
> >
> > I think there ought to be much more strategic thinking in this.  E.g.,
> > the Biodivlibrary should not randomly (from a taxonomic point of view)
> > scan in stuff, but target specific groups. Taxonomic experts should be
> > able to apply for slots that would cover all their literature. This
> > does not mean scanning one reprint after the other, but rather serials
> > that include the largest number of papers. The collaterals, other
> > papers not covering the target group, would still be an incentive for
> > others to comprehend what a tremendous resource this is.
> >
> > For me this sort of decadal or grand thinking seems to be completely
> > off, or decoupled from a research strategy that asks questions and then
> > finds ways to solve them, including the building up of the necessary IT
> > infrastructure and content.
> > It is rather infused by Google creating in our community and funding
> > agencies the misunderstood desire to create the mother system of all
> > biodiversity information.
> >
> > It is similar to planning to fly to Mars, but without the billions of
> > dollars to spend.
> >
> > So, what we need is strategic thinking coupled with tools that allow
> > editing and linking data in a very efficient way that will essentially
> > lead to data that can be used for new insights and knowledge. Only this
> > will lead to a community that is willing to chip in their efforts and
> > shorten the time substantially.
> >
> > Donat
> >
> >
> > -----Original Message-----
> > From: taxacom-bounces at mailman.nhm.ku.edu
> > [mailto:taxacom-bounces at mailman.nhm.ku.edu] On Behalf Of Roderic Page
> > Sent: Wednesday, May 13, 2009 11:34 AM
> > To: David Patterson
> > Cc: taxacom at mailman.nhm.ku.edu
> > Subject: Re: [Taxacom] progress on globalnames.org
> >
> > Am I the only one horrified by this timescale?
> >
> > On 12 May 2009, at 16:45, David Patterson wrote:
> >
> >>
> >> Expectation management:  How long before this is all operational?
> >> Best to think decadally.
> >>
> >
> > Why can't we have this sooner? Like, *cough*, now? Is it crazy to
> > suggest that if all these names were dumped in a wiki, together with
> > annotations (e.g., links to literature), and our community set about
> > adding/annotating/cleaning, we could have this done rather sooner...?
> >
> > Rod
> >
> >
> >
> > ---------------------------------------------------------
> > Roderic Page
> > Professor of Taxonomy
> > DEEB, FBLS
> > Graham Kerr Building
> > University of Glasgow
> > Glasgow G12 8QQ, UK
> >
> > Email: r.page at bio.gla.ac.uk
> > Tel: +44 141 330 4778
> > Fax: +44 141 330 2792
> > AIM: rodpage1962 at aim.com
> > Facebook: http://www.facebook.com/profile.php?id=1112517192
> > Twitter: http://twitter.com/rdmpage
> > Blog: http://iphylo.blogspot.com
> > Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
> >
> >
> >
> >
> >
> >
> >
> > _______________________________________________
> >
> > Taxacom Mailing List
> > Taxacom at mailman.nhm.ku.edu
> > http://mailman.nhm.ku.edu/mailman/listinfo/taxacom
> >
> > The Taxacom archive going back to 1992 may be searched with either of
> > these methods:
> >
> > (1) http://taxacom.markmail.org
> >
> > Or (2) a Google search specified as:
> > site:mailman.nhm.ku.edu/pipermail/taxacom  your search terms here