[Taxacom] FW: formation of zoological names with Mc, Mac, et

Sat Sep 5 23:10:25 CDT 2009

Jim:  The term "nameString" has evolved a bit over time.  In the context of
LinneanCore (which was wrapped into TCS), it was intended to specifically
exclude authorship stuff 
(see:  http://wiki.tdwg.org/twiki/bin/view/UBIF/LinneanCoreDefinitions)

More recently, in the context of discussions about GNA/GNI, it has come to
include the full string including name bits and authorship bits.

In either case, the reason for qualifying the word "name" with "string" was
to emphasize that it's a taxon name represented by nothing more than a text
string (in contrast to a taxon name represented as a data object unto
itself, with rich metadata).  There are many, many, many data records that
represent taxon names as nothing more than a string of text characters.  The
prime example would be uBio/NameBank; but also text strings harvested from
literature scanning/OCRing efforts such as BHL.  In many ways, the DwC names
data would fall into this category.
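
To make the string-versus-object distinction concrete, here is a minimal
Python sketch of pulling the name bits apart from the authorship bits of a
bare namestring.  The names ParsedName and split_namestring are hypothetical
(they are not part of GNI, uBio, or any other service), and the heuristic is
an assumption made for illustration: authorship is taken to start at the
first token after the genus that begins with an uppercase letter or "(".

```python
from dataclasses import dataclass


@dataclass
class ParsedName:
    canonical: str   # name bits only (uninomial/binomial/trinomial)
    authorship: str  # authorship bits; empty if the string has none


def split_namestring(namestring: str) -> ParsedName:
    """Naively split a namestring into name bits and authorship bits.

    Assumption (illustrative only): authorship starts at the first token
    after the genus that begins with an uppercase letter or "(".
    """
    tokens = namestring.split()
    cut = len(tokens)  # default: no authorship found
    for i, tok in enumerate(tokens[1:], start=1):
        if tok[0].isupper() or tok[0] == "(":
            cut = i
            break
    return ParsedName(" ".join(tokens[:cut]), " ".join(tokens[cut:]))


print(split_namestring("Homo sapiens Linnaeus, 1758"))
print(split_namestring("Homo sapiens"))
```

A heuristic this crude will misread subgenera written in parentheses and
authors with lowercase particles ("de", "van"), which is exactly why real
parsers are considerably more involved.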

To answer your question, I would say they exist in the sense that we need to
deal with them because of your option "a" (that is, many datasets use them
because they have only a single unparsed field for taxon name, or at most a
parsed taxon name with no nomenclatural metadata other than the
name[+authorship] itself).  They also exist as your "b" inside the GNI,
as a way of building bridges from datasets with fully parsed name
"objects" (e.g., from nomenclators) to those namestring-only datasets (via
"wizardry", aka clever parsers and text-matching services).  So I guess the
answer is "c".
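
As a toy version of that bridge-building, the sketch below indexes a few
string variants generated from parsed name "objects" and then looks up
incoming namestrings after a crude normalization.  This is not how GNI does
its matching; the record layout (dicts with "id", "name", "author", "year")
and the helpers normalize, build_index and match are assumptions made up for
the example.

```python
import re


def normalize(s: str) -> str:
    """Casefold, replace punctuation with spaces, collapse whitespace."""
    s = re.sub(r"[^\w\s]", " ", s.casefold())
    return " ".join(s.split())


def build_index(name_objects):
    """Map normalized string variants of each parsed name to its id."""
    index = {}
    for obj in name_objects:
        name = obj["name"]
        author = obj.get("author", "")
        year = obj.get("year", "")
        # Variants a namestring-only dataset might plausibly contain.
        variants = {name, f"{name} {author}", f"{name} {author} {year}"}
        for variant in variants:
            index[normalize(variant)] = obj["id"]
    return index


def match(namestring, index):
    """Return the id of the matching name object, or None."""
    return index.get(normalize(namestring))


index = build_index(
    [{"id": 1, "name": "Homo sapiens", "author": "Linnaeus", "year": "1758"}]
)
print(match("Homo sapiens Linnaeus, 1758", index))
print(match("Homo erectus", index))
```

Exact lookup on a normalized key only catches differences in case,
punctuation and spacing.  Abbreviated authors ("L." for "Linnaeus") and
misspellings need the fuzzier text-matching services mentioned above.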

As for which elements should be concatenated, I think that's still an open
question.  Certainly Uninomial/binomial/trinomial/etc. + author (botanical
style or zoological style) + year.  Not sure about page numbers and "sec"
authorship (I tossed all my "sec"-formatted strings into GNI, just to see
what would happen).
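
For the uncontroversial part (name elements + author + year), the
concatenation might look something like the Python sketch below.  The
function build_namestring and its "style" parameter are hypothetical.  The
zoological branch puts a comma between author and year; the botanical branch
simply space-separates them, which is an assumption for illustration rather
than a statement of any code's rules.  Page numbers and "sec" authorship are
deliberately left out, since that part is still open.

```python
def build_namestring(name_parts, author="", year="", style="zoological"):
    """Concatenate atomized elements into a single namestring.

    name_parts: list such as ["Homo", "sapiens"]; empty parts are skipped.
    style: "zoological" joins author and year with ", "; any other value
    joins them with a space (an illustrative assumption).
    """
    name = " ".join(part for part in name_parts if part)
    if not author:
        return name
    if year:
        separator = ", " if style == "zoological" else " "
        authorship = f"{author}{separator}{year}"
    else:
        authorship = author
    return f"{name} {authorship}"


print(build_namestring(["Homo", "sapiens"], "Linnaeus", "1758"))
print(build_namestring(["Quercus", "alba"], "L.", style="botanical"))
```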

There's some good information on the GNI website (www.globalnames.org --
check the links to "Help" and "API" in the upper right), and there is more
information coming soon about GNA/GNI/GNUB.

Aloha,
Rich

> -----Original Message-----
> From: taxacom-bounces at mailman.nhm.ku.edu 
> [mailto:taxacom-bounces at mailman.nhm.ku.edu] On Behalf Of Jim Croft
> Sent: Saturday, September 05, 2009 7:19 AM
> To: David Remsen (GBIF)
> Cc: taxacom at mailman.nhm.ku.edu
> Subject: Re: [Taxacom] FW: formation of zoological names with 
> Mc, Mac, et
> 
> A quick point of clarification...  These days many (most?) 
> databases enable/require atomized entry of name data and name 
> metadata.
> 
> Is the design intent of the 'namestring' thingy to: a) act as 
> a placeholder for an unparsed string until someone can get 
> around to doing something with it; b) act as a repository
> for a concatenation of atomized elements in anticipation of 
> some mysterious operational wizardry; or c) both?
> 
> and, if b), what elements could or should be concatenated?
> 
> just wondering is all....
> 
> jim
> 
> On Thu, Sep 3, 2009 at 7:07 PM, David Remsen (GBIF) <dremsen at gbif.org> wrote:
> >
> > Just a couple of points regarding this thread.
> >
> > We use the term 'namestring' when referring to the literal orthography
> > of a name as it has been used in a particular instance because the
> > same name generally has many distinct orthographies.  A name may or
> > may not include authorship.  Authorship may or may not include a
> > publication year.  Authors are abbreviated, etc.  Processing
> > namestrings is necessary in this digital age and the reality of this
> > wide latitude in orthography presents special difficulties in
> > effectively grouping the right sets of namestrings together and
> > excluding the wrong ones.  This is less of an issue when working with
> > single datasets but becomes significant when integrating data from
> > many sources.  It's amazing how many different ways a name can be
> > written and all are essentially correct, just more or less complete.
> >
> > Regarding name atomisation, we are quite close to being able to
> > effectively atomise 99.99% of all namestrings into distinct and
> > identified components so that the use of a parsed or unparsed
> > scientific name in data management practices will be less of an issue
> > than it might be now.  The development of name parsing tools and
> > services is quite active at the moment, and the parsers are currently
> > being tested against rather esoteric orthographies.  In fact, we are
> > interested in finding cases that either break the parsers or are
> > simply too ambiguous to effectively parse.
> >
> > A reference implementation can be tried at 
> > http://globalnames.org/parsers/new
> >
> > Cheers,
> > David Remsen
> >
> > ----------------------------------------------------------------------
> > David Remsen, Senior Programme Officer
> > Electronic Catalog of Names of Known Organisms
> > Global Biodiversity Information Facility Secretariat
> > Universitetsparken 15, DK-2100 Copenhagen, Denmark
> > Tel: +45-35321472   Fax: +45-35321480
> > Mobile +45 27201472
> > Skype: dremsen
> > ----------------------------------------------------------------------
> 
> --
> _________________
> Jim Croft ~ jim.croft at gmail.com ~ +61-2-62509499 ~ 
> http://www.google.com/profiles/jim.croft
> ... in pursuit of the meaning of leaf ...
> ... 'All is leaf' ('Alles ist Blatt') - Goethe
> 