[Taxacom] FW: formation of zoological names with Mc, Mac, et

Jim Croft jim.croft at gmail.com
Sat Sep 5 12:18:55 CDT 2009


A quick point of clarification...  These days many (most?) databases
enable/require atomized entry of name data and name metadata.

Is the design intent of the 'namestring' thingy to: a) act as a
placeholder for an unparsed string until someone can get around to
doing something with it; b) act as a repository for a concatenation
of atomized elements in anticipation of some mysterious operational
wizardry; or c) both?

and, if b), what elements could or should be concatenated?
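
To make option b) concrete, here is a minimal sketch of what concatenating atomized elements into a namestring might look like. The field names, their order, and the "Author, Year" formatting are all assumptions made for illustration; they are not taken from any particular database schema or nomenclatural code, and "Aus bus" is only a placeholder name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AtomizedName:
    # Hypothetical field set; a real schema may atomize names differently.
    genus: str
    specific_epithet: Optional[str] = None
    infraspecific_rank: Optional[str] = None
    infraspecific_epithet: Optional[str] = None
    authorship: Optional[str] = None
    year: Optional[str] = None


def build_namestring(name: AtomizedName, with_authorship: bool = True) -> str:
    """Concatenate whichever atomized elements are present, in a fixed order."""
    parts = [
        name.genus,
        name.specific_epithet,
        name.infraspecific_rank,
        name.infraspecific_epithet,
    ]
    if with_authorship and name.authorship:
        author = name.authorship
        if name.year:
            # Assumed convention: "Author, Year".
            author = f"{author}, {name.year}"
        parts.append(author)
    # Skip elements that were never filled in.
    return " ".join(p for p in parts if p)


example = AtomizedName("Aus", "bus", authorship="McDonald", year="1900")
print(build_namestring(example))
print(build_namestring(example, with_authorship=False))
```

The same atomized record can yield several namestrings depending on which elements are included, which is really the point of the question.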

just wondering is all....

jim

On Thu, Sep 3, 2009 at 7:07 PM, David Remsen (GBIF) <dremsen at gbif.org> wrote:
>
> Just a couple of points regarding this thread.
>
> We use the term 'namestring' when referring to the literal orthography
> of a name as it has been used in a particular instance because the
> same name generally has many distinct orthographies.   A name may or
> may not include authorship.  Authorship may or may not include a
> publication year.  Authors are abbreviated, etc.    Processing
> namestrings is necessary in this digital age and the reality of this
> wide latitude in orthography presents special difficulties in
> effectively grouping the right sets of namestrings together and
> excluding the wrong ones.   This is less an issue when working with
> single datasets but becomes significant when integrating data from
> many sources.  It's amazing how many different ways a name can be
> written and all are essentially correct, just more or less complete.
>
> Regarding name atomisation, we are quite close to being able to
> effectively atomise 99.99% of all namestrings into distinct and
> identified components so that the use of a parsed or unparsed
> scientific name in data management practices will be less an issue
> than it might be now.  Development of name parsing tools and
> services is quite active at the moment, and the parsers are currently
> being tested against rather esoteric orthographies. In fact, we are interested in
> finding cases that either break the parsers or are simply too
> ambiguous to effectively parse.
>
> A reference implementation can be tried at http://globalnames.org/parsers/new
>
> Cheers,
> David Remsen
>
> ----------------------------------------------------------------------------
> David Remsen, Senior Programme Officer
> Electronic Catalog of Names of Known Organisms
> Global Biodiversity Information Facility Secretariat
> Universitetsparken 15, DK-2100 Copenhagen, Denmark
> Tel: +45-35321472   Fax: +45-35321480
> Mobile +45 27201472
> Skype: dremsen
> ----------------------------------------------------------------------------
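
The grouping problem described in the quoted message (many namestrings, one name) can be sketched with a deliberately naive parser. This is a toy under stated assumptions, not the method used by the parsers mentioned above: the regular expression, the function names, and the idea of grouping on "genus + epithet" are all hypothetical, and it ignores hyphenated epithets, infraspecific ranks, hybrids, and much else.

```python
import re
from collections import defaultdict

# Naive pattern: capitalised genus, optional lowercase epithet,
# and anything left over treated as authorship.
NAME_RE = re.compile(
    r"(?P<genus>[A-Z][a-z]+)"
    r"(?:\s+(?P<epithet>[a-z]+))?"
    r"(?:\s+(?P<authorship>\S.*))?"
)


def atomise(namestring):
    """Split a namestring into components, or return None if it does not fit."""
    m = NAME_RE.fullmatch(namestring.strip())
    return m.groupdict() if m else None


def canonical(namestring):
    """Return 'Genus epithet' without authorship, or None if unparsed."""
    parts = atomise(namestring)
    if parts is None:
        return None
    return " ".join(p for p in (parts["genus"], parts["epithet"]) if p)


def group_namestrings(namestrings):
    """Group literal namestrings under their canonical form (None = unparsed)."""
    groups = defaultdict(list)
    for s in namestrings:
        groups[canonical(s)].append(s)
    return dict(groups)


variants = [
    "Aus bus",
    "Aus bus McDonald",
    "Aus bus McDonald, 1900",
    "Aus bus (MacDonald, 1900)",
]
print(group_namestrings(variants))
```

All four variants land under "Aus bus" here, but note that the sketch would just as happily group namestrings whose authorship actually differs, which is the "excluding the wrong ones" half of the problem.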

-- 
_________________
Jim Croft ~ jim.croft at gmail.com ~ +61-2-62509499 ~
http://www.google.com/profiles/jim.croft
... in pursuit of the meaning of leaf ...
... 'All is leaf' ('Alles ist Blatt') - Goethe
