[Taxacom] FW: formation of zoological names with Mc, Mac, et

Stephen Thorpe s.thorpe at auckland.ac.nz
Thu Sep 3 16:21:25 CDT 2009


Hi David,

Parse the salt! :)

>we are interested in finding cases that either break the parsers or are simply too ambiguous to effectively parse

Well, there is a kind of "name instance" that has been used in the past, and can be part of valid nomenclatural acts, like type species designations, but which doesn't actually use 'namestrings' at all! Thomas Broun was a pioneering coleopterist in N.Z. at the end of the 19th and beginning of the 20th centuries. He described many new species of beetle, and assigned a 'species number' to each species recognised by him. Then he would often write things like 'the type [of the new genus] is no. 2134'. I consider these to be valid type species designations, unless invalid for other reasons. I cite it here as just one example of "nonstandard" formatting which could cause problems for automated interpretation of taxonomic literature.

Another issue would be when by some lapsus an author is clearly (by context) referring to one species, but actually calls it by the name of another species! A hypothetical example might be a tiny mite parasite Mitey minutus with a big mammal host Mammothus maximus. The author could say something like 'M. maximus is the smallest mite so far known on a mammal host'! 

Stephen

________________________________________
From: taxacom-bounces at mailman.nhm.ku.edu [taxacom-bounces at mailman.nhm.ku.edu] On Behalf Of David Remsen (GBIF) [dremsen at gbif.org]
Sent: Thursday, 3 September 2009 9:07 p.m.
To: taxacom at mailman.nhm.ku.edu
Subject: Re: [Taxacom] FW: formation of zoological names with Mc, Mac, et

Just a couple of points regarding this thread.

We use the term 'namestring' when referring to the literal orthography
of a name as it has been used in a particular instance because the
same name generally has many distinct orthographies.   A name may or
may not include authorship.  Authorship may or may not include a
publication year.  Authors are abbreviated, etc.    Processing
namestrings is necessary in this digital age and the reality of this
wide latitude in orthography presents special difficulties in
effectively grouping the right sets of namestrings together and
excluding the wrong ones.   This is less an issue when working with
single datasets but becomes significant when integrating data from
many sources.  It's amazing how many different ways a name can be
written and all are essentially correct, just more or less complete.
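
To make the grouping problem concrete, here is a minimal sketch in Python. It is not taken from any GBIF or GNA tool; the function names and the "first two tokens" rule are assumptions made purely for illustration, and "Examplus secundus Jones, 1980" is a made-up name in the spirit of the "Examplus primus" example used elsewhere in this thread.

```python
import re
from collections import defaultdict


def canonical_key(namestring: str) -> str:
    """Reduce a namestring to a lowercase 'genus epithet' key.

    Hypothetical rule: treat the first two tokens as genus and
    specific epithet and ignore everything after them.
    """
    # Replace parentheses and commas with spaces so they never stick to a token.
    cleaned = re.sub(r"[(),]", " ", namestring)
    tokens = cleaned.split()  # split() also collapses runs of whitespace
    return " ".join(tokens[:2]).lower()


def group_namestrings(namestrings):
    """Group namestrings that share the same canonical key."""
    groups = defaultdict(list)
    for s in namestrings:
        groups[canonical_key(s)].append(s)
    return dict(groups)


variants = [
    "Examplus primus",
    "Examplus primus Smith",
    "Examplus primus Smith, 1970",
    "Examplus  primus (Smith, 1970)",
    "Examplus secundus Jones, 1980",
]
groups = group_namestrings(variants)
```

Here the four spellings of "Examplus primus" end up in one group and "Examplus secundus" in another. The same rule would also lump homonyms by different authors together, which is exactly the "excluding the wrong ones" half of the problem.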

Regarding name atomisation, we are quite close to being able to
effectively atomise 99.99% of all namestrings into distinct and
identified components so that the use of a parsed or unparsed
scientific name in data management practices will be less an issue
than it might be now.  The development of name parsing tools and
services is quite active at the moment, and the parsers are currently
being tested against rather esoteric orthographies. In fact, we are interested in
finding cases that either break the parsers or are simply too
ambiguous to effectively parse.

A reference implementation can be tried at http://globalnames.org/parsers/new
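
For readers who want a feel for what atomisation involves, the following is a deliberately naive sketch in Python. It is not the parser at the URL above and makes no claim about how that one works; the pattern, the function name `atomise` and the four component names are assumptions for illustration. It only handles a simple binomial with an optional author and four-digit year.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern: Genus epithet [Author[, year]], with the
# authority optionally wrapped in parentheses.
NAME_PATTERN = re.compile(
    r"""^\s*
    (?P<genus>[A-Z][a-z]+)\s+          # capitalised genus
    (?P<epithet>[a-z]+)                # lowercase specific epithet
    (?:\s+\(?                          # optional authority, optional "("
       (?P<author>[^,()\d]+?)\s*       # author: no commas, parens or digits
       (?:,\s*(?P<year>\d{4}))?        # optional ", 1970"
       \)?
    )?
    \s*$""",
    re.VERBOSE,
)


def atomise(namestring: str) -> Optional[Dict[str, Optional[str]]]:
    """Split a simple namestring into components, or return None."""
    match = NAME_PATTERN.match(namestring)
    if match is None:
        return None
    parts = match.groupdict()
    if parts["author"] is not None:
        parts["author"] = parts["author"].strip()
    return parts


parsed = atomise("Examplus primus Smith, 1970")
bare = atomise("Examplus primus")
```

A pattern this simple returns None for an abbreviated genus such as "M. maximus" and for anything that is not a namestring at all, and it knows nothing about infraspecific ranks or multiple authors. Those are the sorts of cases that would need to be collected and tested against.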

Cheers,
David Remsen

----------------------------------------------------------------------------
David Remsen, Senior Programme Officer
Electronic Catalog of Names of Known Organisms
Global Biodiversity Information Facility Secretariat
Universitetsparken 15, DK-2100 Copenhagen, Denmark
Tel: +45-35321472   Fax: +45-35321480
Mobile +45 27201472
Skype: dremsen
----------------------------------------------------------------------------



On Sep 3, 2009, at 10:37 AM, Kevin Richards wrote:

> I believe the new DarwinCore schema has separate fields for
> scientific name components:
>
> scientificName - full scientific name, author etc
> genus
> specificEpithet
> infraspecificEpithet
> scientificNameAuthorship
> etc
>
> same with most TDWG schemas (eg TCS, LSID vocabs)
>
> see http://rs.tdwg.org/dwc/terms/index.htm for details
> also see http://wiki.tdwg.org/twiki/bin/view/DarwinCore/DarwinCoreVersions
>  for a comparison of the darwin core versions.
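
As a rough illustration of the field layout listed above, here is a sketch of a record type in Python. The class name `NameRecord` is an assumption and this is not an official Darwin Core binding; only the field names come from the list in this message, and the example values reuse the "Examplus primus" name from elsewhere in the thread.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class NameRecord:
    """Hypothetical record using the field names listed in the message."""

    scientificName: str                              # full namestring
    genus: Optional[str] = None
    specificEpithet: Optional[str] = None
    infraspecificEpithet: Optional[str] = None
    scientificNameAuthorship: Optional[str] = None


record = NameRecord(
    scientificName="Examplus primus Smith, 1970",
    genus="Examplus",
    specificEpithet="primus",
    scientificNameAuthorship="Smith, 1970",
)
row = asdict(record)  # plain dict, e.g. for writing out as one table row
```

The full string and its components sit side by side, so a provider that only has the namestring can still fill in `scientificName` and leave the rest empty.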
>
> Kevin
>
>
> ________________________________________
> From: taxacom-bounces at mailman.nhm.ku.edu [taxacom-bounces at mailman.nhm.ku.edu] On Behalf Of Tony.Rees at csiro.au [Tony.Rees at csiro.au]
> Sent: Thursday, 3 September 2009 7:14 p.m.
> To: s.thorpe at auckland.ac.nz; jim.croft at gmail.com
> Cc: taxacom at mailman.nhm.ku.edu
> Subject: Re: [Taxacom] FW: formation of zoological names with Mc,
> Mac, et
>
> Hi Steve,
>
> Well I *thought* Darwin Core had separate fields for scientific name
> ("scientificname") and author, however it appears I am wrong and the
> intention is to hold a "namestring", see http://wiki.tdwg.org/twiki/bin/view/DarwinCore/ScientificName
>
> However the OBIS implementation of Darwin Core, with which I have
> spent most of my experience over the past x years, *does* have
> separate fields for "scientificname" and "scientificnameauthor", see
> http://www.iobis.org/tech/provider/implementation/, which is what I was
> (perhaps foolishly) expecting in the master
> Darwin Core spec, since it is notionally an extension of DC - anyone
> from TDWG care to comment?
>
> - Tony
>
>
> -----Original Message-----
> From: Stephen Thorpe [mailto:s.thorpe at auckland.ac.nz]
> Sent: Thursday, 3 September 2009 5:06 PM
> To: Rees, Tony (CMAR, Hobart); jim.croft at gmail.com
> Cc: taxacom at mailman.nhm.ku.edu
> Subject: RE: [Taxacom] FW: formation of zoological names with Mc,
> Mac, et
>
> Tony, Jim, list,
>
> I am getting a little confused if you guys are agreeing or
> disagreeing with me!
> I am saying that EFFECTIVELY, the authority/date is part of the
> name, albeit an optional part, despite the Code claiming that it
> isn't, but at the same time treating it as if it is! The Code can
> only prescribe things about names, NOT about auxiliary information.
> The Code prescribes what the authority/date of a name is, so these
> things are part of the name, albeit optional. Most databases/
> publications in the world today would have a single field called
> 'Name', which would look like this:
>
> Name: Examplus primus Smith, 1970
>
> NOT like this:
>
> Name: Examplus primus
> Authority: Smith
> Date: 1970
>
> At least in entomology, when you put a determination label on a
> specimen, you typically include authority date as part of the
> identification (=as part of the name).
>
> OK, so we could play "semantic revisionism" and redefine name as
> "namestring", and say that the Code prescribes things about
> namestrings, but why bother?
>
> One thing to be clear about is that I am certainly NOT in favour of
> only citing e.g. Examplus primus, and leaving out the authority/
> date! What I AM saying is that we should NOT write things like
> Examplus primus A.B. Smith, jr., October 20, 1970. Instead, if we
> really want to know auxiliary info., then we should structure a
> database more like this:
>
> Name: Examplus primus Smith, 1970
> Authority: A.B. Smith, jr. (born: January 1, 1900, in Utopia,
> Lalaland)
> Publication date: October 20, 1970
>
> Cheers,
>
> Stephen
>
> ________________________________________
> From: Tony.Rees at csiro.au [Tony.Rees at csiro.au]
> Sent: Thursday, 3 September 2009 6:38 p.m.
> To: jim.croft at gmail.com; Stephen Thorpe
> Cc: taxacom at mailman.nhm.ku.edu
> Subject: RE: [Taxacom] FW: formation of zoological names with Mc,
> Mac, et
>
> Jim, all,
>
> Well we are just talking semantics here - Stephen says the authority
> is part of the name (which I agree it is not), you say the authority
> as a qualifier to the name is generally redundant (which I disagree
> with). David Remsen and GNA folk call the name and subsequent
> authority (and possibly other qualifier) information the
> "namestring", a term I don't really love but will use in this
> context...
>
> To see why namestrings are more useful than names is not hard, e.g.
> take a look at any official nomenclatural stuff (ICZN in this
> instance), e.g.
>
> http://www.iczn.org/BZNJun2009opinions.html
>
> You will, I hope, see the liberal use of "namestrings" rather than
> just "names". ICZN certainly need to use these, and others to see
> them, for all the reasons recently expounded as a part of this
> thread, for as long as ambiguities or potential ambiguities persist (which may
> be some time...)
>
> Now, I would never put all of these elements in a single name field
> in my database, since I do not have such a thing, however I
> certainly have genus, species, authority (or author and date in
> separate fields) and can perm them to reconstruct namestrings as
> needed. Actually I would be surprised if you did not do the same??
> This is quite different of course from primary keys, which in my
> usage and many others', has nothing to do with these name elements,
> for reasons previously discussed.
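
The arrangement described above (separate fields, a key unrelated to the name elements, namestrings rebuilt on demand) can be sketched roughly as follows in Python. The class name `TaxonName`, the field `name_id` and the method `namestring` are assumptions for illustration, not anyone's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaxonName:
    """Hypothetical row: a surrogate key plus atomised name elements."""

    name_id: int                 # primary key, unrelated to the name itself
    genus: str
    species: str
    author: Optional[str] = None
    year: Optional[int] = None

    def namestring(self, with_authority: bool = True) -> str:
        """Reassemble a namestring from the stored elements."""
        base = f"{self.genus} {self.species}"
        if not with_authority or self.author is None:
            return base
        if self.year is None:
            return f"{base} {self.author}"
        return f"{base} {self.author}, {self.year}"


name = TaxonName(name_id=101, genus="Examplus", species="primus",
                 author="Smith", year=1970)
full = name.namestring()
short = name.namestring(with_authority=False)
```

Because the key carries no name information, correcting an author or a date changes what `namestring()` returns without disturbing anything that refers to `name_id`.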
>
> Regards - Tony
>
>
> -----Original Message-----
> From: taxacom-bounces at mailman.nhm.ku.edu [mailto:taxacom-bounces at mailman.nhm.ku.edu] On Behalf Of Jim Croft
> Sent: Thursday, 3 September 2009 4:15 PM
> To: Stephen Thorpe
> Cc: taxacom at mailman.nhm.ku.edu
> Subject: Re: [Taxacom] FW: formation of zoological names with Mc,
> Mac, et
>
> I am Jim Croft (aka James Reginald Croft, aka Kim Croft to his family,
> aka all manner of unpublishable appellations to his staff and
> colleagues).  The one born in Toorak Melbourne Australia on 28 May
> 1951.  The bureaucrat botanist, not the hellfire and brimstone Baptist
> minister, not the peace out hippie bookbinder.
>
> My name is not "Jim Croft (Toorak) 1951" - although it could be argued
> the implied information content is a little bit less ambiguous
>
> When it comes to names, plants and animals are no different... If you
> need other information to sort out the use of the name, store and
> manage that information.  But don't glue it to the name.  Because you
> will never have enough and the end result will be unusable and
> unenforceable.
>
> I have had numerous arguments about this at the ANBG, saying that the
> use of author names on labels and other materials in the Gardens is a
> waste of time and space.  Other than a pathetically pretentious
> attempt to look scientific it serves no useful purpose.  The only
> people who need to invoke this information are nomenclaturalists when
> they need to sort out which name to apply to which taxon.  Once that
> is done and documented, no-one needs to see it.  Especially not the
> public.
>
> jim
>
>
> On Thu, Sep 3, 2009 at 8:25 AM, Stephen
> Thorpe<s.thorpe at auckland.ac.nz> wrote:
>> Note that the author/date are in the name field (as they are in any
>> sensible taxonomic database), implying that they are part of the
>> name in some meaningful sense, despite an overly pedantic
>> interpretation of the Code denying this! I guess one of the many
>> inconsistencies in the Code is that it says author/date isn't part
>> of the name, but then treats it as part of the name in many
>> contexts...
>
> --
> _________________
> Jim Croft ~ jim.croft at gmail.com ~ +61-2-62509499 ~
> http://www.google.com/profiles/jim.croft
> ... in pursuit of the meaning of leaf ...
> ... 'All is leaf' ('Alles ist Blatt') - Goethe
>
> _______________________________________________
>
> Taxacom Mailing List
> Taxacom at mailman.nhm.ku.edu
> http://mailman.nhm.ku.edu/mailman/listinfo/taxacom
>
> The Taxacom archive going back to 1992 may be searched with either
> of these methods:
>
> (1) http://taxacom.markmail.org
>
> Or (2) a Google search specified as:  site:mailman.nhm.ku.edu/pipermail/taxacom  your search terms here
>

