Taxacom: Tropicos and gender of names

Donald Hobern dhobern at gbif.org
Wed Feb 9 22:00:07 CST 2022


I'm sure this will just serve to waste more of all of our breath, but I want to highlight a distinction that I feel is being glossed over. I have spent many years processing scientific names for use in databases. And I am familiar with countless projects that started by asserting that a good names management tool is simple to build and then got bogged down for years in making it work adequately.

It is not hard - and it is highly desirable - to build a nomenclatural database that meets the needs of the taxonomists working on a particular group. If the database is to capture nomenclatural acts or name usages and support what is effectively row-by-row lookup, the challenges are ("simply") making sure that the data model can represent the range of special conditions considered important. A tool like this, particularly one that links to the original publications, is enormously valuable for taxonomists and for some other user groups. It should by and large converge over time on being an inarguable summary of facts. (Although, in the real world, the edge cases make this ideal hard to achieve.) Such a tool is of great importance to many people on this group and could benefit from large-scale cross-taxon effort, as with ZooBank, IPNI, Index Fungorum, LPSN, and ICTV.

However, a nomenclatural database does not meet the needs of 95% of consumers of biodiversity information and may in fact cause more confusion than it resolves. For most other biologists, environmental scientists, conservationists, invasion biologists, field naturalists, molecular researchers, etc., nomenclature is just the archaeological remains of more than 260 years of taxonomic effort. These users have two basic types of information need (which are simply the gateways for them to answer many other more interesting questions): 1) I've found a scientific name somewhere - what species is being referenced? and 2) Find me all the relevant information of some type that relates to what I know as species X.

Binomials (plus authorship (plus page number (plus sensu reference (plus ...)))) are dreadful keys for building data systems that meet these challenges. We have to document original names, known combinations and asserted synonymy, allow for normal fluidity around author spelling/abbreviation and publication years, perhaps strip away presumed Latin gender endings, accommodate a range of bespoke orthography to represent uncertainty, balance the probability of misspellings (including misspellings in the original epithets), etc. It is far from trivial to build a system that can do what a taxonomist familiar with a group does when encountering a novel combination (instantly recognising that the combination represents a transfer to another known genus, or a wholly new genus, or an obscure resurrection of a genus based on presumed priority, or ...).
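
To make the matching problem concrete, here is a minimal sketch in Python of the kind of reduction described above: drop authorship and year, strip a presumed Latin gender ending, and tolerate small misspellings in the epithet. Everything in it is an assumption for illustration (the ending list, the 0.85 threshold, the function names, and the sample names); it is not how any existing names service works, and it ignores most of the hard cases listed above.

```python
import difflib
import re

# Assumed, deliberately incomplete list of adjectival endings that vary with gender.
GENDER_ENDINGS = ("us", "um", "is", "a", "e")


def stem_epithet(epithet: str) -> str:
    """Strip a presumed gender ending, keeping at least three letters of stem."""
    for ending in GENDER_ENDINGS:
        if epithet.endswith(ending) and len(epithet) - len(ending) >= 3:
            return epithet[: -len(ending)]
    return epithet


def match_key(name: str) -> tuple[str, str]:
    """Reduce a name string to (genus, epithet stem); authorship and year are ignored."""
    tokens = re.findall(r"[A-Za-z-]+", name)
    return tokens[0].lower(), stem_epithet(tokens[1].lower())


def find_candidates(query: str, known_names: list[str], threshold: float = 0.85):
    """Return known names whose epithet stem resembles the query's, in any genus."""
    q_genus, q_stem = match_key(query)
    hits = []
    for known in known_names:
        k_genus, k_stem = match_key(known)
        score = difflib.SequenceMatcher(None, q_stem, k_stem).ratio()
        if score >= threshold:
            # A different genus with the same stem may be a recombination,
            # or an unrelated species that happens to share the epithet.
            kind = "same genus" if k_genus == q_genus else "possible recombination"
            hits.append((known, kind))
    return hits


# Illustrative sample data only.
KNOWN = ["Hirundo urbica Linnaeus, 1758", "Delichon urbica", "Parus major Linnaeus, 1758"]

if __name__ == "__main__":
    for hit in find_candidates("Delichon urbicum", KNOWN):
        print(hit)
```

Even this toy version shows the trade-off: loosening the key enough to catch gender variants and misspellings also produces false matches between unrelated species that share an epithet, which is exactly the judgement a taxonomist applies and a lookup key cannot.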

The use cases and number of users for a taxonomic (rather than nomenclatural) information system make it likely that funding will be easier to attract for such cases. The quality of such systems will be improved if comprehensive nomenclatural datasets are available to underpin them. This is itself an important reason to support initiatives like ZooBank, along with other platforms that bring together a taxonomic community to create a single point-of-truth for the nomenclature of their group. Standardising how all these systems represent the messy parts would also be a big help.

However, any discussion about why databases struggle with biological nomenclature should acknowledge that the problem is not with representing nomenclatural acts. As noted, that can be done. Rather, it's with the fluidity of real-world references to the names that taxonomists publish. That's where names as lookup keys start to fail.

Donald


----------------------------------------------------------------------
Donald Hobern - dhobern at gbif.org
Global Biodiversity Information Facility http://www.gbif.org/
----------------------------------------------------------------------

________________________________
From: Taxacom <taxacom-bounces at lists.ku.edu> on behalf of dipteryx--- via Taxacom <taxacom at lists.ku.edu>
Sent: Wednesday, February 9, 2022 8:48 PM
To: <taxacom at lists.ku.edu> <taxacom at lists.ku.edu>
Subject: Re: Taxacom: Tropicos and gender of names

The idea of adapting biological nomenclature so that
it fits in databases reminds me of the early days of
computers when software was written to fit the current
model, and any new model made it necessary to throw
out the previous work and start anew. Or, for that matter,
cutting off sections of famous paintings to make them fit
their newly allotted wall space.

It seems much simpler to wait until databasers grow up
and can handle concepts beyond those aimed to fit in the
grasp of a 3-year old?

Paul

> On 08-02-2022 17:36, Scott Thomson via Taxacom <taxacom at lists.ku.edu> wrote:
> [...]
> However, taking off my biologist or linguist hat for a moment. As a
> computer programmer having designed databases, mostly in SQL I think there
> are a lot of valid reasons to be rid of gender agreement and just use
> original spelling. Mostly these come down to the accuracy of pick up by
> databases of these issues. It is one aspect that could be avoided making
> all databases far more accurate and with simpler rules. It should be
> remembered that as I was taught when I did software engineering, a computer
> program is a recipe designed for a 3 year old, the computer may be faster
> than us, but do not equate that to more intelligent. Famous movie quote, it
> just runs programs. The computer cannot make any decision it is not told to
> make. Therefore if we want high speed and excruciatingly accurate data
> reading by these databases, then we should be making it easier for
> databases to read and process data, not harder. [...]
_______________________________________________
Taxacom Mailing List

Send Taxacom mailing list submissions to: taxacom at lists.ku.edu
For list information; to subscribe or unsubscribe, visit: https://lists.ku.edu/listinfo/taxacom
You can reach the person managing the list at: taxacom-owner at lists.ku.edu
The Taxacom email archive back to 1992 can be searched at: https://taxacom.markmail.org/

Nurturing nuance while assailing ambiguity for about 35 years, 1987-2022.

