[Taxacom] Why stability? - Revisited

Fri May 1 12:09:38 CDT 2015

Hi Nico,

Just to play devil’s advocate: as much as Avibase is an impressive achievement (I’m playing with some data from it right now), at the end of the day it’s basically munging together checklists. There’s no evidence base that we can access; we are essentially combining opinions on which species or subspecies go where. Some of these checklists are literally just lists of names, representing somebody’s (no doubt considered) opinion, whereas I’d really like to see why someone thinks two taxa are synonyms, or why one species should be split into two, etc. What is the, you know, actual evidence?

> I believe that, if an individual produces a monograph that has well defined reference boundaries - a domain of reference, so to speak (this perceived taxon, at this time, in that region, given this nomenclatural and taxonomic legacy, these sets of specimens, traits, inferred trees, etc.) - and that monograph gets aggregated into a larger biodiversity information environment, then in that environment the identity of the monographic content should remain "relevantly recognizable". The aggregator environment does in effect expand the monograph's original domain of reference in ways that the monograph's author cannot readily or reliably predict.

> …

> This will sound a bit dramatic, but many aggregator systems are currently structurally designed in a way that the graduate student, postdoc, or more senior scientist producing a monograph is inadvertently disenfranchised when their taxonomic language contribution migrates from the traditional to the integrative publication environment.

I find the notion that monographs are monolithic entities with boundaries to be respected a little last century ;) I would like traceability of evidence, but this doesn’t require a monograph as such. We could have single, citable assertions (say, equivalent to a single paper showing that what was thought to be a new species was actually simply the male of a known species), or we could have a set of assertions, each individually identifiable but all clustered as coming from the same monograph. In other words, nanopublications, which may be aggregated into larger sets if desired. I suspect this is the direction in which a lot of data curation fields, such as taxonomy, are heading.

As always, there seems to be a tension between doing things the way we always have, albeit using new technology, and using new technology to rethink the way we do things. I don’t mean that as pejoratively as it sounds (new isn’t always necessarily better), but I think we are missing opportunities to rethink the way we do things.

Regards

Rod

---------------------------------------------------------
Roderic Page
Professor of Taxonomy
Institute of Biodiversity, Animal Health and Comparative Medicine
College of Medical, Veterinary and Life Sciences
Graham Kerr Building
University of Glasgow
Glasgow G12 8QQ, UK

Email:  Roderic.Page at glasgow.ac.uk
Tel:  +44 141 330 4778
Skype:  rdmpage
Facebook:  http://www.facebook.com/rdmpage
LinkedIn:  http://uk.linkedin.com/in/rdmpage
Twitter:  http://twitter.com/rdmpage
Blog:  http://iphylo.blogspot.com
ORCID:  http://orcid.org/0000-0002-7101-9767
Citations:  http://scholar.google.co.uk/citations?hl=en&user=4Z5WABAAAAAJ
ResearchGate:  https://www.researchgate.net/profile/Roderic_Page

On 1 May 2015, at 16:22, Nico Franz <nico.franz at asu.edu> wrote:

Thanks, Rod (and Tony).

   Also for steering things back a bit.

   I believe that, if an individual produces a monograph that has well defined reference boundaries - a domain of reference, so to speak (this perceived taxon, at this time, in that region, given this nomenclatural and taxonomic legacy, these sets of specimens, traits, inferred trees, etc.) - and that monograph gets aggregated into a larger biodiversity information environment, then in that environment the identity of the monographic content should remain "relevantly recognizable". The aggregator environment does in effect expand the monograph's original domain of reference in ways that the monograph's author cannot readily or reliably predict.

   To me, this puts the onus on the aggregator environment to provide technical design solutions that are capable of supporting the communication and social recognition models that human taxonomy making and revising relies on.

   Where do taxonomic concepts fit in here? We have, at this point, some individual efforts (two absolute stand-outs to me are Lepage's Avibase [http://zookeys.pensoft.net/articles.php?id=3906] and Weakley's Flora [http://www.herbarium.unc.edu/flora.htm]) that demonstrate at considerable scales (thousands of currently recognized species concepts, more than a century of taxonomic legacy, tens of thousands to millions of articulations) that taxonomic concept individuation and integration, based on semantics that complement nomenclatural relationships, is feasible. Avibase in particular implements a database to sustain these reference services.

   I think a fair and contemporary assessment is that, as we move to greater, more integrative scales, there will be issues that we have not fully grasped yet, and other issues that we can already identify and which will be hard. For instance, I understand that Avibase uses taxonomic names at the family level and above, while shifting to taxonomic concept resolution at lower levels. But we also have a small but growing body of theory and practice that, to my mind, shows feasibility and value. It is worthy of praise, perhaps, and of further exploration.

   The following is in my view a persistent challenge to the aggregators. When we initially build these larger biodiversity data repositories with successively more encompassing taxonomies whose intellectual authorship origins are diverse, and then curate the taxonomies in the new environments as we go along, we are in some sense generating new systematic theories intended to reflect reference standards for a wide range of contributors and users.

http://link.springer.com/article/10.1007%2Fs13752-012-0049-z

   But who owns the new theories, or identifiable parts of them? Who can express their assessments of their validity, or perceived need for correction or expansion? This will sound a bit dramatic, but many aggregator systems are currently structurally designed in a way that the graduate student, postdoc, or more senior scientist producing a monograph is inadvertently disenfranchised when their taxonomic language contribution migrates from the traditional to the integrative publication environment.

   So, yes, we do not have it all figured out. Maybe it won't work in the end for very many important applications. We are also not alone in this.

http://link.springer.com/chapter/10.1007%2F978-1-84628-901-9_8

Cheers, Nico

On Fri, May 1, 2015 at 3:31 AM, Roderic Page <Roderic.Page at glasgow.ac.uk> wrote:
Hi Nico,

To return to your original post and question, a couple of quick comments.

As Stephen Thorpe alluded to, one aspect of instability is IMHO a function of the burden that taxonomic names carry. We would like:

1. human readable, globally unique names, that

2. also tell us something about relationships (e.g. the genus name matters), and

3. carry some link to provenance (e.g., taxonomic authority, author for new combinations, etc.)

There’s pretty much no way to satisfy these requirements without tradeoffs of one sort or another. For example, for reasons that I’ve now forgotten, I thought it would be fun to try to track down the original species descriptions associated with a recent paper on the declining rate of descriptions of new bird species ( http://dx.doi.org/10.1093/sysbio/syu069, see also http://eol.org/collections/116394 ). Cue much heartache: many of these names have changed, and discovering the original name (and publication) is often a world of hurt as people shuffle species between genera and up and down between species and subspecies rank (e.g., http://bionames.org/names/cluster/642623 ).

We have a naming system that is hugely unstable because goals 1 and 2 are incompatible (at least, they are in the absence of any system to track name changes; botanists do this quite well, zoologists don’t).

Regarding your bigger point about your “extreme” system, I think this is kind of where we are heading, especially when you think of things like DNA barcoding. However, I suspect that what people will focus on is not the long history of shuffling specimens between names and taxa, but what the latest snapshot is "right now". Databases that make this explicit (GBIF: taxa as sets of occurrences; NCBI and BOLD: taxa as sets of sequences) will be useful and underpin actual research. Databases that leave it implicit (i.e., most taxonomic databases) will be a lot less useful.

I love the taxonomic legacy as much as anyone; indeed, I spend most of my time trying to expose it as much as possible (hence http://biostor.org and http://bionames.org ), but I suspect a lot of discussion about the relationship between concepts will be of perhaps limited relevance except in some (possibly spectacular) edge cases.

Regards

Rod
