[Taxacom] taxonomic names databases

Thu Sep 1 16:18:57 CDT 2016

Somewhat relevant here is the fact that NZOR (New Zealand Organisms Register) has finally been updated with new data, but some unknown proportion of the new records are complete garbage, like this one for example (I have quite widely publicised it already, so they may take it down or fix it): http://www.nzor.org.nz/names/9b474227-7eca-47c5-a263-a5caa218e811
It lists "Polistes dominula 1791" as a species with status "Synonym of Polistes subg. Polistes"! It is a flawed duplicate of: http://www.nzor.org.nz/names/d98c86ac-4eda-4956-b4c1-959359db5471
but even that record puts the species in the wrong subgenus (and misspells that subgenus!)
This is just one example of many diverse problems that I easily found within about an hour of browsing after the update! There is little motivation to provide feedback to the site: they are the ones getting paid to do the work, and they won't credit any corrections to whoever points them out. They will just pretend the mistake never happened, and that's if they bother to fix it at all.
The worry is that all this garbage is going to be harvested by GBIF, etc., and the errors propagated widely! I strongly urge GBIF not to aggregate from NZOR without displaying some sort of warning (perhaps a reliability rating that is suitably low).
Cheers,
Stephen

--------------------------------------------
On Fri, 2/9/16, Nico Franz <nico.franz at asu.edu> wrote:

 Subject: Re: [Taxacom] taxonomic names databases
 To: "TAXACOM" <taxacom at mailman.nhm.ku.edu>
 Received: Friday, 2 September, 2016, 5:52 AM

 Not all of this discussion is adequately captured if we do not make some qualitative or relative distinction between data quality and trust in data. These two are clearly related but can nevertheless have different pathways in our data environments and point to different means for resolution.

 My sense is that in the following situation, many of us will not have to hesitate for long to decide which option is preferable.

 1. A dataset with 99 records that are "good", and 1 that is "bad" (needs "repair"), and to which I have no direct editing access *in the system* where that system is designed to give me that access and editing power and -credit.

 2. A dataset with 80 records that are "good", and 20 that are "bad", but where the system design is such that I have the right to access, repair, have that action stored permanently (provenance), and accredited to me.

 The first dataset is of better quality, but the design tells me that it is unfixable by me. Do I feel comfortable publishing on the 100 records? Actually, not really. Is the act of someone with access fixing that 1 record for me a genuine solution? Also not really, because "good" (quality) is often a function of time, and with time certain aspects of good quality data are bound to deteriorate, and so the one-time fix does not operate at the problem's root.

 The second dataset is of worse quality, but in some sense it just tells me what I already know about my specimen-level science, i.e. that if I am lazy or not available to oversee the quality, then there might be issues. I may decide to fix them, or not, depending on the level of quality that I need for a particular intended set of inferences I wish to make. In either case, that is my call, and I will get it to the point where I do feel comfortable publishing. The design of the second system facilitates that, and *that* is why I trust more, not because it has better data.

 So then, at the surface this may sometimes look like a discussion about data quality only. It is not. Too many aggregating systems are systemically mis-designed to (not) empower individual experts while preserving a record of individual contributions and diversity of views. Acceptance of a classificatory system, for instance, tends to be a localized phenomenon, even in a regional community of multiple herbaria, for instance. Nobody in particular believes in a single backbone. This failure to design appropriately primarily affects trust, and secondarily quality, more so over time. A great range of sound biological inferences are still possible. But so are better designs.

 Cheers, Nico
 _______________________________________________
 Taxacom Mailing List
 Taxacom at mailman.nhm.ku.edu
 http://mailman.nhm.ku.edu/cgi-bin/mailman/listinfo/taxacom
 The Taxacom Archive back to 1992 may be searched at: http://taxacom.markmail.org

 Injecting Intellectual Liquidity for 29 years.