[Taxacom] taxonomic names databases
Stephen Thorpe
stephen_thorpe at yahoo.co.nz
Thu Sep 1 16:18:57 CDT 2016
Somewhat relevant here is the fact that NZOR (New Zealand Organisms Register) has finally been updated with new data, but some unknown proportion of the new records are complete garbage, such as this one (I have already publicised it quite widely, so they may take it down or fix it): http://www.nzor.org.nz/names/9b474227-7eca-47c5-a263-a5caa218e811
It lists "Polistes dominula 1791" as a species with status "Synonym of Polistes subg. Polistes"! It is a flawed duplicate of: http://www.nzor.org.nz/names/d98c86ac-4eda-4956-b4c1-959359db5471
but even that record puts the species in the wrong subgenus (and misspells that subgenus!)
This is just one example of many diverse problems that I easily found within about an hour of browsing after the update! There is little motivation to provide feedback to the site, since they are the ones getting paid to do the work, and they won't credit any corrections to whoever points them out. They will just pretend the mistake never happened, and that's if they bother to fix it at all.
The worry is that all this garbage is going to be harvested by GBIF, etc., and the errors propagated widely! I strongly urge GBIF not to aggregate from NZOR without displaying some sort of warning (perhaps a reliability rating that is suitably low).
Cheers,
Stephen
--------------------------------------------
On Fri, 2/9/16, Nico Franz <nico.franz at asu.edu> wrote:
Subject: Re: [Taxacom] taxonomic names databases
To: "TAXACOM" <taxacom at mailman.nhm.ku.edu>
Received: Friday, 2 September, 2016, 5:52 AM
Not all of this discussion is adequately captured if we do not make some qualitative or relative distinction between data quality and trust in data. These two are clearly related but can nevertheless have different pathways in our data environments and point to different means for resolution.
My sense is that in the following situation, many of us will not have to hesitate for long to decide which option is preferable.
1. A dataset with 99 records that are "good", and 1 that is "bad" (needs "repair"), and to which I have no direct editing access *in the system* where that system is not designed to give me that access and editing power and -credit.
2. A dataset with 80 records that are "good", and 20 that are "bad", but where the system design is such that I have the right to access and repair the records, and to have that action stored permanently (provenance) and accredited to me.
The first dataset is of better quality, but the design tells me that it is unfixable by me. Do I feel comfortable publishing on the 100 records? Actually, not really. Is the act of someone with access fixing that 1 record for me a genuine solution? Also not really, because "good" (quality) is often a function of time, and with time certain aspects of good quality data are bound to deteriorate, and so the one-time fix does not operate at the problem's root.
The second dataset is of worse quality, but in some sense it just tells me what I already know about my specimen-level science, i.e. that if I am lazy or not available to oversee the quality, then there might be issues. I may decide to fix them, or not, depending on the level of quality that I need for a particular intended set of inferences I wish to make. In either case, that is my call, and I will get it to the point where I do feel comfortable publishing. The design of the second system facilitates that, and *that* is why I trust more, not because it has better data.
So then, at the surface this may sometimes look like a discussion about data quality only. It is not. Too many aggregating systems are systemically mis-designed to (not) empower individual experts while preserving a record of individual contributions and diversity of views. Acceptance of a classificatory system, for instance, tends to be a localized phenomenon, even in a regional community of multiple herbaria, for instance. Nobody in particular believes in a single backbone. This failure to design appropriately primarily affects trust, and secondarily quality, more so over time. A great range of sound biological inferences are still possible. But so are better designs.
Cheers, Nico