[Taxacom] taxonomic names databases
Stephen Thorpe
stephen_thorpe at yahoo.co.nz
Thu Sep 1 18:38:25 CDT 2016
I suggest that a crucial issue here is whether or not it is a good idea for these databases to be based on ANYBODY's expertise! They ought simply, I suggest, to be tracking the primary literature using defined protocols which are constant across all taxa. They ought, I suggest, to be designed to be verifiable against the primary literature, not simply "taken on trust". As soon as you involve an active taxonomist, the database becomes a potential platform for them to favour, promote and protect their own taxonomic opinions outside of a peer reviewed context. What we want are experts at tracking and making sense of primary taxonomic literature, whatever groups are involved.
Stephen
--------------------------------------------
On Fri, 2/9/16, Tony Rees <tonyrees49 at gmail.com> wrote:
Subject: Re: [Taxacom] taxonomic names databases
To: "Nico Franz" <nico.franz at asu.edu>, "taxacom" <taxacom at mailman.nhm.ku.edu>
Received: Friday, 2 September, 2016, 11:22 AM
Hi Nico, all,
I have to take issue with Nico's main point here, which seems to be that a database with a higher level of residual errors that can be corrected by "anybody" may be preferable to one with a lower level that is under the control of a "gatekeeper", so to speak, who has sole editing rights. In my experience, at least for the major systems with a track record of scientific scrutiny and continuous effort to improve, the latter tend to be much more reliable than the former: for example, why would I not defer to Bill Eschmeyer's expertise for information on extant fishes, Paul Kirk's on the fungi, Geoff Read's for Annelida, and so on? If I find errors or inconsistencies in their systems' content I simply alert them and, nine times out of ten, receive a prompt and courteous reply and relevant action, as well as appreciation for spotting the error. I do not want carte blanche to edit their systems, and they would probably not appreciate it either!
In any event, no such system is ever perfect, and one would be wise to separately verify any data item considered "crucial" to a planned publication etc. All databases have disclaimers about potential residual errors; one simply has to make a judgement about which are more or less trustworthy or fit for a particular intended use, and where to set the bar beneath which it is simply better to ignore a particular data system as a source of sufficiently trusted information. In reality, most "aggregators" of such data take the best sources they can (a subjective decision) and then hopefully either have a proactive policy of detecting inherited errors - such as inter-dataset comparisons and investigation of discrepancies as revealed, going back to the original literature, and numerous internal data integrity checks - or are at least reactive to improvements as suggested by others. At least that is what I aspire to, and I recognise that it will never be perfect, but it is hopefully still a lot better than no equivalent product (hence "Interim" as the first word in the name of my project, IRMNG).
Just my 2 cents, as ever,
Best - Tony
Tony Rees, New South Wales, Australia
https://about.me/TonyRees
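
As a rough illustration of the kind of proactive inter-dataset comparison described above, the sketch below (Python) loads two hypothetical CSV exports of name lists and flags records whose presence, authorship or family placement disagree, so that each discrepancy can be checked back against the primary literature. File and column names are assumptions made for the sake of the example, not those of any real system.

import csv

def load_names(path):
    """Read a CSV export into a dict keyed by scientific name."""
    with open(path, newline="", encoding="utf-8") as f:
        return {row["scientific_name"].strip(): row for row in csv.DictReader(f)}

def compare(dataset_a, dataset_b, fields=("authorship", "family")):
    """Yield (name, field, value_in_a, value_in_b) for every disagreement."""
    for name, rec_a in dataset_a.items():
        rec_b = dataset_b.get(name)
        if rec_b is None:
            yield (name, "presence", "present in A", "missing from B")
            continue
        for fld in fields:
            if rec_a.get(fld, "").strip() != rec_b.get(fld, "").strip():
                yield (name, fld, rec_a.get(fld, ""), rec_b.get(fld, ""))

if __name__ == "__main__":
    a = load_names("dataset_a.csv")    # hypothetical exports, one row per name
    b = load_names("dataset_b.csv")
    for discrepancy in compare(a, b):
        print(*discrepancy, sep="\t")  # each line is a case to check against the literature
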
On 2 September 2016 at 03:52, Nico Franz <nico.franz at asu.edu> wrote:
> Not all of this discussion is adequately captured if we do not make some qualitative or relative distinction between data quality and trust in data. These two are clearly related but can nevertheless have different pathways in our data environments and point to different means for resolution.
>
> My sense is that in the following situation, many of us will not have to hesitate for long to decide which option is preferable.
>
> 1. A dataset with 99 records that are "good", and 1 that is "bad" (needs "repair"), and to which I have no direct editing access *in the system*, where that system is designed to give me that access and editing power and -credit.
>
> 2. A dataset with 80 records that are "good", and 20 that are "bad", but where the system design is such that I have the right to access, repair, have that action stored permanently (provenance), and accredited to me.
>
> The first dataset is of better quality, but the design tells me that it is unfixable by me. Do I feel comfortable publishing on the 100 records? Actually, not really. Is the act of someone with access fixing that 1 record for me a genuine solution? Also not really, because "good" (quality) is often a function of time, and with time certain aspects of good quality data are bound to deteriorate, and so the one-time fix does not operate at the problem's root.
>
> The second dataset is of worse quality, but in some sense it just tells me what I already know about my specimen-level science, i.e. that if I am lazy or not available to oversee the quality, then there might be issues. I may decide to fix them, or not, depending on the level of quality that I need for a particular intended set of inferences I wish to make. In either case, that is my call, and I will get it to the point where I do feel comfortable publishing. The design of the second system facilitates that, and *that* is why I trust more, not because it has better data.
>
> So then, at the surface this may sometimes look like a discussion about data quality only. It is not. Too many aggregating systems are systemically mis-designed to (not) empower individual experts while preserving a record of individual contributions and diversity of views. Acceptance of a classificatory system, for instance, tends to be a localized phenomenon, even in a regional community of multiple herbaria. Nobody in particular believes in a single backbone. This failure to design appropriately primarily affects trust, and secondarily quality, more so over time. A great range of sound biological inferences are still possible. But so are better designs.
>
> Cheers, Nico
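
Nico's design point (repairs stored permanently, with provenance and credit to the editor, rather than silently overwriting a record) can be sketched as a minimal data structure. The example below, in Python, is illustrative only; the class, field and editor names are assumptions and do not describe any existing aggregator.

from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List

@dataclass
class Edit:
    editor: str          # who gets credit for the repair
    field_name: str
    old_value: str
    new_value: str
    reason: str          # e.g. a citation to the primary literature
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

@dataclass
class Record:
    name: str
    data: dict
    history: List[Edit] = field(default_factory=list)

    def repair(self, editor: str, field_name: str, new_value: str, reason: str) -> None:
        """Apply a correction while keeping a permanent, attributed edit trail."""
        old = self.data.get(field_name, "")
        self.history.append(Edit(editor, field_name, old, new_value, reason))
        self.data[field_name] = new_value

# Hypothetical usage: the record is corrected and the credit is retained.
rec = Record("Aus bus Smith, 1900", {"family": "Xidae"})
rec.repair("A. Curator", "family", "Yidae", "placement per the original description")
print(rec.data["family"], rec.history[0].editor)
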
_______________________________________________
Taxacom Mailing List
Taxacom at mailman.nhm.ku.edu
http://mailman.nhm.ku.edu/cgi-bin/mailman/listinfo/taxacom
The Taxacom Archive back to 1992 may be searched at: http://taxacom.markmail.org
Injecting Intellectual Liquidity for 29 years.