[Taxacom] Dirty data - WAS: i4Life Call for Pilot Projects
Richard Zander
Richard.Zander at mobot.org
Tue Jul 24 09:07:07 CDT 2012
There have been publications indicating that, say, GenBank has half of
its data bad, in the sense that it is not properly documented. I use
books and data banks for which I can estimate that a certain percentage
is dubious. I use the ones that have maybe 5 percent wrong citations.
This follows the Bayes' Solution philosophy of accepting some level of
credible interval GIVEN THE POTENTIAL RISK OF LOSS if wrong. A
bibliographic nomenclatural data set like Tropicos can have a low
wrong-rate that can be ignored, because failure is corrigible and does
not affect science, funding, or the environment to a significant degree.
Data sets used in biodiversity studies, where triage decisions are made
on the basis of the data, may be no more wrong than a bibliographic
nomenclatural data set, yet a single wrong datum may destroy a species.
Thus there is a practical perspective that modifies how we must judge
the usefulness of any particular data set.
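
The point can be put as a rough expected-loss calculation. What follows
is only a sketch in Python, not a method taken from any of the
publications mentioned: the function names, the loss units, and the
tolerable-loss threshold are hypothetical and chosen for illustration.
Only the 5 percent error rate comes from the text above.

```python
def expected_loss(error_rate: float, loss_if_wrong: float) -> float:
    """Expected loss per datum used: P(datum is wrong) * cost of acting on it."""
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    return error_rate * loss_if_wrong


def is_acceptable(error_rate: float, loss_if_wrong: float,
                  tolerable_loss: float) -> bool:
    """A data set is usable if its expected loss stays within what we can tolerate."""
    return expected_loss(error_rate, loss_if_wrong) <= tolerable_loss


# Same error rate for both data sets (the 5 percent figure from the text).
ERROR_RATE = 0.05

# Hypothetical loss units: a wrong citation is cheap and correctable,
# a wrong triage decision is costly and may be irreversible.
CITATION_LOSS = 1.0
TRIAGE_LOSS = 1000.0

# Hypothetical threshold for how much expected loss we are willing to accept.
TOLERABLE_LOSS = 1.0

nomenclature_ok = is_acceptable(ERROR_RATE, CITATION_LOSS, TOLERABLE_LOSS)
triage_ok = is_acceptable(ERROR_RATE, TRIAGE_LOSS, TOLERABLE_LOSS)
```

With these made-up numbers the nomenclatural data set passes and the
triage data set does not, although both have the same wrong-rate. The
difference lies entirely in the loss term.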
____________________________
Richard H. Zander
Missouri Botanical Garden, PO Box 299, St. Louis, MO 63166-0299 USA
Web sites: http://www.mobot.org/plantscience/resbot/ and
http://www.mobot.org/plantscience/bfna/bfnamenu.htm
Modern Evolutionary Systematics Web site:
http://www.mobot.org/plantscience/resbot/21EvSy.htm
UPS and FedExpr - MBG, 4344 Shaw Blvd, St. Louis 63110 USA
-----Original Message-----
From: taxacom-bounces at mailman.nhm.ku.edu
[mailto:taxacom-bounces at mailman.nhm.ku.edu] On Behalf Of Mike Sadka
Sent: Tuesday, July 24, 2012 4:58 AM
To: 'Stephen Thorpe'; 'TAXACOM'
Subject: [Taxacom] Dirty data - WAS: i4Life Call for Pilot Projects
Hi Stephen
Dirty data have at least two things going for them:
1/ they are more useful than no data
2/ they have the potential to be cleansed
(snip)