[Taxacom] Dirty data - WAS: i4Life Call for Pilot Projects
Mike Sadka
M.Sadka at nhm.ac.uk
Tue Jul 24 11:18:15 CDT 2012
Fair comment Richard.
I didn't intend to suggest that dirty data are always acceptable. Only that they should not be universally despised, as they are capable of improvement. In an ideal world, one ought to be able to know how much confidence can be placed in any particular dataset, and use it accordingly.
Cheerio, Mike
-----Original Message-----
From: Richard Zander [mailto:Richard.Zander at mobot.org]
Sent: 24 July 2012 15:07
To: Mike Sadka; Stephen Thorpe; TAXACOM
Subject: RE: [Taxacom] Dirty data - WAS: i4Life Call for Pilot Projects
There have been publications indicating that in, say, GenBank, half the data are bad in the sense that they are not properly documented. I use books and data banks for which I can estimate that a certain percentage is dubious; I use the ones that have maybe 5 percent wrong citations. This follows the Bayes' Solution philosophy of accepting some level of credible interval GIVEN THE POTENTIAL RISK OF LOSS if wrong. A bibliographic nomenclatural data set like Tropicos can have a low wrong-rate that can be ignored because failure is corrigible and does not affect science, funding, or the environment to a significant degree.
Data sets used in biodiversity studies, where triage decisions are made on the basis of the data, may be no more wrong than a bibliographic nomenclatural data set, yet a single wrong datum may destroy a species.
Thus there is a practical perspective that modifies how we must judge the usefulness of any particular data set.
____________________________
Richard H. Zander
Missouri Botanical Garden, PO Box 299, St. Louis, MO 63166-0299 USA
Web sites: http://www.mobot.org/plantscience/resbot/ and http://www.mobot.org/plantscience/bfna/bfnamenu.htm
Modern Evolutionary Systematics Web site:
http://www.mobot.org/plantscience/resbot/21EvSy.htm
UPS and FedExpr - MBG, 4344 Shaw Blvd, St. Louis 63110 USA
-----Original Message-----
From: taxacom-bounces at mailman.nhm.ku.edu
[mailto:taxacom-bounces at mailman.nhm.ku.edu] On Behalf Of Mike Sadka
Sent: Tuesday, July 24, 2012 4:58 AM
To: 'Stephen Thorpe'; 'TAXACOM'
Subject: [Taxacom] Dirty data - WAS: i4Life Call for Pilot Projects
Hi Stephen
Dirty data have at least two things going for them:
1/ they are more useful than no data
2/ they have the potential to be cleansed
(snip)