data sharing: Errors creep in but not out

Don McAllister mcall at SUPERAJE.COM
Sun Dec 6 12:17:51 CST 1998


Being stringent about data quality, and hence about audits, is important.  Once
errors are published, a whole train of records can be generated.  A case in
point is the fish species, the ninespine stickleback, Pungitius pungitius.

Several decades ago an erroneous Greenland record was cited for this species.
That the record was in error has been mentioned in the literature two or three
times.  Even eminent ichthyologists like W.B. Scott and E.J. Crossman in their
Freshwater Fishes of Canada map a Greenland occurrence for this species.  Lee et
al.'s subsequent Atlas of North American Freshwater Fishes specifically states
"not in Greenland" in the hopes of stamping out this record.

Geographic records serve innumerable functions: biogeographic, ecological,
physiological, management, conservation, etc.  Not all these kinds of users are
conversant with the reliability of a label on a specimen - perhaps identified by
a summer student - and some will take it as the word of god.  Not all kinds of
users realize that identification is not always simple.  It may be for a whole
adult prepared skin, in good condition, of the American robin.  But
identification of a species of the shiner genus Notropis or of an eelpout,
Lycodes, is more of a challenge, above all if the specimen is not adult or not
well preserved.

Furthermore, there just aren't up-to-date revisions for many taxa.  Until a
catalogue of the Gulf of St. Lawrence invertebrates was published this year, the
most recent catalogue for the Atlantic coast of Canada was Whiteaves (1901).  At
least half of those invertebrates have not been revised in the last couple of
decades.  Estimates suggest that half of the organisms in Canada are yet to be
scientifically named and classified.  All this suggests that for more poorly
known groups records should first pass through the filter of an expert
taxonomist - if one is still unretired or alive!  Taxonomy and taxonomists are
endangered.

The long-term answer is to provide natural history collections with sufficient
resources so that identifications can be periodically reviewed and updated.  The
data is precious, invaluable - when it is reliable.  The data serves all kinds
of functions, and the voucher specimens and expert care in identification and
data basing are well worth society's investment.

The only other approach is that, before publication, such data should be
subject to expert taxonomic review.  That approach is less happy, because if an
expert suspects that an out-of-range record is misidentified, then the
publication will have to be put on hold until the specimen(s) is(are) found and
checked.  Of course the consequences do not follow only from geographic errors.
There may be seasonal errors too - a migratory bird or a fish taken in a spot
where it would not normally be at that time of year.

The problem of recognizable error is simpler for well-known groups like birds.
AOU checklists, distribution maps, state lists, and many other sources help one
to check against error.  But try and find such resources for sea anemones,
fungi, or aquatic plants, and more likely than not you will not be in luck.

Computer data bases can be used for doing some routine checking.  One check I
used to run periodically on our fish collection was to plot all the records,
province by province, for the Canadian provinces.  Any record falling outside
its province had either the wrong province or the wrong latitude/longitude, or
both.  More of these checks should be built into systems.
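
A check of this kind might be sketched as follows.  This is only a minimal
illustration in Python, not the procedure actually run on the fish collection.
The Record fields, the BOXES table and the function name are all hypothetical,
and the bounding boxes are deliberately generous rough values, not surveyed
boundaries; a real system would test against proper province outlines.

```python
from dataclasses import dataclass

@dataclass
class Record:
    catalog_id: str   # hypothetical catalogue number
    province: str     # province code as written on the label
    lat: float        # decimal degrees, north positive
    lon: float        # decimal degrees, east positive (so Canada is negative)

# Hypothetical, generous bounding boxes: (min_lat, max_lat, min_lon, max_lon).
# Rough illustrative values only; a box check cannot catch every error.
BOXES = {
    "PE": (45.5, 47.5, -65.0, -61.5),
    "NS": (43.0, 47.5, -67.0, -59.0),
}

def flag_out_of_province(records, boxes):
    """Return (record, reason) pairs for records needing a second look."""
    flagged = []
    for r in records:
        box = boxes.get(r.province)
        if box is None:
            flagged.append((r, "unknown province code"))
            continue
        min_lat, max_lat, min_lon, max_lon = box
        inside = min_lat <= r.lat <= max_lat and min_lon <= r.lon <= max_lon
        if not inside:
            # Either the province or the latitude/longitude is wrong, or both.
            flagged.append((r, "coordinates outside province box"))
    return flagged

records = [
    Record("F-001", "PE", 46.25, -63.13),  # plausible
    Record("F-002", "PE", 46.25, 63.13),   # longitude sign dropped
    Record("F-003", "NS", 44.65, -63.57),  # plausible
    Record("F-004", "XX", 50.00, -70.00),  # province code not in the table
]

flagged = flag_out_of_province(records, BOXES)
for rec, reason in flagged:
    print(rec.catalog_id, reason)
```

The check only says that something is wrong with a record, not which field is
wrong; someone still has to go back to the label or the specimen to decide.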

All of this says that Systematics 2000, Species 2000 and other agenda to resolve
the taxonomic impediment deserve serious attention and funding.  Until then
treat taxonomic data bases with a grain of salt.  Don't think that all data is
equal.  Don't forget that some smaller museums have a natural history curator
supposedly responsible for all biota - even if the collection is magically
accessible by the internet!

don
Don McAllister



