[Taxacom] Data entry estimates
Doug Yanega
dyanega at ucr.edu
Mon Oct 4 18:45:12 CDT 2010
From first- and second-hand experience, the rate of *pure* data
entry of specimen records (whether from a catalogue or from actual
specimen labels) seems to range from 200 to 400 records per day, allowing
for numerous variables including the efficiency of the data entry
person.
I emphasize *pure* data entry because most databasing efforts
these days involve georeferencing, which means researching
latitudes/longitudes (at a bare minimum), and then it's no longer
pure data entry. For the typical specimen record, which might give "X
mi SW of Y" along with country and state, the process of properly
georeferencing means adding several fields which are not on the
labels themselves: county (or an equivalent tertiary political
division), latitude and longitude (preferably as fully decimal
degrees to 4 decimal places, otherwise as DMS, but *not* a hybrid and
never UTM or PLSS), precision (given as a radius around the preceding
point, in meters), and elevation (in meters).
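For records that arrive as DMS, the conversion to decimal degrees at 4 decimal places is mechanical. A minimal Python sketch follows; the function names `dms_to_decimal` and `format_coord` are hypothetical, and storing S and W as negative values is an assumption (a common convention, but not one stated above):

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus N/S/E/W to signed decimal degrees."""
    hemi = hemisphere.strip().upper()
    if hemi not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must be in [0, 60)")
    value = abs(degrees) + minutes / 60 + seconds / 3600
    # Assumption: south and west are stored as negative values.
    if hemi in ("S", "W"):
        value = -value
    # Round to the 4 decimal places recommended in the text.
    return round(value, 4)


def format_coord(value):
    """Render a coordinate with exactly 4 decimal places."""
    return f"{value:.4f}"


if __name__ == "__main__":
    # Illustrative coordinates only, not taken from any real label.
    lat = dms_to_decimal(33, 58, 30, "N")
    lon = dms_to_decimal(117, 19, 30, "W")
    print(format_coord(lat), format_coord(lon))
```

Four decimal places correspond to roughly 11 m of latitude, which is why the precision radius is worth recording as its own field rather than implying it through the number of digits.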
For well-known regions (e.g., most of the continental US), Google
Earth will allow you to confidently and accurately fill out these
additional fields, at a cost of around 10-15 minutes per locality
once the data entry person is really comfortable with the interface -
but that time commitment can drag out considerably as the specimen
data range into less well-known regions, farther back in time, or
become skimpier on details and more ambiguous (e.g., "Santa Ana,
Mexico"), or if one has records in UTM or PLSS and has to convert
them into lat/long.
Automated georeferencing tools do exist, but they do not give things
like county or elevation, do not know how to measure distances along
roads, and they also require that a human being check the output of
each locality lookup directly - which, in practice, takes almost
exactly as long as having a human being do the entire process. The
bottom line is that you can expect a minimum of 10-15 minutes per
locality, even if you try to automate.
Therefore, if every record is from a different and novel locality,
that can slow things down to a rate of 50 records per day or less.
Hopefully, multiple specimen records will exist for each locality, so
the rate per record will not be that badly impacted (e.g., in UCR's
insect collection database, we have roughly 218,000 specimens with
locality records - from 5,600 different localities, or roughly 40
specimens per locality).
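The arithmetic behind these rates can be sketched in a few lines of Python. This is a rough model, not a measured one: it assumes an 8-hour (480-minute) working day, that each locality is georeferenced once and then reused for every record from it, and that entry and georeferencing times simply add. The function name `records_per_day` and the default midpoints (300 records per day, 12.5 minutes per locality) are my own choices within the ranges given above.

```python
def records_per_day(records_per_locality,
                    pure_rate_per_day=300,
                    georef_minutes=12.5,
                    workday_minutes=480):
    """Estimate daily throughput when each new locality must be georeferenced.

    records_per_locality: average number of specimen records sharing a locality
    pure_rate_per_day:    pure data entry rate (text gives 200-400 per day)
    georef_minutes:       time per locality (text gives 10-15 minutes)
    workday_minutes:      assumed length of a working day (not from the text)
    """
    if records_per_locality <= 0:
        raise ValueError("records_per_locality must be positive")
    entry_minutes = workday_minutes / pure_rate_per_day
    # Georeferencing cost is spread over all records from the same locality.
    minutes_per_record = entry_minutes + georef_minutes / records_per_locality
    return workday_minutes / minutes_per_record


if __name__ == "__main__":
    worst_case = records_per_day(1)                 # every record a new locality
    ucr_like = records_per_day(218_000 / 5_600)     # ratio quoted in the text
    print(f"one record per locality: {worst_case:.0f} records/day")
    print(f"UCR-like ratio:          {ucr_like:.0f} records/day")
```

Under these assumptions the worst case lands below 50 records per day, and a ratio like UCR's recovers most of the pure data entry rate.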
Peace,
--
Doug Yanega Dept. of Entomology Entomology Research Museum
Univ. of California, Riverside, CA 92521-0314 skype: dyanega
phone: (951) 827-4315 (standard disclaimer: opinions are mine, not UCR's)
http://cache.ucr.edu/~heraty/yanega.html
"There are some enterprises in which a careful disorderliness
is the true method" - Herman Melville, Moby Dick, Chap. 82