[Taxacom] Data entry estimates

Doug Yanega dyanega at ucr.edu
Mon Oct 4 18:45:12 CDT 2010


From first- and second-hand experience, the rate of *pure* data
entry of specimen records (whether from a catalogue or from actual
specimen labels) seems to range from 200 to 400 records per day,
allowing for numerous variables including the efficiency of the data
entry person.

The reason I emphasize *pure* data entry is because most databasing 
efforts these days involve georeferencing, which means researching 
latitudes/longitudes (at a bare minimum), and then it's no longer 
pure data entry. For the typical specimen record, which might give "X 
mi SW of Y" along with country and state, the process of properly 
georeferencing means adding several fields which are not on the 
labels themselves: county (or an equivalent tertiary political 
division), latitude and longitude (preferably as fully decimal 
degrees to 4 decimal places, otherwise as DMS, but *not* a hybrid and 
never UTM or PLSS), precision (given as a radius around the preceding 
point, in meters), and elevation (in meters).
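
For anyone holding coordinates in DMS who wants the decimal form to 4
places, the conversion is simple arithmetic. A minimal Python sketch
follows; the function name and the example coordinates are made up for
illustration and are not taken from any particular database.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus N/S/E/W to signed decimal degrees."""
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must be in [0, 60)")
    value = abs(degrees) + minutes / 60 + seconds / 3600
    # South and West are negative in decimal-degree notation.
    if hemisphere in ("S", "W"):
        value = -value
    # 4 decimal places, as recommended above.
    return round(value, 4)


# Arbitrary example values, not a real locality record.
lat = dms_to_decimal(33, 58, 30, "N")
lon = dms_to_decimal(117, 19, 30, "W")
print(lat, lon)
```

Rejecting minutes or seconds of 60 or more is one cheap way to catch
hybrid or mistyped values before they get into the database.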

For well-known regions (e.g., most of the continental US), Google 
Earth will allow you to confidently and accurately fill out these 
additional fields, at a cost of around 10-15 minutes per locality 
once the data entry person is really comfortable with the interface - 
but that time commitment can drag out considerably as the specimen 
data range into less well-known regions or farther back in time, 
become skimpier on details and more ambiguous (e.g., "Santa Ana, 
Mexico"), or if one has records in UTM or PLSS and has to convert 
them into lat/long.

Automated georeferencing tools do exist, but they do not give things 
like county or elevation, do not know how to measure distances along 
roads, and they also require that a human being check the output of 
each locality lookup directly - which, in practice, takes almost 
exactly as long as having a human being do the entire process. The 
bottom line is that you can expect a minimum of 10-15 minutes per 
locality, even if you try to automate.
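
To make the "distance along roads" limitation concrete: about the
simplest thing a program can do with "X mi SW of Y" is a straight-line
offset from Y's coordinates. Below is a rough Python sketch using the
standard great-circle destination formula. The function name, the
spherical Earth radius, and the reading of "SW" as a bearing of exactly
225 degrees are my assumptions for illustration, not a description of
how any existing tool works.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean radius; spherical Earth is an assumption
KM_PER_MILE = 1.609344

# Compass directions read as exact bearings (an assumption; labels are vaguer).
BEARINGS = {"N": 0, "NE": 45, "E": 90, "SE": 135,
            "S": 180, "SW": 225, "W": 270, "NW": 315}


def offset_point(lat, lon, miles, direction):
    """Straight-line ("as the crow flies") offset from a known point.

    Returns (lat, lon) in decimal degrees, rounded to 4 places.
    Does not follow roads and does not handle wraparound at +/-180.
    """
    bearing = math.radians(BEARINGS[direction])
    d = miles * KM_PER_MILE / EARTH_RADIUS_KM  # angular distance in radians
    lat1 = math.radians(lat)
    lon1 = math.radians(lon)
    lat2 = math.asin(math.sin(lat1) * math.cos(d)
                     + math.cos(lat1) * math.sin(d) * math.cos(bearing))
    lon2 = lon1 + math.atan2(
        math.sin(bearing) * math.sin(d) * math.cos(lat1),
        math.cos(d) - math.sin(lat1) * math.sin(lat2))
    return round(math.degrees(lat2), 4), round(math.degrees(lon2), 4)


# Hypothetical starting point, for illustration only.
print(offset_point(34.0, -117.0, 10, "SW"))
```

A result like this is only a starting guess: if the collector measured
10 miles by odometer along a winding road, the true point can be well
short of the straight-line one, which is why a person still has to look
at each locality.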

Therefore, if every record is from a different and novel locality, 
that can slow things down to a rate of 50 records per day or less. 
Hopefully, multiple specimen records will exist for each locality, so 
the rate per record will not be that badly impacted (e.g., in UCR's 
insect collection database, we have roughly 218,000 specimens with 
locality records - from 5,600 different localities, or roughly 40 
specimens per locality).
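
For anyone who wants to plug in their own numbers, here is a
back-of-the-envelope Python sketch of the estimate above. The 8-hour
(480-minute) working day and the function name are my assumptions; the
other inputs are the figures quoted in this message.

```python
def records_per_day(specimens_per_locality, minutes_per_locality,
                    pure_entry_per_day, workday_minutes=480):
    """Estimate records per day when georeferencing is added to data entry.

    workday_minutes defaults to an assumed 8-hour day.
    """
    # Minutes of pure typing per record.
    entry_minutes = workday_minutes / pure_entry_per_day
    # Georeferencing time per locality, spread over its specimens.
    georef_minutes = minutes_per_locality / specimens_per_locality
    return workday_minutes / (entry_minutes + georef_minutes)


# Worst case: every record is a new locality, 15 minutes each.
print(round(records_per_day(1, 15, 300)))

# UCR-like ratio: 218,000 specimens over 5,600 localities.
print(round(records_per_day(218000 / 5600, 15, 300)))
```

With a pure-entry rate of 300 per day, the one-locality-per-record case
comes out well under 50 records per day, while at roughly 40 specimens
per locality the estimate stays in the neighborhood of 240 to 260.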

Peace,
-- 

Doug Yanega        Dept. of Entomology         Entomology Research Museum
Univ. of California, Riverside, CA 92521-0314        skype: dyanega
phone: (951) 827-4315 (standard disclaimer: opinions are mine, not UCR's)
              http://cache.ucr.edu/~heraty/yanega.html
   "There are some enterprises in which a careful disorderliness
         is the true method" - Herman Melville, Moby Dick, Chap. 82
