High Throughput Virtual Herbarium

Thu Nov 6 10:58:33 CST 2003

Greetings all,

At conferences recently I have been getting a lot of questions about what high-throughput data acquisition means for Virtual Herbaria. Well, it is anything that is much faster than having someone sit in the herbarium and type the label data in by hand. It isn't at all complicated in concept. Here is an example:

http://www.virtualherbarium.org/vh/convolvs/

In my tradition of making pre-beta prototypes public, here is a little example of stage one of the Virtual Herbarium data acquisition stream. Stage one generates this full-text database. The human time investment per sheet is essentially only the scan time. Later, from anywhere with an internet connection, humans parse the label data into our Virtual Herbarium IV database. At that point you have a fielded database that looks like any other herbarium database, except that every specimen is imaged. In the meantime, you still have this version available for less sophisticated searching.
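
For anyone wondering what "less sophisticated searching" over a full-text database amounts to, here is a minimal sketch in Python. It is not the code behind the site. The record layout, the file names, and the sample label text are all made up for illustration, and it assumes each sheet is stored as an image name plus the raw, unparsed label text.

```python
from dataclasses import dataclass


@dataclass
class SheetRecord:
    image_name: str  # hypothetical file name of the scanned sheet
    ocr_text: str    # raw, unparsed label text for that sheet


def search(records, query):
    """Return records whose text contains every word of the query.

    Matching is a plain case-insensitive substring test, so there is
    no notion of fields such as collector, locality, or date.
    """
    terms = query.lower().split()
    return [r for r in records
            if all(term in r.ocr_text.lower() for term in terms)]


# Invented sample records, for illustration only.
records = [
    SheetRecord("sheet_0001.sid", "Ipomoea alba L. Dade County, Florida"),
    SheetRecord("sheet_0002.sid", "Merremia dissecta Broward County, Florida"),
]

hits = search(records, "ipomoea florida")
for hit in hits:
    print(hit.image_name)
```

Because nothing is parsed into fields, a search like this cannot tell a collector named Florida from the state of Florida. That is the gap the later human parsing step closes.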

For this part, the only software that isn't off the shelf is the little Perl script that Dan McGillicuddy wrote to put the image names into the database. The database is MS Access, the OCR software is OmniPage, and the images were scanned on a $1000 scanner and batch compressed using MrSID. If you used different image compression, the total cost for the scanner and software could be under $1200.
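
If you want to try the image-name step yourself, a minimal sketch of that kind of script might look like the following. This is not Dan's script: it is in Python rather than Perl, and it uses SQLite in place of MS Access purely so that it runs anywhere. The table name, column names, and file names are hypothetical, and it assumes each image shares a base name with its OCR text.

```python
import sqlite3
from pathlib import PurePath


def load_sheets(conn, image_names, ocr_by_stem):
    """Store one row per image, paired with its OCR text by base name.

    image_names: iterable of image file names (hypothetical naming).
    ocr_by_stem: dict mapping a base name (no extension) to OCR text.
    Images with no matching OCR text get an empty string.
    Returns the number of rows written.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sheets ("
        "image_name TEXT PRIMARY KEY, ocr_text TEXT)"
    )
    rows = []
    for name in sorted(image_names):
        stem = PurePath(name).stem  # "sheet_0001.sid" -> "sheet_0001"
        rows.append((name, ocr_by_stem.get(stem, "")))
    conn.executemany("INSERT OR REPLACE INTO sheets VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


# In-memory database and invented inputs, for illustration only.
conn = sqlite3.connect(":memory:")
count = load_sheets(
    conn,
    ["sheet_0002.sid", "sheet_0001.sid"],
    {"sheet_0001": "Ipomoea alba L."},
)
```

Keeping the image name as the key means a sheet can be rescanned or re-OCRed and loaded again without creating a duplicate row.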

This is an 8000-sheet loan that we sent to Dan Austin this year. It is all of the Convolvulaceae from the FAU herbarium, which is now at FTG.

As I said, the whole system isn't ready for distribution, but you definitely can try this at home on your own.
http://www.virtualherbarium.org/vh/convolvs/

Cheers,

Stinger Guala  Ph.D.
Keeper of Herbaria
Fairchild Tropical Garden
Miami, FL 33156