Text Extraction Again (from Taxonomic e-text)
Mike Dallwitz
mike.dallwitz at NETSPEED.COM.AU
Sun Jan 25 00:04:42 CST 2004
- From: "Beach, James H" <beach at KU.EDU>
> Does anyone have information on recent attempts to use text
> extraction software on taxonomic e-texts and databases for the
> purposes of extracting taxonomic names or other taxon attribute data?
Here are some programs that extract taxon attribute data from
natural-language descriptions.
Diederich, J., Fortuner, R. & Milton, J. (1999). Computer-assisted
data extraction from the taxonomical literature.
http://math.ucdavis.edu/~milton/genisys.html
Gouda, E. J. TAXASOFT DELTA Programs (DDCONV).
http://botu07.bio.uu.nl/taxasoft/
Taylor, A. (1996). Extracting Knowledge from Biological Descriptions.
http://www.cse.unsw.edu.au/~andrewt/papers/nlp_vlkb95/nlp_vlkb95.html
I'm sceptical about such programs, because most conventional
descriptions are so bad (i.e. non-comparative) that even human readers
find it difficult to extract useful information from them.
When using these programs, keep in mind that a character list is not
just a list of words or phrases that have been used to describe a group
of organisms. Constructing a character list requires taxonomic wisdom
and judgement.
The first test I would apply to a program for creating a descriptive
database from descriptions would be to see whether it can reconstruct a
DELTA database from natural-language descriptions generated from that
database.
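A minimal sketch of that round-trip test, in Python, might look like
the following. Everything in it is an assumption made for illustration:
the dictionary layout is a toy stand-in and not the DELTA format, the
taxa and characters are invented, and extract_record is a hypothetical
placeholder for whichever extraction program is being evaluated.

```python
# Toy stand-in for a descriptive database: taxon -> {character: state}.
# This is NOT the DELTA format; all names and data are invented.
CHARACTERS = {
    "leaf shape": ["ovate", "lanceolate", "linear"],
    "flower colour": ["red", "white", "yellow"],
}

DATABASE = {
    "Taxon A": {"leaf shape": "ovate", "flower colour": "red"},
    "Taxon B": {"leaf shape": "linear", "flower colour": "white"},
}


def generate_description(record):
    """Render one record as a simple natural-language description."""
    return " ".join(
        f"{char.capitalize()} {state}." for char, state in record.items()
    )


def extract_record(text, characters):
    """Hypothetical extractor; replace with the program under test."""
    record = {}
    for sentence in text.lower().split("."):
        sentence = sentence.strip()
        for char, states in characters.items():
            if sentence.startswith(char):
                words = sentence[len(char):].split()
                for state in states:
                    if state in words:
                        record[char] = state
    return record


def round_trip_mismatches(database, characters):
    """List (taxon, character, expected, got) for values not recovered."""
    mismatches = []
    for taxon, record in database.items():
        recovered = extract_record(generate_description(record), characters)
        for char, expected in record.items():
            got = recovered.get(char)
            if got != expected:
                mismatches.append((taxon, char, expected, got))
    return mismatches
```

An empty result from round_trip_mismatches(DATABASE, CHARACTERS) means
every coded value survived the trip from database to text and back.
Passing this test on generated text is only a first hurdle, since
generated descriptions are far more regular than conventional ones.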
--
Mike Dallwitz
13 Warrambool Close, Giralang ACT 2617, Australia
Phone: +61 2 6241 2884
Email: mike.dallwitz at netspeed.com.au Internet: http://delta-intkey.com