Spelling detection and correction in Taxonomic Databases
edalcin
edalcin at ONMAIL.COM.BR
Fri May 24 14:10:01 CDT 2002
Dear taxacom,
I'm an MPhil/PhD student at the University of Southampton, UK, working
on computational techniques to detect (and possibly correct) "bad data"
in taxonomic databases. These techniques fall into three approaches,
targeting structural, contextual and spelling errors.
I'm working with several taxonomic databases, of which the most
important are:
* Species 2000 - 51,918 "unique names"
* ILDIS - 15,616 "unique names"
* Northeast of Brazil Plants Checklist - 7,691 "unique names"
* Atlantic Rain Forest (Brazil) - 1,802 "unique names"
At the moment I'm focussing on the spelling-error approach, and I'm
wondering whether anyone is working on a similar or related approach,
so that we could share our experiences.
I would also like to know whether any Taxacom members who maintain
taxonomic databases would like to have them checked by the tools that
I'm working on. These tools use different algorithms to generate a list
of "suspect pairs of names" that could be spelling errors.
Here are some examples from the databases listed above:
* Spirodela polyrhiza
Spirodela polyrrhiza
* Inga brachystachya
Inga brachystachys
* Squatina occulta
Squatina oculata
* Steindachneria argentea
Steindachnerina argentea
* Tephrosia clementii
Tephrosia clementis
* Rhipsalis cassutha
Rhipsalis cassyta
* Epidendrum cinnabarimum
Epidendrum cinnabarinum
* Fleurya aestuans
Fleurya aestyans
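
For illustration only, one simple way to produce such suspect pairs is
to compare names by edit distance. The sketch below is an assumption and
not the tools described above: the function names edit_distance and
suspect_pairs and the max_distance threshold of 2 are hypothetical, and
Levenshtein distance is just one of many algorithms that could be used.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions that turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def suspect_pairs(names, max_distance=2):
    """Return pairs of distinct names whose edit distance is at most
    max_distance (hypothetical threshold), in sorted order."""
    unique = sorted(set(names))
    return [
        (a, b)
        for a, b in combinations(unique, 2)
        if edit_distance(a, b) <= max_distance
    ]


if __name__ == "__main__":
    sample = [
        "Inga brachystachya",
        "Inga brachystachys",
        "Squatina occulta",
        "Squatina oculata",
        "Fleurya aestuans",
    ]
    for first, second in suspect_pairs(sample):
        print(first, "/", second)
```

Comparing every name with every other grows quadratically with the size
of the list, so for databases of tens of thousands of names the
comparison would probably need to be restricted, for instance to names
within the same genus.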
Thank you in advance for any comments and contributions to my work.
-------------------
Eduardo Dalcin
edalcin at soton.ac.uk
-------------------