Spelling detection and correction in Taxonomic Databases
B.J.Tindall
bti at DSMZ.DE
Fri May 24 15:41:14 CDT 2002
Sounds like an interesting tool. Unfortunately, there are probably quite a
few names in bacteriology which are incorrectly formed, and with a
decreasing awareness of Latin or Greek the problems could get worse. If you
can check names in a database, would it also be possible to develop
something to check new names or new combinations online before the author
of the name goes to press?
Brian
At 14:10 24.5.2002 +0100, edalcin wrote:
>Dear Taxacom,
>
>I'm an MPhil/PhD student at the University of Southampton, UK, working with
>computational techniques to detect (and possibly correct)
>"bad data" in taxonomic databases. These techniques are organized into
>three approaches, addressing structural, contextual and spelling errors.
>
>I'm working with some taxonomic databases where the most important are:
>
>* Species 2000 - 51,918 "unique names"
>* ILDIS - 15,616 "unique names"
>* Northeast of Brazil Plants Checklist - 7,691 "unique names"
>* Atlantic Rain Forest (Brazil) - 1,802 "unique names"
>
>At the moment I'm focussing on the spelling-error approach, and I'd like
>to know whether anyone is working on a similar or related approach, so
>that we could share our experiences.
>
>I would also like to know whether any Taxacom members who maintain
>taxonomic databases would like to have them checked by the tools I'm
>working on. Using different algorithms, these tools generate a list of
>"suspect pairs of names" that could be spelling errors.
>
>Here are some examples from the databases listed above:
>
>* Spirodela polyrhiza
> Spirodela polyrrhiza
>
>* Inga brachystachya
> Inga brachystachys
>
>* Squatina occulta
> Squatina oculata
>
>* Steindachneria argentea
> Steindachnerina argentea
>
>* Tephrosia clementii
> Tephrosia clementis
>
>* Rhipsalis cassutha
> Rhipsalis cassyta
>
>* Epidendrum cinnabarimum
> Epidendrum cinnabarinum
>
>* Fleurya aestuans
> Fleurya aestyans
>
>
>Thank you in advance for any comments and contributions to my work.
>
>-------------------
>Eduardo Dalcin
>edalcin at soton.ac.uk
>-------------------
>
********************************************************************
* Dr.B.J.Tindall E-MAIL bti at dsmz.de *
* DSMZ-Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH *
* Mascheroder Weg 1b, D-38124 Braunschweig, Germany *
* Tel.: ++ 531 2616 0 (general) *
* Tel.: ++ 531 2616 224 (direct) *
* Fax: ++ 531 2616 418 *
* Fax: ++ 531 2616 491 (ISDN) *
* *
* Homepage: http://www.dsmz.de/index.html *
* E-MAIL: help at dsmz.de (general enquiries) *
* sales at dsmz.de (sales) *
********************************************************************
More information about the Taxacom mailing list
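The thread does not say which algorithms the spelling tools use. As a minimal sketch, assuming plain Levenshtein edit distance and a simple "blocking" rule (only compare binomials that share either the genus or the epithet), "suspect pairs of names" like those in the quoted examples could be generated as below. The same distance function can also screen a proposed new name against an existing list, along the lines of Brian's question about checking names before publication. The function names `levenshtein`, `suspect_pairs` and `near_matches`, and the threshold `max_distance=2`, are illustrative assumptions, not taken from the thread.

```python
from collections import defaultdict
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn a into b (standard dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def suspect_pairs(names, max_distance=2):
    """Return sorted (name_a, name_b, distance) tuples for binomials that
    share a genus or an epithet and differ by at most max_distance edits.
    The blocking rule is an assumption made to avoid comparing every pair."""
    blocks = defaultdict(set)
    for name in set(names):
        parts = name.split()
        if len(parts) != 2:
            continue  # this sketch only handles "Genus epithet" binomials
        genus, epithet = parts
        blocks[("genus", genus)].add(name)
        blocks[("epithet", epithet)].add(name)
    found = set()
    for members in blocks.values():
        for a, b in combinations(sorted(members), 2):
            d = levenshtein(a, b)
            if d <= max_distance:
                found.add((a, b, d))
    return sorted(found)


def near_matches(new_name, known_names, max_distance=2):
    """Hypothetical pre-publication check: existing names within
    max_distance edits of new_name, closest first (exact matches excluded)."""
    scored = ((levenshtein(new_name, k), k) for k in set(known_names) if k != new_name)
    return sorted((d, k) for d, k in scored if d <= max_distance)


# A few of the pairs quoted in the thread, used here as sample input.
EXAMPLE_NAMES = [
    "Spirodela polyrhiza", "Spirodela polyrrhiza",
    "Squatina occulta", "Squatina oculata",
    "Steindachneria argentea", "Steindachnerina argentea",
    "Fleurya aestuans", "Fleurya aestyans",
]

if __name__ == "__main__":
    for a, b, d in suspect_pairs(EXAMPLE_NAMES):
        print(f"{a} | {b} (distance {d})")
    print(near_matches("Spirodela polyriza", EXAMPLE_NAMES))
```

Comparing every pair in a 51,918-name list would mean well over a billion distance computations, which is why the sketch only compares names sharing a genus or an epithet. The trade-off is that a pair misspelt in both words at once would be missed; real tools might use other indexing or phonetic keys instead.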