Spelling detection and correction in Taxonomic Databases

B.J.Tindall bti at DSMZ.DE
Fri May 24 15:41:14 CDT 2002


Sounds like an interesting tool. Unfortunately, there are probably quite a
few incorrectly formed names in bacteriology, and with decreasing
familiarity with Latin and Greek the problem could get worse. If you can
check names in a database, would it also be possible to develop something
to check new names or new combinations online before the author of the
name goes to press?
Brian

At 14:10 24.5.2002 +0100, edalcin wrote:
>Dear taxacom,
>
>I'm an MPhil/PhD student at the University of Southampton, UK, working on
>computational techniques to detect (and possibly correct) "bad data" in
>taxonomic databases. These techniques fall into three approaches:
>structural errors, contextual errors and spelling errors.
>
>I'm working with several taxonomic databases, the most important being:
>
>* Species 2000 - 51,918 "unique names"
>* ILDIS - 15,616 "unique names"
>* Northeast of Brazil Plants Checklist - 7,691 "unique names"
>* Atlantic Rain Forest (Brazil) - 1,802 "unique names"
>
>I'm currently focussing on the spelling-error approach, and I would like
>to know whether anyone is working on a similar or related approach, so
>that we could share our experiences.
>
>I would also like to know whether any Taxacom members who maintain
>taxonomic databases would like to have them checked with the tools I'm
>working on. These tools use different algorithms to generate a list of
>"suspect pairs" of names that could be spelling errors.
>
>Here are some examples from the databases listed above:
>
>* Spirodela polyrhiza
>  Spirodela polyrrhiza
>
>* Inga brachystachya
>  Inga brachystachys
>
>* Squatina occulta
>  Squatina oculata
>
>* Steindachneria argentea
>  Steindachnerina argentea
>
>* Tephrosia clementii
>  Tephrosia clementis
>
>* Rhipsalis cassutha
>  Rhipsalis cassyta
>
>* Epidendrum cinnabarimum
>  Epidendrum cinnabarinum
>
>* Fleurya aestuans
>  Fleurya aestyans
>
>
>Thank you in advance for any comments and contributions to my work.
>
>-------------------
>Eduardo Dalcin
>edalcin at soton.ac.uk
>-------------------
>
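The message does not say which algorithms the tools use. As one plausible illustration only, the sketch below flags "suspect pairs" using Levenshtein edit distance (the minimum number of single-character insertions, deletions and substitutions needed to turn one name into the other). The function names `levenshtein`, `suspect_pairs` and `near_matches`, and the default threshold `max_distance=2`, are hypothetical choices and not part of the tools described in the message.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (0 if equal)
            ))
        previous = current
    return previous[-1]


def suspect_pairs(names, max_distance=2):
    """Hypothetical helper: distinct names within max_distance edits of each other.

    Every pair is compared, so the cost grows quadratically with the list size.
    """
    unique = sorted({n.strip() for n in names})
    pairs = []
    for a, b in combinations(unique, 2):
        d = levenshtein(a.lower(), b.lower())
        if 0 < d <= max_distance:  # d == 0 would mean a case-only difference
            pairs.append((a, b, d))
    return pairs


def near_matches(candidate, known_names, max_distance=2):
    """Hypothetical pre-publication check: known names close to a candidate."""
    cand = candidate.strip().lower()
    hits = []
    for name in known_names:
        d = levenshtein(cand, name.strip().lower())
        if 0 < d <= max_distance:
            hits.append((name, d))
    return sorted(hits, key=lambda h: (h[1], h[0]))


if __name__ == "__main__":
    sample = ["Squatina occulta", "Squatina oculata", "Inga edulis"]
    print(suspect_pairs(sample))  # → [('Squatina occulta', 'Squatina oculata', 2)]
```

With a threshold of 2, all eight example pairs above would be flagged: most differ by a single edit, while occulta/oculata and cassutha/cassyta differ by two. Because every pair is compared, a list the size of Species 2000 would in practice need a blocking step (for example, only comparing names of similar length) before computing distances. `near_matches` shows how the same measure could screen a proposed new name or combination against an existing list, along the lines of the pre-publication check asked about in the reply.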


********************************************************************
* Dr.B.J.Tindall      E-MAIL bti at dsmz.de                           *
* DSMZ-Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH *
* Mascheroder Weg 1b, D-38124 Braunschweig, Germany                *
* Tel.: ++ 531 2616 0 (general)                                    *
* Tel.: ++ 531 2616 224 (direct)                                   *
* Fax:  ++ 531 2616 418                                            *
* Fax:  ++ 531 2616 491 (ISDN)                                     *
*                                                                  *
* Homepage: http://www.dsmz.de/index.html                          *
* E-MAIL: help at dsmz.de (general enquiries)                         *
*         sales at dsmz.de (sales)                                    *
********************************************************************



